Battery Storage Safety in 2026: Why Cell-Level Monitoring Is No Longer Enough


 

The scale of battery energy storage deployment has changed faster than the safety monitoring frameworks that govern it. In 2026, utility-scale BESS projects operate at capacities measured in hundreds of megawatt-hours, concentrated on single sites in containerized configurations where a failure in one unit creates risk for adjacent units. The assets are larger, the energy densities are higher, and the consequences of a safety event are proportionally more severe. Against that backdrop, the industry's dominant monitoring approach — a battery management system watching cell voltage, current, and temperature — was designed for a different scale of risk.

This is not a criticism of BMS technology. A well-designed BMS does its job: it monitors the electrochemical state of the cell stack, enforces safe operating limits, manages balancing, and provides the data that state estimation algorithms need to track capacity fade and health over time. The problem is not that the BMS performs its function poorly. The problem is that its function covers only a fraction of the failure pathways that actually cause BESS safety events at scale.

EPRI's analysis of its BESS Failure Incident Database — the most comprehensive public record of documented stationary storage failures — found that only 11 percent of failures trace to cell defects. The rest involve balance-of-system components, cooling infrastructure, environmental conditions, and integration quality. The monitoring architecture deployed on the majority of BESS sites today covers the 11 percent extensively and the remaining failure pathways not at all. In 2026, that gap has direct regulatory, insurance, and operational consequences.

What the Failure Record Actually Shows

The incident record from the past decade of utility-scale BESS deployment tells a consistent story that differs substantially from the popular narrative about battery fires. Public coverage of BESS incidents almost universally attributes fires to thermal runaway and, by implication, to faulty cells. Post-incident investigations, when they are thorough enough to identify root causes, frequently tell a different story.

The Korean ESS incident cluster from 2017 to 2019, the most extensively investigated series of large-scale BESS safety events in the historical record, implicated multiple contributing factors: improper system integration, inadequate fire suppression, absence of grid protection equipment, and environmental control failures. Korean investigations found that condensation combined with dust contamination caused insulation breakdown in numerous cases. The root causes identified were environmental and integration-related, not cell defects.

The Moss Landing fire in California involved water ingress as a contributing factor. The Victorian Big Battery incident in Australia in 2021 involved a compressor failure that caused cooling loss, which then created localized thermal stress in a battery rack. The Chandler, Arizona fire in 2022 was traced to a short circuit in the power conversion system, not the battery cells. Clean Energy Associates' 2024 BESS Quality Report found that 26 percent of inspected BESS units had defects in fire suppression systems and 18 percent had thermal management system defects — balance-of-system issues detectable before a fire event if the right monitoring were in place.

These are not rare exceptions. They represent the dominant failure mode. A monitoring architecture that watches cells and assumes the balance-of-system is working correctly is designed to detect the minority of failures and miss the majority.

The Three Failure Pathways That Cell Monitoring Cannot Reach

Understanding why cell-level monitoring is insufficient in 2026 requires specifying which failure pathways fall outside its coverage. Three categories dominate the incident record and are systematically invisible to BMS monitoring.

The first is thermal management failure. HVAC degradation in a BESS container is slow and cumulative. A cooling system delivering optimal performance at commissioning typically shows measurable efficiency decline within the first two to three years through refrigerant loss, compressor bearing wear, and heat exchanger fouling. The coefficient of performance drops incrementally. The batteries run progressively hotter on every cycle. The BMS sees slightly elevated cell temperatures and may flag no alarm, because the temperature remains within the configured limit even as it trends toward that limit over months. The BMS cannot distinguish between a container running at the upper end of its thermal envelope because of high ambient temperature and one running there because the HVAC is delivering 40 percent less cooling than it was at commissioning. The symptom looks identical in cell temperature data. The cause and the rate of acceleration are completely different.
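
The distinction between a symptom in cell temperature and a cause in the cooling system can be made concrete with a small COP trend calculation. The following is a minimal sketch, not a production method: the function names, the sample readings, and the choice of a simple least-squares line are all assumptions for illustration, and a real deployment would compare COP only across periods with similar ambient conditions.

```python
from statistics import mean


def cop(cooling_kw: float, electrical_kw: float) -> float:
    """Coefficient of performance: heat removed per unit of electrical input."""
    if electrical_kw <= 0:
        raise ValueError("electrical_kw must be positive")
    return cooling_kw / electrical_kw


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def cop_decline_fraction(days, cops, baseline_cop):
    """Fraction of the commissioning COP lost, read off the fitted trend
    line at the most recent sample (0.0 means no decline)."""
    slope = linear_slope(days, cops)
    intercept = mean(cops) - slope * mean(days)
    fitted_now = intercept + slope * days[-1]
    return 1.0 - fitted_now / baseline_cop


# Hypothetical readings: COP sampled every 30 days after commissioning.
days = [0, 30, 60, 90]
cops = [3.0, 2.9, 2.8, 2.7]
decline = cop_decline_fraction(days, cops, baseline_cop=3.0)
print(f"COP decline since commissioning: {decline:.0%}")
```

A trend like this can flag cooling degradation while cell temperatures are still inside the BMS limit, which is the gap the paragraph above describes.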

Sustained operation 10°C above optimal roughly doubles the aging rate of lithium-ion cells, which on a €50 million battery asset is a material financial exposure. A cooling system that has degraded to half its commissioning efficiency without detection does not produce a safety event in the immediate term. It silently consumes years of asset life and moves the system progressively toward the conditions where a stress event can trigger a cascade. The BMS reports everything within limits until those limits are breached, at which point the accumulated thermal stress is already embedded in degraded cell chemistry.
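
The rule of thumb that 10°C of excess temperature roughly doubles the aging rate can be written as a simple exponential factor. This is a back-of-envelope sketch under that single assumption; the 15-year nominal life used in the example is a hypothetical figure, not a value from any datasheet, and real cell aging depends on chemistry, state of charge, and cycling.

```python
def aging_acceleration(delta_t_c: float, doubling_interval_c: float = 10.0) -> float:
    """Aging-rate multiplier for operation delta_t_c above the optimal
    temperature, assuming the rate doubles every doubling_interval_c."""
    return 2.0 ** (delta_t_c / doubling_interval_c)


def effective_life_years(nominal_years: float, delta_t_c: float) -> float:
    """Service life if the temperature excess were sustained for the whole life."""
    return nominal_years / aging_acceleration(delta_t_c)


# Hypothetical asset with a 15-year nominal life at optimal temperature.
for excess in (0.0, 5.0, 10.0):
    years = effective_life_years(15.0, excess)
    print(f"+{excess:.0f} C sustained: about {years:.1f} years")
```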

The second is environmental failure. Container humidity and condensation are not monitored by any standard BESS component. HVAC cycling creates repeated humidity spikes inside containers — periods where relative humidity exceeds 75 percent — that occur dozens of times per day and are never recorded in any monitoring data stream. These spikes deposit moisture on electrical components, busbars, and connection points. Over time, combined with the conductive dust that accumulates in industrial environments, this creates the conditions for insulation breakdown and electrical faults. DNV confirmed this mechanism as a direct fire cause in documented BESS incidents. The BMS reports nothing throughout this process, because the failure is happening in the container environment, not in the cell stack.
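
Condensation risk on a busbar or connection point can be estimated by comparing its surface temperature with the dew point of the container air. The sketch below uses the Magnus approximation with a commonly used pair of constants; the 2°C safety margin and the function names are assumptions for illustration, not values from any standard.

```python
import math

# Magnus approximation constants (a commonly used pair, valid over water).
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degrees Celsius


def dew_point_c(air_temp_c: float, rel_humidity_pct: float) -> float:
    """Dew point in degrees Celsius from air temperature and relative humidity."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("rel_humidity_pct must be in (0, 100]")
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_B * air_temp_c / (MAGNUS_C + air_temp_c))
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def condensation_margin_c(surface_temp_c: float, air_temp_c: float,
                          rel_humidity_pct: float) -> float:
    """Degrees by which a surface sits above the dew point (negative means wet)."""
    return surface_temp_c - dew_point_c(air_temp_c, rel_humidity_pct)


def condensation_risk(surface_temp_c: float, air_temp_c: float,
                      rel_humidity_pct: float, margin_c: float = 2.0) -> bool:
    """True when the surface is within margin_c of the dew point (assumed margin)."""
    return condensation_margin_c(surface_temp_c, air_temp_c, rel_humidity_pct) <= margin_c


# A humidity spike to 75 percent at 25 C air temperature, with a 21 C busbar.
print(round(dew_point_c(25.0, 75.0), 1))
print(condensation_risk(surface_temp_c=21.0, air_temp_c=25.0, rel_humidity_pct=75.0))
```

The point of scoring the margin rather than the humidity alone is that the same 75 percent spike is harmless on a warm surface and risky on a cold one.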

The third is off-gas generation preceding thermal runaway. When cells approach runaway, they emit detectable volatile organic compounds, hydrogen, and CO before the thermal event becomes self-sustaining. This off-gas window — typically 5 to 20 minutes before runaway — is the most valuable safety interval in the entire BESS operating environment. Most BESS sites have gas sensors installed, required under NFPA 855. What most sites do not have is analytics capable of distinguishing genuine pre-runaway signatures from the false alarms generated by HVAC cycling, maintenance activities, and sensor drift. The result is alarm fatigue: operators receive gas alarms regularly in normal operation and have learned to discount them. When a genuine pre-runaway signature appears, it arrives in the same format — a threshold crossing — as the dozens of false alarms that preceded it in that week.
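
The difference between a threshold crossing and a pattern can be illustrated with a crude multi-gas rule: treat a window as a pre-runaway candidate only when more than one gas species rises together. This is a simplified sketch under stated assumptions. The `GasSample` fields, the rise limits, and the strict non-decreasing test are hypothetical; real sensor data is noisy and would need filtering and site-specific tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GasSample:
    h2_ppm: float
    co_ppm: float
    voc_ppm: float


def rising_species(window, min_rise_ppm):
    """Gases that never decreased across the window and rose by at least
    their configured minimum between the first and last sample."""
    names = []
    for name, min_rise in min_rise_ppm.items():
        series = [getattr(sample, name) for sample in window]
        non_decreasing = all(b >= a for a, b in zip(series, series[1:]))
        if non_decreasing and series[-1] - series[0] >= min_rise:
            names.append(name)
    return names


def classify_window(window, min_rise_ppm, min_species=2):
    """Label a window of consecutive samples using the multi-gas rule."""
    if len(window) < 2:
        return "insufficient_data"
    rising = rising_species(window, min_rise_ppm)
    if len(rising) >= min_species:
        return "pre_runaway_candidate"
    if rising:
        return "single_gas_excursion"
    return "normal"


# Hypothetical rise limits per gas, in ppm over the window.
LIMITS = {"h2_ppm": 5.0, "co_ppm": 10.0, "voc_ppm": 20.0}

voc_only = [GasSample(1, 2, 5), GasSample(1, 2, 30), GasSample(1, 2, 40)]
correlated = [GasSample(1, 2, 5), GasSample(3, 6, 5),
              GasSample(6, 11, 6), GasSample(9, 15, 6)]
print(classify_window(voc_only, LIMITS))
print(classify_window(correlated, LIMITS))
```

A VOC-only excursion, such as one caused by maintenance solvents, is kept separate from a correlated hydrogen and CO rise, so the two no longer arrive at the operator in the same format.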

These three failure pathways account for the overwhelming majority of documented BESS safety events. None of them are visible in BMS data.

Regulatory and Insurance Pressure in 2026

The external pressure on BESS operators to expand safety monitoring beyond cell-level data is building from two directions simultaneously in 2026: regulatory development and insurance market evolution.

On the regulatory side, NFPA 855 continues to evolve as the primary standard governing stationary energy storage safety in North American markets. The standard already requires gas detection at BESS installations, but the gap between detection hardware installed and effective detection analytics implemented is large and increasingly visible to regulators and fire code authorities. Local jurisdictions in California, New York, and several other states have moved to require documented hazard mitigation plans that go beyond equipment installation to address operational monitoring and response protocols. The direction is toward demonstrated monitoring capability, not just hardware compliance.

In February 2025, the IEEE published its recommended practice for BMS in stationary energy storage applications as IEEE 2686-2024. The document formally positions the BMS as a functionally distinct component of the BESS: one component among others, not the monitoring system for the entire asset. This framing creates regulatory space for authorities and standards bodies to specify what the complementary monitoring components are required to cover, a conversation that is already underway in technical working groups.

Insurance markets have moved faster than regulations. BESS insurers, responding to a claims history that includes several high-profile and expensive incidents, have substantially tightened underwriting requirements for utility-scale projects. The requirements increasingly include documented environmental monitoring, HVAC inspection programs, and evidence of monitoring beyond threshold alarms on gas detection systems. Some insurers now treat the presence or absence of balance-of-system monitoring as an underwriting factor that affects both coverage availability and premium. For project developers and asset owners, this creates a direct financial incentive to expand monitoring coverage — the monitoring investment is offset by reduced insurance costs and improved coverage terms.

The combination of regulatory direction and insurance market pressure is moving the industry toward a monitoring standard in 2026 that looks materially different from what was acceptable in 2022. Cell-level BMS monitoring, supplemented by binary threshold alarms on gas detectors and periodic visual inspections, was the industry norm through the early 2020s. That norm is shifting under the weight of incident data, regulatory evolution, and insurance market signals.

What Comprehensive Safety Monitoring Now Requires

The gap between cell-level BMS monitoring and comprehensive BESS safety monitoring is not a gap that more sophisticated BMS algorithms can close. It is a gap in sensor coverage and monitoring scope that requires additional physical monitoring infrastructure deployed for the systems the BMS does not reach.

The table below maps the documented failure pathways to the monitoring capabilities that address them:

| Failure pathway | BMS visibility | Required monitoring addition |
|---|---|---|
| HVAC efficiency degradation | None (sees consequence only) | Compressor current signature, COP trending, vibration |
| Refrigerant loss and cooling decline | None | HVAC power consumption vs cooling output ratio |
| Container humidity spikes | None | Multi-point humidity and dew point sensors |
| Condensation on electrical components | None | Surface temperature + humidity condensation probability |
| Pre-runaway off-gas generation | Indirect (no gas sensors) | Multi-gas detection with pattern analytics |
| Spatial thermal hotspot formation | Partial (point sensors only) | Spatial temperature arrays across racks |
| Water ingress and seal degradation | None | Environmental monitoring with anomaly baseline |
| Fire suppression system readiness | None | Independent system health monitoring |

The monitoring additions in this table share a practical characteristic: none of them require BMS integration or SCADA modification to implement. They require dedicated sensors for the physical phenomena they measure and a processing layer capable of running continuous anomaly detection against site-specific baselines. The BMS continues to perform its electrochemical monitoring function. The additional monitoring layer adds coverage for the systems and failure pathways outside BMS scope.
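
Anomaly detection against a site-specific baseline can be as simple as scoring each new reading against the statistics of that site's own history. The following is a minimal sketch using a z-score; the humidity values, the 3-sigma limit, and the function names are assumptions for illustration, and a fielded system would likely use rolling or seasonal baselines.

```python
from statistics import mean, stdev


def zscore(value: float, baseline: list) -> float:
    """Standard deviations between value and the mean of the site baseline."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two samples")
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is treated as extreme.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def is_anomalous(value: float, baseline: list, z_limit: float = 3.0) -> bool:
    """True when the reading is more than z_limit deviations from baseline."""
    return abs(zscore(value, baseline)) > z_limit


# Hypothetical relative-humidity history (percent) for one container.
humidity_baseline = [40, 42, 41, 43, 39, 41, 40, 42]
print(is_anomalous(42, humidity_baseline))
print(is_anomalous(48, humidity_baseline))
```

Because the baseline is learned per site, the same absolute reading can be normal in one container and anomalous in another, which a fixed fleet-wide threshold cannot express.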

The processing architecture for this additional monitoring layer matters. Decisions that have a sub-second response requirement and that must remain available when connectivity is interrupted cannot depend on cloud-based inference. Pattern-based off-gas classification, spatial thermal hotspot detection, and condensation probability scoring need to run at the asset, on dedicated hardware, continuously and independently of external network conditions.

 


 

The First Two Years Problem

The failure timeline for BESS deployments adds a specific urgency to the monitoring gap discussion. EPRI's analysis of its incident database found that 72 percent of BESS failures occur within the first two years of operation. This is the period when integration quality issues, commissioning errors, and component defects that passed factory testing emerge under real operating conditions. It is also the period when monitoring is often most sparse — the site is newly commissioned, the O&M program is still being established, and the monitoring architecture reflects what was specified at contract rather than what the operational experience of running the site has revealed.

The implications are direct. A new BESS site operating without HVAC health monitoring, environmental monitoring, and off-gas analytics in its first two years of operation is in the highest-risk window with the most incomplete monitoring coverage. The pattern of failures in the first two years is not driven primarily by cell aging — cells are near the beginning of their cycle life. It is driven by the integration and environmental factors that BMS monitoring does not see.

This observation should change how monitoring requirements are specified in project development. The standard industry approach treats comprehensive monitoring as something to add after commissioning, once the site is generating revenue. The failure data suggests the opposite priority: comprehensive monitoring from day one, when the risk is highest and the value of early detection is greatest.

Practical Path for Operators in 2026

For operators of existing BESS fleets, the practical question in 2026 is not whether to expand monitoring beyond the BMS — the failure data, regulatory direction, and insurance market pressure all point in the same direction — but how to do it efficiently across a fleet that may include multiple OEMs, multiple vintages of equipment, and sites with varying connectivity and accessibility.

Several practical considerations shape the deployment approach:

OEM independence is essential for fleet-scale deployment. A monitoring solution that requires BMS integration for each OEM in the fleet creates a different integration engineering program for each equipment type. Monitoring that deploys dedicated sensors alongside the existing equipment without touching the BMS or SCADA can use the same hardware and software configuration across the entire fleet regardless of OEM, dramatically reducing the implementation overhead for multi-site rollout.

Pilot-then-scale is the appropriate deployment model. Deploying comprehensive monitoring at a single asset — one that is representative of the fleet in terms of equipment configuration and site conditions — allows the monitoring system to establish site-specific baselines, validate alert thresholds against actual operating conditions, and demonstrate value to O&M teams before fleet-wide commitment. A 90-day pilot window is sufficient to establish meaningful baselines and produce the first actionable insights from HVAC health and environmental monitoring.

Prioritize by failure risk profile. Sites in the first two years of operation should receive monitoring priority based on the EPRI failure timeline data. Sites with documented HVAC issues, humidity problems, or previous alarm events should receive priority based on operational history. Sites carrying large insurance deductibles or in jurisdictions with active regulatory scrutiny should receive priority based on financial and compliance exposure.
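
The three prioritization criteria above can be turned into a simple ranking across a fleet. This is a hypothetical sketch: the `Site` fields, the equal weighting of the three criteria, and the example sites are all assumptions, and an operator would weight them according to its own risk and financial exposure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    years_in_operation: float
    has_operational_issues: bool    # documented HVAC, humidity, or alarm history
    high_exposure: bool             # large deductible or active regulatory scrutiny


def priority_score(site: Site) -> int:
    """Number of prioritization criteria the site meets (0 to 3)."""
    return sum([
        site.years_in_operation < 2.0,
        site.has_operational_issues,
        site.high_exposure,
    ])


def rank_sites(sites):
    """Sites ordered from highest to lowest priority; ties keep input order."""
    return sorted(sites, key=priority_score, reverse=True)


fleet = [
    Site("Site A", 5.0, False, False),
    Site("Site B", 1.0, True, True),
    Site("Site C", 0.5, False, False),
]
for site in rank_sites(fleet):
    print(site.name, priority_score(site))
```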

The monitoring gap between what BMS data provides and what comprehensive BESS safety requires is well-documented, technically addressable with available technology, and increasingly the subject of regulatory and insurance market attention. The case for expanding monitoring coverage in 2026 rests not on theoretical risk but on the incident record of what has actually failed, and on the growing external pressure from standards bodies, regulators, and insurers who have read that record.

Quick Overview

Battery storage safety in 2026 requires monitoring coverage that extends substantially beyond cell-level BMS data. The documented failure record — anchored in EPRI's BESS Failure Incident Database — shows that the majority of BESS safety incidents originate in balance-of-system components, cooling infrastructure, and environmental conditions that standard BMS architecture does not monitor. Regulatory development under NFPA 855, insurance market tightening, and the EPRI finding that 72 percent of failures occur within the first two years of operation are converging to create strong pressure on operators to expand monitoring coverage to HVAC health, container environment, and off-gas analytics as standard elements of BESS safety infrastructure.

Key Applications

The approach applies to utility-scale and C&I BESS assets in the first two years of operation, where failure rates are highest; existing fleets where insurance underwriting requirements now include demonstrated balance-of-system monitoring; assets subject to NFPA 855 compliance and local jurisdiction requirements for hazard mitigation documentation; multi-vendor BESS fleets where OEM-agnostic monitoring is needed without BMS integration; and new assets where project developers are specifying monitoring requirements and comprehensive coverage from commissioning is more efficient than retroactive addition.

Benefits

Environmental and HVAC monitoring closes the documented failure pathways that cell-level BMS monitoring cannot reach. Pattern-based off-gas analytics converts the 5-to-20-minute pre-runaway window from a theoretically available warning into an operationally actionable alert by reducing false positives. Continuous COP trending detects HVAC degradation months before it affects cell temperatures or produces BMS alarms. OEM-agnostic deployment allows fleet-scale rollout without per-site integration engineering for each OEM's proprietary BMS interface.

Challenges

Deploying physical sensor infrastructure across existing BESS sites requires access coordination and installation effort that monitoring solutions relying only on BMS data streams avoid. Establishing meaningful baselines for adaptive anomaly detection requires a commissioning period at each site. Integrating monitoring alerts into existing O&M procedures requires organizational change alongside the technical deployment. The regulatory framework specifying what complementary monitoring is required beyond BMS is still developing, creating uncertainty about which capabilities will become mandatory.

Outlook

The trajectory is toward formalized monitoring requirements that go beyond the BMS as a standalone safety layer. IEEE 2686-2024 positions the BMS as one component of the BESS, creating regulatory space for standards bodies to specify complementary monitoring requirements. NFPA 855 evolution, state-level energy storage safety regulations in major markets, and insurance underwriting requirements are all moving toward more comprehensive monitoring specifications. Edge AI platforms that deploy without OT network modification and operate independently of cloud connectivity are the most practical near-term architecture for closing the monitoring gap across existing and new BESS deployments at scale.

Related Terms

BESS safety, BMS limitations, thermal runaway, EPRI failure database, NFPA 855, HVAC monitoring, COP tracking, condensation risk, off-gas detection, multi-gas correlation, cell-level monitoring, balance of system, IEEE 2686-2024, environmental monitoring, insulation breakdown, humidity monitoring, Korean ESS fires, Moss Landing, edge AI, OEM-agnostic monitoring, first two years failure rate, battery storage fire prevention

 


 

FAQ

Why does BESS monitoring in 2026 need to go beyond cell-level BMS data?

 

EPRI's analysis of documented BESS failures found that only 11 percent of incidents trace to cell defects. The majority involve cooling system failures, environmental conditions such as condensation and humidity, fire suppression defects, and integration quality issues. A BMS monitors cell voltage, current, and temperature — parameters that do not reflect the condition of HVAC systems, container environments, or off-gas generation that precedes thermal runaway. Monitoring only cell-level data means monitoring the minority of documented failure pathways while the majority remain invisible.
 

What percentage of BESS failures occur in the first two years of operation?

 

EPRI's analysis found that approximately 72 percent of documented BESS failures occur within the first two years of operation. This period is characterized by integration quality issues, commissioning errors, and component defects that emerge under real operating conditions rather than factory test conditions. Cell aging is not the primary driver in this period — integration and environmental factors dominate, which reinforces the need for environmental and balance-of-system monitoring from commissioning rather than as an addition after the site matures.
 

How do insurance market requirements for BESS safety monitoring differ in 2026 from 2022?

 

Insurance underwriting for utility-scale BESS has tightened substantially in response to the claims history from high-profile incidents. Requirements in 2026 increasingly include documented environmental monitoring, HVAC inspection programs, and demonstrated analytics capability beyond binary threshold alarms on gas detection systems. Some insurers treat the presence or absence of balance-of-system monitoring as an underwriting factor affecting premium pricing and coverage availability. The financial incentive to expand monitoring coverage has grown as premiums and coverage terms increasingly depend on it.
 

What does comprehensive BESS safety monitoring require beyond the BMS?

 

Comprehensive BESS safety monitoring requires dedicated monitoring for three systems the BMS does not cover. The first is HVAC health, through compressor current signature analysis, vibration trending, and continuous COP tracking normalized against ambient conditions. The second is the container environment, through multi-point humidity, dew point, and condensation probability monitoring. The third is off-gas detection with pattern analytics, through multi-gas correlation across VOC, hydrogen, CO, and CO₂ that distinguishes genuine pre-runaway signatures from environmental false positives. None of these monitoring additions require BMS integration or SCADA modification.