On 5 April 2010, an explosion tore through the Upper Big Branch coal mine in West Virginia, killing 29 miners in the worst U.S. mining disaster in 40 years. In the years leading up to the tragedy, Massey Energy had publicly cited its declining Total Recordable Incident Rate (TRIR) as evidence of operational excellence. The Mine Safety and Health Administration's subsequent investigation revealed a different reality: ventilation defects, methane monitoring failures, and a culture that routed around controls. The metrics that leadership trusted were not measuring what mattered. They were measuring what had already happened—and even then, imperfectly.
Upper Big Branch is one of the most widely cited illustrations of a problem that persists across industries: an over-reliance on lagging indicators to evaluate the performance of a safety system. For HSE professionals, the question is not whether TRIR, Lost Time Injury Frequency Rate (LTIFR), or Days Away, Restricted or Transferred (DART) rates have value. They do. The question is whether the broader KPI framework that surrounds them actually tells leadership, before an event rather than after, whether the organisation's controls are working. This article examines how to build a predictive HSE KPI framework that moves beyond the TRIR ritual, aligns with ISO 45001 Clause 9 (Performance evaluation), and produces the kind of data that can help prevent the next Upper Big Branch.
Lagging indicators measure outcomes after the fact. TRIR, LTIFR, DART, fatality counts, property damage costs, and regulatory fines are all lagging. They are essential: regulators require them, insurers price on them, and they provide a historical benchmark. But as the Center for Chemical Process Safety (CCPS) has argued since its 2010 publication Process Safety Leading and Lagging Metrics, lagging indicators are fundamentally retrospective—they tell you the controls failed, but not which ones, when the weakening began, or how close the next failure may be.
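For reference, the lagging rates named above are exposure-normalised counts. The sketch below assumes the OSHA convention of a 200,000-hour base (100 full-time workers for one year) for TRIR, and a 1,000,000-hour base for LTIFR, which is common but not universal; check the base your regulator or corporate standard uses. The function names and the sample figures are illustrative only.

```python
def rate_per_hours(event_count: int, hours_worked: float, base_hours: float) -> float:
    """Normalise an event count to a rate per `base_hours` of exposure."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return event_count * base_hours / hours_worked


def trir(recordable_incidents: int, hours_worked: float) -> float:
    # Assumption: OSHA-style base of 200,000 hours.
    return rate_per_hours(recordable_incidents, hours_worked, 200_000)


def ltifr(lost_time_injuries: int, hours_worked: float) -> float:
    # Assumption: base of 1,000,000 hours; some organisations use 200,000.
    return rate_per_hours(lost_time_injuries, hours_worked, 1_000_000)


# Hypothetical site-year: 6 recordables, 3 lost-time injuries, 1.2 million hours.
site_trir = trir(6, 1_200_000)    # 1.0
site_ltifr = ltifr(3, 1_200_000)  # 2.5
```

Note what the calculation cannot show: neither number says which control failed or how close the next failure is, which is the limitation the CCPS argument points to.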
Leading indicators measure the health of the system that produces safe outcomes. They are forward-looking proxies for risk exposure. A leading indicator answers the question, "Given what I am seeing in this data, what is the probability of a serious incident in the next quarter?" Examples include the percentage of overdue corrective actions, the ratio of observed to expected hazard reports, permit-to-work audit pass rates, or the time-to-close on safety-critical maintenance work orders.
The two categories are not in competition. Robert Long's often-quoted framing—that lagging indicators are like looking in the rear-view mirror while leading indicators are like looking through the windshield—is useful but incomplete. Mature HSE systems triangulate both, and recognise a third category often overlooked in practice: current-state indicators, which describe the real-time condition of critical controls (for example, the percentage of safety-critical equipment currently in its inspection window, or the number of active permits exceeding standard duration). Together, these three categories form the measurement layer of any credible Safety Management System.
Research from the Energy Institute, the HSE (UK) HSG254, and ANSI/ASSP Z16.1-2022 converges on a consistent set of criteria. A leading indicator is worth tracking only if it meets five tests.
It is predictive of outcome. There must be a defensible, preferably empirical, linkage between movement in the indicator and the probability of harm. "Number of toolbox talks held" fails this test unless the organisation can demonstrate that talk quality and frequency correlate with reduced rule-violation rates. "Percentage of high-risk tasks observed with a behaviour-based safety observation in the past 30 days" passes, because observation frequency in safety-critical work has well-documented links to at-risk behaviour reduction.
It is actionable at a defined level of the organisation. Every indicator must have an owner who can respond when it moves. A corporate-level metric with no front-line handle is decoration. ISO 45001 Clause 9.1 requires that monitoring measure both the extent to which legal requirements are fulfilled and the effectiveness of controls; both conditions imply responsibility for action.
It is resistant to gaming. If a leading indicator becomes a management target tied to bonuses, Goodhart's Law takes over: "When a measure becomes a target, it ceases to be a good measure." Indicators most vulnerable to manipulation are those with soft definitions (observations, audits, near-miss counts). The mitigation is dual reporting—numerator and denominator both disclosed—and periodic audit of source data.
It has a clear operational definition. Two supervisors in the same plant should record the same event identically. "Near miss" alone is inadequate; "unplanned event with potential for serious injury, reported within 24 hours, classified using the corporate severity matrix" is operational.
It produces signal, not noise. An indicator that changes every reporting cycle in a random walk is consuming attention without producing insight. Statistical process control thinking—common-cause versus special-cause variation—should be applied before an indicator enters a leadership dashboard.
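One way to apply the signal-versus-noise test is an individuals (XmR) control chart, sketched below. It is one of several statistical process control options and is not prescribed by the sources cited here. The limits are the baseline mean plus or minus 2.66 times the average moving range, the standard XmR constant; the monthly values are hypothetical.

```python
from statistics import mean


def xmr_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Return (lower limit, centre line, upper limit) for an individuals chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    centre = mean(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre, centre + spread


def is_special_cause(value: float, limits: tuple[float, float, float]) -> bool:
    """True when a new point falls outside the natural process limits."""
    lower, _, upper = limits
    return value < lower or value > upper


# Hypothetical baseline: overdue corrective actions (%) over eight months.
baseline = [12, 14, 11, 13, 12, 15, 13, 12]
limits = xmr_limits(baseline)             # roughly (7.43, 12.75, 18.07)
signal = is_special_cause(21, limits)     # True: worth investigating
routine = is_special_cause(14, limits)    # False: common-cause variation
```

A point inside the limits does not mean performance is acceptable, only that the month-to-month movement is not in itself evidence of change.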
A practical KPI framework covers five domains. Each domain carries two to three indicators; fewer produces blind spots, more produces dilution. Leading organisations typically run ten to fifteen HSE KPIs at the executive level.
Domain 1: Risk Identification and Assessment. Sample indicators include the percentage of high-risk activities with a current, task-level risk assessment (target ≥ 95%); the number of hazards reported per 100,000 work hours, read alongside the ratio of hazard reports to recordable incidents (benchmark: the Energy Institute suggests a minimum of 10 hazard reports per recordable incident in mature cultures); and the percentage of risk assessments reviewed within their scheduled review interval.
Domain 2: Control Integrity. These indicators describe whether the controls on which the risk assessment relies are actually present and functional. Examples: percentage of safety-critical equipment within its inspection window; permit-to-work audit compliance rate; percentage of overdue corrective actions from audits, incidents, and management of change reviews. This domain aligns directly with the "bow-tie" approach used in process safety and is often the most diagnostic when an investigation reconstructs the timeline of a serious incident.
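As an illustration of a control-integrity indicator, the sketch below computes the share of safety-critical equipment within its inspection window and returns the numerator and denominator alongside the percentage, so that both can be disclosed. The `Equipment` record, tag names, dates, and intervals are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Equipment:
    tag: str
    last_inspected: date
    interval_days: int  # maximum allowed days between inspections


def inspection_currency(items: list[Equipment], today: date):
    """Return (in_window, total, percent) for the equipment register."""
    in_window = sum(
        1 for e in items
        if e.last_inspected + timedelta(days=e.interval_days) >= today
    )
    total = len(items)
    percent = 100.0 * in_window / total if total else None
    return in_window, total, percent


# Hypothetical register snapshot.
register = [
    Equipment("PSV-101", date(2024, 1, 15), 365),  # due 2025-01-14: current
    Equipment("GD-07", date(2024, 3, 1), 90),      # due 2024-05-30: overdue
    Equipment("ESD-02", date(2023, 12, 1), 180),   # due 2024-05-29: overdue
    Equipment("FD-11", date(2024, 6, 1), 30),      # due 2024-07-01: current
]
result = inspection_currency(register, today=date(2024, 6, 30))  # (2, 4, 50.0)
```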
Domain 3: Competence and Engagement. Percentage of workforce current on mandatory HSE training; percentage of supervisors who have completed a documented safety leadership module in the past 24 months; participation rate in voluntary hazard reporting and stop-work authority events. ANSI/ASSP Z10-2019 explicitly ties training currency to management system maturity.
Domain 4: Incident Learning. Time from incident occurrence to investigation closure (benchmark: 30 days for medical-treatment cases, 60 days for serious injuries with corrective actions verified); percentage of high-potential events (HiPos) with causal analysis completed using a recognised methodology such as ICAM, TapRooT, or Apollo; repeat-causal-factor rate across investigations.
Domain 5: Outcome. The traditional lagging set—TRIR, LTIFR, Serious Injury and Fatality (SIF) rate, environmental release count, regulatory actions—should remain in the scorecard, but balanced so they represent no more than a third of total indicators. The SIF metric specifically has gained traction since the 2011 BST/Campbell Institute research showing that low TRIR does not correlate with reduced fatality risk; organisations should track SIF precursors separately and not allow overall recordable rates to mask exposure to life-altering events.
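The balance rules described in this framework (two to three indicators per domain, lagging indicators no more than a third of the total) are simple enough to check mechanically. The following is a minimal sketch; the scorecard entries and function names are hypothetical, and the thresholds are just the ones stated above.

```python
from collections import Counter
from fractions import Fraction

# Each entry: (indicator name, domain, kind). All entries are hypothetical.
SCORECARD = [
    ("High-risk tasks with current risk assessment (%)", "Risk identification", "leading"),
    ("Hazard reports per 100,000 hours", "Risk identification", "leading"),
    ("Safety-critical equipment in inspection window (%)", "Control integrity", "leading"),
    ("Permit-to-work audit compliance (%)", "Control integrity", "leading"),
    ("Overdue corrective actions (%)", "Control integrity", "leading"),
    ("Workforce current on mandatory training (%)", "Competence", "leading"),
    ("Voluntary hazard reporting participation (%)", "Competence", "leading"),
    ("Days to investigation closure", "Incident learning", "leading"),
    ("Repeat-causal-factor rate", "Incident learning", "leading"),
    ("TRIR", "Outcome", "lagging"),
    ("LTIFR", "Outcome", "lagging"),
    ("SIF rate", "Outcome", "lagging"),
]


def balance_findings(scorecard, max_lagging=Fraction(1, 3), per_domain=(2, 3)):
    """Return a list of plain-text findings; an empty list means balanced."""
    if not scorecard:
        return ["scorecard is empty"]
    findings = []
    lagging = sum(1 for _, _, kind in scorecard if kind == "lagging")
    share = Fraction(lagging, len(scorecard))
    if share > max_lagging:
        findings.append(f"lagging share {share} exceeds {max_lagging}")
    low, high = per_domain
    for domain, count in Counter(d for _, d, _ in scorecard).items():
        if not low <= count <= high:
            findings.append(f"{domain}: {count} indicators, expected {low} to {high}")
    return findings


balanced = balance_findings(SCORECARD)  # []
top_heavy = balance_findings(SCORECARD + [
    ("Environmental release count", "Outcome", "lagging"),
    ("Regulatory actions", "Outcome", "lagging"),
])  # two findings: lagging share 5/14, and five Outcome indicators
```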
The most common failure in HSE KPI programmes is not selecting the wrong indicators—it is failing to close the loop. Indicators are reported in monthly operational reviews, briefly discussed, and then stranded without action. A functioning KPI framework requires three disciplines.
First, variance response must be automatic. When an indicator moves outside its control limits, a documented escalation is triggered: who is notified, within what timeframe, with what investigation expected. Without this, indicators become scenery.
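A documented escalation of this kind can be captured as data rather than left to habit. The sketch below is one possible shape, not a prescribed design: each rule records who is notified, the response window, and the investigation expected, and a check returns an escalation only when the value leaves its limits. All names, limits, and roles are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationRule:
    indicator: str
    lower_limit: float
    upper_limit: float
    notify: str                # role that must be informed
    respond_within_hours: int  # deadline for the documented response
    expected_action: str       # what investigation is expected


@dataclass(frozen=True)
class Escalation:
    indicator: str
    value: float
    notify: str
    respond_within_hours: int
    expected_action: str


def check(rule: EscalationRule, value: float) -> Optional[Escalation]:
    """Return an Escalation when `value` is outside the rule's limits."""
    if rule.lower_limit <= value <= rule.upper_limit:
        return None
    return Escalation(rule.indicator, value, rule.notify,
                      rule.respond_within_hours, rule.expected_action)


# Hypothetical rule and readings.
rule = EscalationRule(
    indicator="Overdue corrective actions (%)",
    lower_limit=0.0,
    upper_limit=10.0,
    notify="Site HSE manager",
    respond_within_hours=48,
    expected_action="Review overdue items and confirm revised close-out dates",
)
quiet = check(rule, 7.5)     # None: within limits, no escalation
raised = check(rule, 14.0)   # Escalation addressed to the site HSE manager
```

The limits in such a rule would normally come from the indicator's own control limits rather than from an arbitrary target.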
Second, do not over-measure. Sites that track 50 HSE indicators typically manage none of them effectively. The signal-to-noise ratio collapses, ownership diffuses, and leadership disengages. Tier the framework: 10–15 at the executive level, a broader set at operational level, with clear traceability between tiers.
Third, validate the causal chain periodically. A leading indicator is a hypothesis about cause and effect. Every 18 to 24 months, test that hypothesis. Do sites with better hazard-reporting ratios actually experience fewer HiPos? If the correlation has decayed, the indicator has become ceremonial and should be retired or redefined.
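A first-pass version of that test is a correlation between the indicator and the outcome it is supposed to predict. The sketch below computes a Pearson coefficient across sites; the data are invented, five sites are far too few for a real conclusion, and correlation alone does not establish cause, so treat this as a screening step before a proper analysis.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance / (spread_x * spread_y)


# Hypothetical per-site data for one review period.
hazard_reports_per_recordable = [4, 6, 9, 12, 15]
hipo_events = [9, 7, 6, 3, 2]

r = pearson_r(hazard_reports_per_recordable, hipo_events)
# A strongly negative r supports the hypothesis; a value near zero suggests
# the indicator may have become ceremonial.
```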
Beyond these three disciplines, be alert to the TRIR mirage. Multiple studies, including the U.S. Chemical Safety Board's (CSB) review of process safety incidents in U.S. refining, have demonstrated that recordable injury rate trends can improve during the same period in which catastrophic-event precursors are worsening. If the scorecard is top-heavy with personal safety outcomes and light on process safety or control-integrity indicators, leadership will be told a story that reality will eventually contradict, often at enormous cost.
Start with a single diagnostic question: If every lagging indicator on my current scorecard went to zero next quarter, would I be confident that my organisation is safer, or simply luckier? If the answer is not an immediate, evidence-backed "safer", the scorecard is incomplete.
Build the framework from risk downward, not from available data upward. Identify the top five to seven Material Unwanted Events (MUEs) for the operation. For each, identify the critical controls. Design leading indicators that measure the health of those controls. This approach, recommended by the International Association of Oil and Gas Producers (IOGP) Report 456, keeps the KPI framework anchored to what can actually kill or seriously harm workers.
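The risk-downward approach can be represented as a simple mapping from each MUE to its critical controls and from each control to the leading indicators that watch it. The sketch below, with hypothetical class names and an invented example loosely echoing the failures named in the opening, flags critical controls that have no indicator at all, which is the gap this method is meant to expose.

```python
from dataclasses import dataclass, field


@dataclass
class CriticalControl:
    name: str
    leading_indicators: list[str] = field(default_factory=list)


@dataclass
class MaterialUnwantedEvent:
    name: str
    controls: list[CriticalControl]


def unmeasured_controls(mues: list[MaterialUnwantedEvent]) -> list[tuple[str, str]]:
    """Return (MUE, control) pairs where no leading indicator is defined."""
    return [
        (mue.name, control.name)
        for mue in mues
        for control in mue.controls
        if not control.leading_indicators
    ]


# Hypothetical register with a single MUE.
register = [
    MaterialUnwantedEvent(
        name="Underground methane ignition",
        controls=[
            CriticalControl("Ventilation",
                            ["Ventilation survey actions closed on time (%)"]),
            CriticalControl("Methane monitoring",
                            ["Methane monitors within calibration window (%)"]),
            CriticalControl("Ignition source control"),  # no indicator yet
        ],
    ),
]
gaps = unmeasured_controls(register)
# [("Underground methane ignition", "Ignition source control")]
```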
Integrate with the Plan-Do-Check-Act cycle at the heart of ISO 45001. Indicators live in the "Check" phase but are only useful if they feed explicit "Act" decisions. Corporate management reviews (Clause 9.3) should explicitly agree which indicators triggered action, what actions were taken, and what the expected shift in performance will be.
Finally, treat the KPI framework as a living instrument. Operations change, risks evolve, and indicators that were predictive in one context can become obsolete in another. A mature HSE function reviews its measurement set annually and has the professional courage to retire metrics that no longer earn their place on the dashboard.
The post-Upper Big Branch era of HSE measurement demands more than compliance-driven reporting. A predictive KPI framework—grounded in leading indicators, balanced across risk identification, control integrity, competence, learning, and outcomes, and tightly coupled to escalation and action—turns measurement from a backward-looking audit trail into a forward-looking management instrument. The organisations that do this well are not the ones with the most sophisticated dashboards. They are the ones in which every indicator on the scorecard answers a clear question, is owned by a clear actor, and triggers a clear response when it moves. In an industry where the cost of being wrong is measured in human lives, that discipline is the difference between a safety management system that performs and one that merely reports.