The Medium-Severity Paradox: Where Engineering Time Actually Goes

The average critical-severity incident in the Revelara corpus takes about 19 hours to resolve. The average medium-severity incident takes about 17 hours. On those two numbers alone, the natural conclusion is that critical incidents are roughly where engineering time goes, and that medium-severity work is the manageable middle. The conclusion is wrong by roughly an order of magnitude, and the data shows why.

The numbers, and how the analysis was run

This analysis pulls from the Revelara public postmortem corpus, which currently spans nearly 3,000 published incidents and grows daily. Two severity-targeted comparisons were run. The “critical” set returned 272 incidents whose body text matched terms such as SEV1, P0, total outage, and major. The “medium” set returned 284 incidents whose body text matched terms such as SEV3, P2, degraded, and partial.

Neither set is a pure severity filter. The queries hit text content rather than tags, so the populations are mixed. The critical-leaning set is 62% P0 or P1 by tag. The medium-leaning set is 60% P2 by tag. That mix is part of why the comparison is directional rather than tag-precise. The headline findings below are robust to that imprecision; the absolute hour totals later in the post are order-of-magnitude estimates rather than exact.
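To make the mixed-population point concrete, here is a minimal sketch of how a text query can return a set that is only partly pure by tag. The search terms are the ones quoted above; the whole-word matching rule, the record shape (`tag`, `body`), and the four sample records are assumptions made up for illustration, not the actual Revelara query.

```python
import re

# Terms quoted in the post. The matching rule itself is an assumption.
CRITICAL_TERMS = ("SEV1", "P0", "total outage", "major")
MEDIUM_TERMS = ("SEV3", "P2", "degraded", "partial")


def matches_any(body, terms):
    """True if any term appears in the body as a whole word or phrase."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", body, flags=re.IGNORECASE)
        for term in terms
    )


def tag_share(incidents, tags):
    """Fraction of incidents whose publisher tag is in `tags`."""
    if not incidents:
        return 0.0
    return sum(1 for inc in incidents if inc["tag"] in tags) / len(incidents)


# Made-up records, only to show how a text query returns a mixed-tag set.
sample = [
    {"tag": "P0", "body": "SEV1: total outage of the API"},
    {"tag": "P2", "body": "A major dependency was degraded for an hour"},
    {"tag": "P1", "body": "P0 declared after checkout failures"},
    {"tag": "P2", "body": "Partial degradation in one region"},
]

critical_set = [inc for inc in sample if matches_any(inc["body"], CRITICAL_TERMS)]
medium_set = [inc for inc in sample if matches_any(inc["body"], MEDIUM_TERMS)]

# Share of each text-matched set that carries the expected severity tag.
critical_purity = tag_share(critical_set, {"P0", "P1"})
medium_purity = tag_share(medium_set, {"P2"})
print(f"critical-leaning set: {len(critical_set)} incidents, {critical_purity:.0%} P0/P1")
print(f"medium-leaning set: {len(medium_set)} incidents, {medium_purity:.0%} P2")
```

Note that the second sample record lands in both sets: a P2 incident that mentions a "major" dependency matches the critical query on text alone, which is the kind of leakage the tag percentages above describe.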

Per-incident, critical takes longer

At the median and at the mean, critical incidents resolve more slowly than medium incidents.

Metric	Critical-leaning set	Medium-leaning set
Median MTTR	151 minutes	121 minutes
Average MTTR	1,166 minutes (19.4 hours)	1,017 minutes (17 hours)

This matches what operational engineers would predict. Critical incidents are more complex, often touch more systems, and consume more cognitive load even when they get priority response. The median engineer reading this would not be surprised by either of those numbers.

The tail flips

The 95th-percentile numbers are where the first surprise shows up:

Metric	Critical-leaning set	Medium-leaning set
p95 MTTR	2,811 minutes (46.9 hours)	4,468 minutes (74.5 hours)

At the 95th percentile, medium-severity incidents in the corpus run about 59% longer than critical-severity ones (4,468 minutes against 2,811). The likely explanation is operational: critical incidents are well-managed at the tail because the response structure is built for the worst case. Someone owns it. The war room stays open. Resolution gets enforced. Medium-severity incidents do not get that treatment. When they linger, they keep lingering, and they can run for a week or more because no organizational mechanism is forcing them toward closure.

Volume swamps the per-incident difference

The bigger finding is volume. The full corpus contains 1,772 P2/medium incidents and 114 P0/critical incidents. Medium is 15.5x more common than critical.

Applying the average MTTR from each sample to the corpus-wide severity counts, and counting each hour of incident duration as one engineering-hour, produces order-of-magnitude estimates of aggregate engineering time by severity:

Critical (P0): 114 incidents × ~19.4 hours ≈ 2,200 engineering-hours (about 92 days of round-the-clock time)
Medium (P2): 1,772 incidents × ~17 hours ≈ 30,000 engineering-hours (about 3.4 years of round-the-clock time)

Aggregate engineering time on medium-severity work is about 13.6x the time spent on critical. The per-incident MTTR difference (critical about 15% longer on average) is real, and it is irrelevant. Volume swamps it.
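The arithmetic behind that ratio can be reproduced in a few lines. This is a sketch using only the counts and average MTTRs quoted in this post; it assumes, as the estimate itself does, that one hour of incident duration costs one engineering-hour, which real incidents with several responders will not match exactly.

```python
# Figures quoted in the post.
CRITICAL_COUNT, MEDIUM_COUNT = 114, 1_772
CRITICAL_MEAN_MIN, MEDIUM_MEAN_MIN = 1_166, 1_017  # average MTTR, in minutes


def aggregate_hours(count, mean_minutes):
    """Total hours if every incident took the average time (an assumption)."""
    return count * mean_minutes / 60


critical_hours = aggregate_hours(CRITICAL_COUNT, CRITICAL_MEAN_MIN)
medium_hours = aggregate_hours(MEDIUM_COUNT, MEDIUM_MEAN_MIN)
ratio = medium_hours / critical_hours

# The same ratio, split into its two drivers.
volume_ratio = MEDIUM_COUNT / CRITICAL_COUNT               # medium is far more common
per_incident_ratio = CRITICAL_MEAN_MIN / MEDIUM_MEAN_MIN   # critical runs a bit longer

print(f"critical: {critical_hours:,.0f} hours")
print(f"medium:   {medium_hours:,.0f} hours")
print(f"medium / critical: {ratio:.1f}x")
print(f"volume {volume_ratio:.1f}x divided by per-incident {per_incident_ratio:.2f}x")
```

Splitting the ratio this way shows where it comes from: the volume factor is above 15, and the per-incident factor only pulls it down by about 15%.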

The order-of-magnitude framing matters here. The exact ratio shifts depending on which sample you draw from and how you handle classification noise (more on that in the caveats). The ratio is solidly above 10x in any reasonable handling of the data. It is not 1.5x. It is not 3x. The aggregate engineering capacity that public postmortems describe being spent on medium-severity incidents is more than an order of magnitude greater than what is spent on critical.
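One cheap way to probe that claim with figures already in this post is to swap the per-incident duration used in the estimate. The sketch below recomputes the aggregate ratio with the median, the mean, and the p95 MTTR from each sample. It is a limited check, not a full sensitivity analysis: it varies only the duration statistic and holds the severity counts fixed, so it says nothing about reclassification between buckets.

```python
CRITICAL_COUNT, MEDIUM_COUNT = 114, 1_772

# MTTR in minutes as (critical-leaning, medium-leaning), quoted in the post.
MTTR_MINUTES = {
    "median": (151, 121),
    "mean": (1_166, 1_017),
    "p95": (2_811, 4_468),
}


def aggregate_ratio(critical_minutes, medium_minutes):
    """Medium-to-critical ratio of count times per-incident duration."""
    return (MEDIUM_COUNT * medium_minutes) / (CRITICAL_COUNT * critical_minutes)


ratios = {
    name: aggregate_ratio(critical, medium)
    for name, (critical, medium) in MTTR_MINUTES.items()
}

for name, value in ratios.items():
    print(f"{name}: {value:.1f}x")
```

Whichever of the three statistics stands in for the per-incident duration, the ratio stays above 10x, because the count ratio dominates the result.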

What this means

First, the operational story most engineers carry around about severity is wrong on the part that matters. Critical incidents feel expensive because they are visible, they get all-hands attention, and they end up in the postmortems people share at conferences. Medium incidents are the invisible expense. Most resolve quickly. A meaningful minority do not. In aggregate, medium-severity work consumes an order of magnitude more engineering capacity than critical. Reliability budget is mostly going to medium-severity work, and most teams do not see it because each individual incident looks unremarkable.

Second, the highest-payoff reliability investment is not where the standard playbook points. The default playbook concentrates on critical incidents: better runbooks, faster paging, tighter blast-radius controls, more rigorous postmortems. All correct work. None of it touches the part of the system where most engineering time gets paid. The return on improving critical-incident response is capped at roughly the 2,200 engineering-hours estimated above, and that is before you count the difficulty of further compressing already well-managed processes. The return on preventing medium-severity incidents before they reach production is in the neighborhood of 30,000 engineering-hours of latent capacity, against a prevention discipline that is far less mature.

The shift-left version of this

The data lands on a specific kind of reliability work. Catching the patterns that produce medium-severity incidents before they reach production is where reliability investment compounds.

This is the unglamorous half of reliability practice. Medium-severity prevention does not show up in conference talks. It does not produce hero engineers. It looks like architectural review of design changes, pre-production pattern matching against the catalog of known medium-severity failure modes, and the discipline of taking medium-severity signals as seriously during design review as critical signals get during incidents.

The teams that do this work look ordinary, because the incidents they prevent are not the kind that get noticed. A medium-severity Wednesday-morning degradation that did not happen does not write a postmortem about itself. The aggregate effect compounds quietly. Over a year, a team that has built a real medium-severity prevention discipline reclaims engineering capacity at a scale that critical-incident-response improvements cannot match.

Caveats

Severity classification varies wildly between publishers. AWS classified the May 7-8 us-east-1 thermal event affecting nine of its own services as a medium-severity event in its status reporting. Other publishers classify shorter, narrower incidents as critical. The severity tags in the corpus reflect the publisher’s choice, not a normalized standard. The findings above are directional with respect to that variability. If anything, classification drift probably understates the medium-severity story, because some events tagged “medium” by their publisher would be tagged “critical” by another team.

Public-postmortem survivorship bias is real. Companies that publish detailed postmortems are not a representative sample of all teams running production software. The corpus skews toward larger, more mature, more transparent organizations. The pattern may differ in a corpus of internal incident data, especially at smaller teams where severity classification is less rigorous.

The query-based grouping returns mixed-severity sets rather than tag-pure filters. The critical query group is 62% P0 or P1 by tag; the medium group is 60% P2. The aggregate-time estimates extrapolate from sample averages and should be treated as order-of-magnitude rather than precise. The ratio holds up under sensitivity analysis; the absolute numbers should not be quoted to the nearest hundred hours.

What to do with this

If you are a reliability engineer, a platform lead, or an engineering manager allocating reliability budget, here are three concrete actions:

1. Run the same analysis on your own incident history. Most teams have severity tags in their incident management tool. Compute aggregate engineering time by severity and look at the ratio. If you are seeing a similar shape to the public corpus, your reliability investment priorities probably need adjustment.
2. Build a medium-severity prevention discipline. Pre-production review of design changes, configuration changes, and code changes against the known medium-severity failure patterns is where the return lives. The corpus of public postmortems contains thousands of named patterns. Most are well-described. Most are not in any team’s review checklist.
3. Treat the medium-severity long tail as a real risk. Most medium incidents resolve fast and are not the budget sink. A few run for days. The few are where reliability capacity disappears. Monitoring, runbook investment, and ownership structures should prioritize incident classes that show long-tail behavior, not just incident classes that get high severity tags.
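For the first action, a minimal sketch of the per-severity summary might look like the following. It assumes you can export incidents as (severity, minutes to resolve) pairs; the function name, the field names, and the sample rows are hypothetical, and `total_hours` is elapsed incident time, so multiply by responder count if your tool records it.

```python
from collections import defaultdict
from statistics import mean, median, quantiles


def summarize_by_severity(incidents):
    """Summarize (severity, minutes_to_resolve) pairs per severity level."""
    by_severity = defaultdict(list)
    for severity, minutes in incidents:
        by_severity[severity].append(minutes)

    summary = {}
    for severity, durations in by_severity.items():
        # quantiles() needs at least two points; with n=20 the last cut
        # point is the 95th percentile. "inclusive" keeps it within the data.
        if len(durations) > 1:
            p95 = quantiles(durations, n=20, method="inclusive")[-1]
        else:
            p95 = durations[0]
        summary[severity] = {
            "count": len(durations),
            "median_min": median(durations),
            "mean_min": mean(durations),
            "p95_min": p95,
            "total_hours": sum(durations) / 60,
        }
    return summary


# Made-up export rows, for illustration only.
incidents = [
    ("P0", 120), ("P0", 300),
    ("P2", 30), ("P2", 45), ("P2", 60), ("P2", 4_000),
]

summary = summarize_by_severity(incidents)
for severity, stats in sorted(summary.items()):
    print(severity, stats)

medium_to_critical = summary["P2"]["total_hours"] / summary["P0"]["total_hours"]
print(f"P2 / P0 aggregate time: {medium_to_critical:.1f}x")
```

Comparing `median_min` with `p95_min` per severity also surfaces the long-tail behavior from the third action: a severity level whose median is small but whose p95 is large is where lingering incidents hide.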

We are building Revelara to make pre-production identification and remediation against the public corpus of failure modes cheap and routine. The corpus is the answer to “has anyone else hit this pattern, and if so, how did it play out in production.” Medium-severity prevention is the work it is built to support.

Closing

The reliability budget at most engineering organizations is not spent on critical incidents. It is spent on medium-severity work that nobody is putting on a slide. At the order-of-magnitude level, the data on that is unambiguous, and so is the implication: the engineering hours that prevent medium-severity incidents before they ship are where reliability work pays back.