DORA shipped The ROI of AI-Assisted Software Development recently. One number in it ties back to a Google Cloud 2025 report on the ROI of AI: 78% of executives from organizations with C-level AI sponsorship report seeing ROI now on at least one generative AI use case. The new report also shares a number I have not seen quoted before: a 15% productivity drop, used as the default in its sample ROI calculator. The report is explicit that the actual depth and duration of the dip are unpredictable; 15% is a placeholder input, not a measurement. For a 500-engineer organization at a $176,000 fully loaded salary, that is $3.3 million in lost capacity over three months. The report calls this the “tuition cost” of AI adoption. It includes the line item in its example budget. Then it moves on.

But let that sink in for a minute. The 2026 report is, on the whole, optimistic. It frames AI as “an amplifier”: magnifying the strengths of high-performing organizations and the dysfunctions of struggling ones. It cites the 2025 DORA State of AI-assisted Software Development finding that more than 80% of survey respondents perceived AI as increasing their productivity. It describes the J-curve as “a stepping stone rather than a setback.” All of this is fair. None of it is what made me read the report twice.

What made me read it twice was the specific shape of the curve, and what the report does and does not prescribe to flatten it.

The J-curve, briefly

The J-curve is the report’s central conceptual model. It claims that AI adoption produces a temporary dip in delivery performance before eventual exponential growth, and that the dip is composed of three identifiable costs.

The learning curve. Time pulled away from feature delivery to learn new tools, prompting techniques, agentic workflows, and the rest. This is the boring part and it is real.

The verification tax. The report defines this as “the cognitive load required to iterate on prompts and rigorously audit AI-generated code that looks remarkably similar to correct code.” Two halves matter equally. The volume of code produced expands, increasing the absolute review burden. And the quality of the code is uneven in a way that resists pattern recognition: it looks correct.

Pipeline adaptation. Downstream processes, including testing and change approval, have to scale to handle the new code volume. Manual review gates that were sufficient for human-written throughput become bottlenecks.

The visual is exactly what the name suggests. Performance dips, then rebounds higher than the starting baseline. The 2026 report’s argument is that this trajectory is well-documented across major organizational transformations (continuous delivery, platform engineering) and applies cleanly to AI adoption.

What the empirical record shows

The J-curve as a framing concept is from the 2026 report. The empirical anchor for it is the 2025 DORA State of AI-assisted Software Development report, which the 2026 report cites repeatedly. The 2025 findings, distilled:

  • Increased AI adoption was associated with an increase in individual effectiveness. Strongest effect of any outcome studied.
  • Increased AI adoption was associated with an increase in software delivery throughput. Teams shipped more changes in less time.
  • Increased AI adoption was associated with an increase in software delivery instability. The 2025 report’s exact phrasing, which is sharper than most coverage of it: “AI adoption not only fails to fix instability, it is currently associated with increasing instability.”
  • Increased AI adoption was associated with an increase in team performance across multiple dimensions.

That third bullet is the one most reliably absent from the marketing. The other three are real, the data supports them, and they are good news. But instability rose alongside throughput. The J-curve framing is the 2026 report’s way of saying: we expect that, the dip is the dip, plan for it.

The shape is more interesting than “AI hurts you” and more interesting than “AI helps you.” Throughput goes up. Stability goes down. Both at once. The verification tax is the mechanism that ties the two: more code at higher velocity, against review systems that were calibrated for slower hand-authored throughput.

What the two reports prescribe together

The 2026 ROI report builds directly on the 2025 State of AI-assisted Software Development report. The 2025 report introduced the DORA AI Capabilities Model: seven capabilities that, when paired with AI adoption, amplify its positive impact on team and organizational outcomes. The capabilities are a clear AI policy, a healthy data ecosystem, AI-accessible internal data, quality internal platforms, strong version control practices, working in small batches, and a user-centric focus.

The 2026 report’s J-curve mitigation list draws from a mix of sources. From the AI Capabilities Model: working in small batches, version control, AI-accessible internal data. From the broader DORA Capabilities Catalog: test automation, continuous integration. Layered on top: operational guardrails like pre-commit hooks paired with static analysis, mandatory synchronous security reviews, and architectural decision records. The 2025 report names the capabilities that determine whether AI amplifies your team’s existing strengths or its dysfunctions; the 2026 report puts a price tag on the dip and gives finance leaders a budgeting model. The two work as a pair: here are the capabilities that flatten the curve, here is what the dip costs while you build them.

I want to be careful not to read either report as adversarial to my own argument. They are useful documents in different ways. The 2025 model is more proactive than the J-curve framing alone suggests: three of its seven capabilities (AI-accessible internal data, healthy data ecosystem, quality internal platforms) are about the systems context AI operates in, which is upstream of any individual review. That is the right shape of answer.

The piece both reports underemphasize

What the capabilities list does not name directly is the specific operation that lives at the boundary between generation and review: pattern-matching AI-generated work against known interaction failures before a human reviewer sees it.

This is a different operation from “AI-accessible internal data” (which is about giving the AI better context to generate from) and different from “test automation” (which catches errors after the code exists). It is the operation of running a structured catalog of failure modes against a generated diff or design, surfacing the matches, and routing the human reviewer toward those specific risks rather than asking them to read everything with equal attentiveness.

When that operation exists, the cognitive load of the verification tax drops because the review becomes structured rather than open-ended. A reviewer confirming that a flagged retry-storm pattern is or is not present is doing cheaper work than a reviewer scanning open-ended for “anything that looks off.” The cost difference compounds at AI volume.
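To make the operation concrete, here is a minimal sketch of what running a catalog against a diff could look like. Everything in it is an assumption for illustration: the `FailurePattern` and `Finding` types, the two catalog entries, and the sample diff are hypothetical, and plain regular expressions are a crude stand-in for whatever matching a real catalog would use (AST checks, design-level analysis, or a model).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FailurePattern:
    name: str
    regex: str            # crude textual signal, standing in for richer checks
    reviewer_prompt: str  # the specific question the reviewer should answer


@dataclass(frozen=True)
class Finding:
    pattern: str
    diff_line: int        # 1-based line number within the diff text
    text: str
    reviewer_prompt: str


# Hypothetical catalog entries; a real catalog would come from postmortems.
CATALOG = [
    FailurePattern("retry-storm", r"retr(y|ies)",
                   "Are retries bounded, with backoff and jitter?"),
    FailurePattern("unbounded-timeout", r"timeout\s*=\s*None",
                   "What happens to callers if this dependency hangs?"),
]


def match_catalog(diff: str, catalog: list[FailurePattern]) -> list[Finding]:
    """Flag added lines in a unified diff that match a known failure pattern."""
    findings = []
    for diff_line, line in enumerate(diff.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for pattern in catalog:
            if re.search(pattern.regex, line, flags=re.IGNORECASE):
                findings.append(Finding(pattern.name, diff_line,
                                        line[1:].strip(), pattern.reviewer_prompt))
    return findings


SAMPLE_DIFF = """\
--- a/client.py
+++ b/client.py
@@ -10,3 +10,5 @@
 def fetch(url):
+    for attempt in range(max_retries):
+        resp = session.get(url, timeout=None)
-    resp = session.get(url, timeout=5)
"""

findings = match_catalog(SAMPLE_DIFF, CATALOG)
for f in findings:
    print(f"[{f.pattern}] diff line {f.diff_line}: {f.text}")
    print(f"    reviewer: {f.reviewer_prompt}")
```

The point of the sketch is the output shape, not the matching technique: the reviewer receives a short list of named risks with a specific question attached to each, instead of an undifferentiated diff.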

The reason this operation does not show up cleanly in the DORA model yet is that the data hasn’t accumulated for it the way it has for, say, version control. Pre-production pattern matching at the scale AI-assisted development demands is a relatively new practice. There is a body of evidence in adjacent disciplines that suggests it works: STPA in safety-critical systems (Google SRE has reported on adapting it for software at SREcon) and threat modeling in security. What is missing is the longitudinal study against AI-era delivery metrics.

The 2025 report itself is more candid about this gap than the 2026 ROI report is. After laying out its findings on instability, the authors write:

Perhaps these technical capabilities are more vital than ever, demanding even stricter adherence to their principles. However, it may be that even this is not enough. Maybe the evidence points toward a more disruptive conclusion: These technical capabilities and measurements no longer suffice. They must evolve for the AI era, be replaced, or be supplemented.

That is a remarkably honest concession for a report whose main job is to reinforce the value of those capabilities. The argument I am making here is the narrow version of that broader concession: one of the things the existing capabilities need to be supplemented with is the operation that runs at the boundary between generation and review.

If you are reading the 2026 report and budgeting from its defaults, the variable that will shift your J-curve depth and duration the most is how your review system handles the new code velocity. The reports give you the framework. The work that makes the framework’s defaults match your reality is not all in the capabilities list yet.

The vibe-coding tax

Vibe coding names the moment AI-assisted engineering stops being engineering. The vibe-coder prompts an agent, accepts the output without close review, and ships. The act treats AI-generated code with the trust calibration of code the developer wrote themselves, except that they did not write it and do not have the muscle memory to know where it is wrong.

The risk this creates is statistical. As reviewers across a team get more tired and process more diffs that look correct, the rate of rubber-stamp reviews drifts upward. Regression to the mean of attentiveness is the structural reality, and the mean of attentiveness drops as volume rises. The verification tax does not vanish when the original author skips it; it gets paid downstream by a person who is more tired, against code with a higher rate of subtle interaction failures than the team’s review culture is calibrated for. Aggregate that across months and the J-curve dip is deeper and longer lasting than the report’s defaults assume, because the defaults assume the discipline is intact and consistent.

AI-assisted engineering is software engineering: the engineer reviews what they ship, including the parts the agent wrote. Vibe coding is the anti-pattern, the moment the discipline silently lapses. Both DORA reports’ capabilities lists assume the discipline is in place. They do not, and probably should not, prescribe against its absence.

What this means in practice

What the 2026 report names well is the cost. What both reports name less directly is the shape of the work that makes the dip shorter and the rebound real. “Tighten CI, shrink batches, ship more tests” is the existing best-practice list run faster, which is correct as far as it goes and not enough on its own. The teams that flatten the curve fastest are doing three things that go beyond it. None of them are cheap to build. Some companies never get there.

  1. Pattern-match generated work against known failures, faster and more automatically. Maintain a structured catalog of the interaction failures the team has lived through, plus failures from teams with similar architecture published in postmortems. When AI generates code or design, run it against the catalog before a human reviewer sees it. The reviewer’s job becomes confirming flagged risks instead of finding them, which is a fundamentally cheaper cognitive operation.

  2. Learn, share, and apply context proactively. Treat the artifacts every review produces (control structures, accepted risks, named failure modes, “we’ll watch this”) as inputs to the next review, not as exhaust. The 2025 capabilities model gestures at this with “AI-accessible internal data” and “healthy data ecosystem,” but the operation is more specific than data accessibility: write down what your systems actually fail on, in a form that the next review can consume without re-deriving.

  3. Build continuous learning loops between incidents and reviews. When something goes wrong in production, the postmortem feeds the catalog. The catalog feeds the next review. The review prevents the next class of incidents. This loop is what turns one team’s bad day into the whole organization’s improved review surface, and turns the broader public corpus of postmortems into your own team’s institutional knowledge.

These three moves are proactive in a way that test automation and CI are not. They change the cost of review by changing the operation, not by adding more passes through a faster pipeline. They are how a team with AI-assisted development stays structurally ahead of the volume the AI is producing instead of catching up to it.
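The third move, the loop from incident to catalog to review, can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular tool: the `Postmortem` and `CatalogEntry` shapes, the function names, and the sample incidents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    incident_id: str
    failure_mode: str  # short name for the class of failure
    trigger: str       # what set it off, in the team's own words
    source: str        # "internal" or "public"


@dataclass
class CatalogEntry:
    failure_mode: str
    reviewer_prompt: str
    incident_ids: list[str] = field(default_factory=list)


def update_catalog(catalog: dict[str, CatalogEntry], pm: Postmortem) -> CatalogEntry:
    """Fold one postmortem into the catalog, creating the entry if it is new."""
    entry = catalog.get(pm.failure_mode)
    if entry is None:
        entry = CatalogEntry(pm.failure_mode,
                             f"Could this change reintroduce: {pm.trigger}?")
        catalog[pm.failure_mode] = entry
    if pm.incident_id not in entry.incident_ids:
        entry.incident_ids.append(pm.incident_id)
    return entry


def review_checklist(catalog: dict[str, CatalogEntry]) -> list[str]:
    """Render the catalog for the next review, most frequently seen failures first."""
    ordered = sorted(catalog.values(),
                     key=lambda e: len(e.incident_ids), reverse=True)
    return [f"[{e.failure_mode}] {e.reviewer_prompt} "
            f"(seen in {len(e.incident_ids)} incident(s))" for e in ordered]


# Hypothetical incidents: two internal, one taken from a public postmortem.
catalog: dict[str, CatalogEntry] = {}
for pm in [
    Postmortem("INC-1", "cache-stampede", "simultaneous expiry of hot keys", "public"),
    Postmortem("INC-2", "retry-storm", "client retries without backoff", "internal"),
    Postmortem("INC-3", "retry-storm", "client retries without backoff", "internal"),
]:
    update_catalog(catalog, pm)

checklist = review_checklist(catalog)
for item in checklist:
    print(item)
```

The design choice that matters here is that a postmortem produces a durable, reviewable artifact keyed by failure mode, so a repeat incident strengthens an existing entry instead of generating another document nobody reads.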

The honest part is that they are not cheap. Building and maintaining a useful catalog requires sustained discipline. Running the loop requires somebody (or some team) to own it. Some companies will not get there at all, and the J-curve will be real for them in a way the report’s defaults underprice. The teams that do get there will see a different curve.

Where this leaves me

The 2025 and 2026 DORA reports are useful in different ways. The 2025 report names the seven capabilities that determine how much of AI’s value a team actually realizes. The 2026 report puts a price tag on the dip and gives finance leaders a budgeting model. Read together, the picture is coherent: the J-curve dip is real, the capabilities that flatten it are nameable, here is what the dip costs while you build them. The framing is going to be in every board deck for the next year. On net that is good, because it pushes the conversation past “AI productivity boost” into something more honest.

What I would not want is for the report’s defaults to harden into the answer. The 15% productivity drop and three-month duration are model inputs, not universal constants. They depend on what kind of work your team is doing during the dip. Teams investing in proactive pattern-matching, shared context, and continuous learning loops will see a shallower dip and rebound sooner. Teams that are mostly tightening CI and reacting faster will land close to the model’s defaults, or do worse than them.

If you are budgeting for an AI adoption initiative right now and using DORA’s calculator, sense-check the J-curve drop assumption against the kind of work your team is actually doing during the dip. Reactive work flattens the curve more slowly than proactive work.
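One way to sense-check the assumption is to rerun the arithmetic with your own inputs. The sketch below is my reading of the report’s example, not DORA’s calculator: it assumes the dip applies uniformly to fully loaded payroll for the duration of the dip. The function name is illustrative, and the two non-default scenarios are hypothetical inputs, not figures from either report.

```python
def dip_cost(engineers: int, loaded_salary: float, drop: float, months: float) -> float:
    """Capacity lost to a temporary productivity dip, in dollars."""
    annual_payroll = engineers * loaded_salary
    return annual_payroll * (months / 12) * drop


# (productivity drop, duration in months)
SCENARIOS = {
    "shallower, shorter (hypothetical)": (0.10, 2),
    "report default": (0.15, 3),
    "deeper, longer (hypothetical)": (0.25, 6),
}

costs = {
    label: dip_cost(engineers=500, loaded_salary=176_000, drop=drop, months=months)
    for label, (drop, months) in SCENARIOS.items()
}

for label, cost in costs.items():
    print(f"{label:<36} ${cost:>12,.0f}")
```

With the report’s defaults this reproduces the $3.3 million figure. The other two rows exist only to show how sensitive the line item is: depth and duration multiply, so a dip that is somewhat deeper and twice as long costs several times the default.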

We are building Revelara to make proactive reliability efforts cheap and routine. The shape of the post-AI engineering organization is not yet settled, but the shape of the verification tax is solidifying.
