Agentic Coding Doubles PRs, But Merge Rates Are Falling

The most important signal in enterprise engineering right now is not that AI is writing code. It's that AI is writing code that humans are rejecting at a rate nobody planned for.

New data from Jellyfish's agentic coding benchmark, covering roughly 250,000 developers and 40 million data points, puts a number on the tradeoff that engineering leaders have been feeling but couldn't quantify: teams using AI coding tools are shipping approximately 2x as many pull requests as they did before AI adoption, but merge rates on AI-generated PRs have dropped from roughly 80% to 60% as agents take on a larger share of total diffs. That 20-point drop is not a rounding error. It's a structural shift in how throughput translates to shipped value, and it has direct implications for how you staff, what you measure, and where you put your senior engineers.

If your staffing model assumes that more PRs equals more output, you're running the wrong equation.
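
A back-of-envelope sketch makes the point concrete. It uses only the headline figures above: about 2x PR volume, and a merge rate falling from about 80% to 60%. As a simplification, it assumes one merge rate applies to the whole PR stream in each scenario. The benchmark quotes those rates for AI-generated PRs specifically, so treat the output as illustrative, not as a measured result.

```python
# Back-of-envelope model using the headline benchmark figures.
# Simplifying assumption: one merge rate applies to every PR in each scenario.

def merged(pr_volume: float, merge_rate: float) -> float:
    """Expected PRs that actually ship: PRs opened x share that merge."""
    return pr_volume * merge_rate


def rejected(pr_volume: float, merge_rate: float) -> float:
    """Expected PRs that consume review time but never ship."""
    return pr_volume * (1.0 - merge_rate)


# Pre-AI baseline normalized to 1.0 unit of PR volume.
before_volume, before_rate = 1.0, 0.80
after_volume, after_rate = 2.0, 0.60

merged_gain = merged(after_volume, after_rate) / merged(before_volume, before_rate)
rejected_gain = rejected(after_volume, after_rate) / rejected(before_volume, before_rate)

print(f"PR volume:    {after_volume / before_volume:.1f}x")
print(f"merged PRs:   {merged_gain:.1f}x")
print(f"rejected PRs: {rejected_gain:.1f}x")
```

Under these assumptions, doubling PR volume yields roughly 1.5x merged output. Meanwhile, the number of PRs that get reviewed and then rejected roughly quadruples, and that review load lands on the same senior engineers the rest of this piece argues are the bottleneck.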

What the Data Actually Shows

Jellyfish's longitudinal dataset, tracking more than 700 companies, 200,000 engineers, and 20 million pull requests, establishes a few benchmarks worth anchoring your planning to. Top-quartile AI adopters achieve roughly 2x PR throughput compared to low adopters, with approximately a 24% decrease in PR cycle times as organizations move from 0% to 100% AI coding tool adoption. Median AI coding tool adoption has climbed from about 22% of coding time to nearly 90%. Nearly half of all companies in the dataset now generate 50% or more of their code with AI assistance.

The agentic layer is accelerating fast. Jellyfish reports a 4.5x increase in companies piloting workflows where AI autonomously authors commits or opens reviews. Agentic AI use for coding tasks jumped from just over 50% of companies to 82% between January and May 2025 alone.

But the quality signal runs in the opposite direction. AI-heavy teams are shipping PRs that are about 18% larger in net lines of code, driven mainly by additions rather than deletions. Revert rates are creeping upward into the 7 to 11% range. No statistically significant correlation between AI usage rate and new bug creation has emerged yet, but revert trends are moving in the wrong direction for teams without formal quality controls in place.

The implication is direct: agentic AI behaves like a high-variance junior engineer at massive scale. Volume goes up. Quality consistency goes down. The teams winning this transition are the ones who redesigned their workflows to expect exactly that.

The Staffing Model Most Teams Are Running Is Already Broken

The legacy staffing assumption in most engineering organizations is a stable, pyramid-shaped distribution of IC levels: a few senior engineers, a larger base of mid-level engineers doing most of the implementation, and some junior engineers handling smaller tasks. AI was supposed to compress that pyramid from the bottom. What's actually happening is more disruptive. The middle is collapsing faster than anyone predicted, and the top is getting more valuable faster than most teams have compensated for. Here's what the Jellyfish data implies for team structure:

Role Type	Pre-Agentic AI Model	Agentic AI Model
Junior IC (implementation)	High headcount, ramp time	Replaced or heavily reduced by agents
Mid-level IC (feature work)	Core of most teams	Shifting toward editor/curator function
Senior IC / Staff (architecture, review)	Understaffed relative to output	Critical bottleneck, needs to scale up
AI Enablement (prompt ops, repo config)	Did not exist	Emerging as formal staff+ or EM role
QA / Reliability	Often under-resourced	Higher strategic priority given revert trends

The merge rate drop from 80% to 60% on AI-generated PRs is not a tooling problem you wait for vendors to solve. It's a governance problem you solve by explicitly allocating senior engineering time to act as editors and system architects, reviewing a rising volume of AI-authored diffs for architectural coherence, test coverage, and production risk. Teams that do not make this allocation will see throughput numbers that look impressive in sprint reviews and defect rates that quietly compound in production.

The Organizational Advantage Nobody Is Talking About

Most coverage of this data fixates on the headline throughput numbers or on fear-based narratives about headcount reduction. Both miss the real strategic opportunity. The Jellyfish data shows that top-quartile AI adopters dramatically outperform laggards on both throughput and cycle time. That gap is not primarily a tooling gap. Claude Code, Devin, and Codex are available to every engineering organization. The differentiator is workflow design and governance, not tool access.

Companies that build formal AI enablement functions, standardize repo-level guardrails, and shift incentive structures away from raw PR volume toward high-quality merged impact can safely absorb lower per-PR merge rates while still compounding the throughput advantage. Companies that adopt AI ad hoc, without governance, will accumulate technical debt and revert rates that eventually cancel out the throughput gains.

The organizational pattern emerging at high-performing teams looks less like a traditional engineering org and more like a newsroom. You have a large volume of AI-generated "drafts" flowing in constantly. You have a smaller, highly skilled editorial layer that reviews, curates, and publishes. And you have a governance layer that sets the standards the whole system runs against.

Full autonomy is still extremely early. Only about 8% of companies in the Jellyfish dataset are piloting fully agentic workflows where AI writes and submits code end-to-end, and autonomous-agent PRs account for less than 2% of merged PRs over recent measurement periods. More than 98% of merged work is still human-in-the-loop territory. That means the editorial layer is not a future concern; it is the current bottleneck.

A Framework for Restructuring Around Agentic Output

Engineering leaders who want to capture the throughput advantage without the quality regression need to make changes across three dimensions simultaneously: roles, metrics, and guardrails.

Roles: Redesign Around the Editor Function

Stop hiring generalist ICs and expecting them to figure out how to work with agents. Start hiring explicitly for what the agentic era requires.

Reduce junior IC headcount on teams where agents can handle rote implementation, boilerplate, and backlog bug tickets. Redirect that budget toward senior and staff-level engineers.

Create a formal AI Enablement role at the staff+ or EM level. This person owns prompt templates, repo-level configurations for tools like Claude Code and Codex, CI policy tuning, and ongoing quality measurement. This is not a part-time responsibility.

Structure senior IC time explicitly around edit and review cycles for AI-generated diffs, not just human-authored code. If your senior engineers are still spending the majority of their review time on human-written PRs, your workflow is not calibrated for current adoption rates.

Metrics: Retire Raw PR Count as a Performance Signal

If your engineering dashboards still feature PR volume as a primary output metric, you are measuring the wrong thing. With 2x PR throughput and 60% merge rates, raw PR count has become a vanity metric. Replace it with:

• Merged PR impact (story points, feature scope, or business metric movement per merged PR)
• Post-merge defect rate per engineer and per team
• Revert rate trends by codebase, team, and AI tool configuration
• Cycle time to successful merge, not cycle time to PR open
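
The replacement metrics above can be computed from ordinary PR records. The sketch below assumes a hypothetical record shape (`PullRequest` with a first-commit time, a merge time, a revert flag, a post-merge defect count, an impact score, and an AI tool configuration label) that you would populate from your own Git host or analytics export; none of these field names come from Jellyfish or any specific tool. It groups by team by default, and the hypothetical `key` parameter lets you regroup by AI tool configuration or any field you add, such as repository.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Callable, Optional


@dataclass
class PullRequest:
    # Hypothetical record shape: map these from your own Git host or analytics export.
    team: str
    first_commit_at: datetime
    merged_at: Optional[datetime]   # None if the PR was closed without merging
    reverted: bool = False          # merged change was later reverted
    post_merge_defects: int = 0     # defects traced back to this PR after merge
    impact_points: float = 0.0      # story points or another impact proxy
    ai_config: str = "none"         # which agent/tool configuration authored it


def merge_quality_report(
    prs: list[PullRequest],
    key: Callable[[PullRequest], str] = lambda p: p.team,
) -> dict[str, dict[str, float]]:
    """Group PRs (by team by default) and compute merge-centric metrics."""
    groups: dict[str, list[PullRequest]] = defaultdict(list)
    for pr in prs:
        groups[key(pr)].append(pr)

    out: dict[str, dict[str, float]] = {}
    for name, items in groups.items():
        merged = [p for p in items if p.merged_at is not None]
        n = len(merged)
        # Cycle time runs from first commit to successful merge; unmerged PRs
        # contribute nothing here but pull merge_rate down instead.
        hours_to_merge = [
            (p.merged_at - p.first_commit_at).total_seconds() / 3600 for p in merged
        ]
        out[name] = {
            "merge_rate": n / len(items),
            "impact_per_merged_pr": sum(p.impact_points for p in merged) / n if n else 0.0,
            "defects_per_merged_pr": sum(p.post_merge_defects for p in merged) / n if n else 0.0,
            "revert_rate": sum(p.reverted for p in merged) / n if n else 0.0,
            "median_hours_to_merge": median(hours_to_merge) if n else float("nan"),
        }
    return out


# Usage with three made-up PRs for one team.
t0 = datetime(2025, 5, 1, 9, 0)
prs = [
    PullRequest("payments", t0, t0 + timedelta(hours=26), impact_points=3),
    PullRequest("payments", t0, None),  # opened, never merged
    PullRequest("payments", t0, t0 + timedelta(hours=10), reverted=True,
                post_merge_defects=1, impact_points=2),
]
report = merge_quality_report(prs)
print(report["payments"])
```

For the sample data, the team shows a two-thirds merge rate, a 50% revert rate, and an 18-hour median to merge, none of which a raw count of three PRs would reveal. Calling `merge_quality_report(prs, key=lambda p: p.ai_config)` produces the same view per tool configuration.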

Tying AI tooling spend to measurable improvements in successful merge rates and incident trends, rather than gross code volume, also gives you a defensible ROI model when finance asks why your Codex or Claude Code contract costs what it does.

Guardrails: Invest Before You Scale Agentic Usage

The revert rate data is a leading indicator. Teams that scale agentic usage without first investing in guardrails will see revert rates climb from the 7 to 11% range into territory that materially degrades production reliability. Concrete investments that high-performing teams are making now:

• Richer test automation requirements before any AI-generated PR can be opened for review
• Stricter CI policies with mandatory coverage thresholds on AI-authored diffs
• Repo-level configuration files that constrain agent scope and flag autonomous changes to sensitive paths
• Weekly review cycles on merge rate and revert trends by codebase, with the AI Enablement role accountable for tuning
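
The first three guardrails can be sketched as a single pre-review CI check. Everything in it is hypothetical: the path patterns, the 80% diff-coverage threshold, the test-file naming conventions, and how a PR gets marked as AI-authored (for example, a label or commit trailer set by your own tooling). It is not the configuration format of Claude Code, Codex, or any other tool. Following the list above, it enforces policy only on AI-authored diffs, and it assumes diff coverage comes from your existing coverage tooling.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

# Hypothetical policy values; tune per repository.
SENSITIVE_PATTERNS = ["infra/*", "auth/*", "*/migrations/*", "*.tf"]
TEST_PATTERNS = ["tests/*", "*/tests/*", "test_*", "*/test_*", "*_test.*"]
AI_DIFF_COVERAGE_MIN = 0.80  # minimum coverage of the changed lines


def matches_any(path: str, patterns: list[str]) -> bool:
    # fnmatchcase is case-sensitive, and its "*" also matches "/" characters.
    return any(fnmatchcase(path, pat) for pat in patterns)


@dataclass
class PolicyResult:
    passed: bool
    reasons: list[str] = field(default_factory=list)


def check_pr(changed_paths: list[str], ai_authored: bool, diff_coverage: float) -> PolicyResult:
    """Gate an AI-authored PR before it enters human review."""
    reasons: list[str] = []
    if ai_authored:
        sensitive = sorted(p for p in changed_paths if matches_any(p, SENSITIVE_PATTERNS))
        if sensitive:
            reasons.append(f"autonomous change to sensitive paths: {sensitive}")
        if not any(matches_any(p, TEST_PATTERNS) for p in changed_paths):
            reasons.append("no test files added or modified")
        if diff_coverage < AI_DIFF_COVERAGE_MIN:
            reasons.append(
                f"diff coverage {diff_coverage:.0%} is below {AI_DIFF_COVERAGE_MIN:.0%}"
            )
    return PolicyResult(passed=not reasons, reasons=reasons)


# Usage: one PR that should be blocked, one that should pass.
blocked = check_pr(["billing/api.py", "infra/prod.tf"], ai_authored=True, diff_coverage=0.65)
allowed = check_pr(
    ["billing/api.py", "billing/tests/test_api.py"], ai_authored=True, diff_coverage=0.90
)
print(blocked)
print(allowed)
```

Returning every failed reason instead of stopping at the first gives the AI Enablement owner a per-rule failure count to review in the weekly cycle, which is where threshold and pattern tuning would happen.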

What This Means for Your Next Hire

The staffing implications here cut against how most engineering leaders are currently budgeting. The instinct, when AI doubles throughput, is to freeze headcount or cut. That instinct is wrong for any organization with ambitious product goals. Individual teams are getting smaller and more elite. A team that previously needed 12 engineers to manage a product surface can operate with 6, augmented by agents handling the implementation volume. But the organizations winning in this environment are not shrinking their overall engineering investment. They are redeploying it: launching more products, attacking larger surfaces, building systems they could not previously staff. The companies with fewer engineers overall are the ones with smaller ambitions.

The engineers you need more of are the ones who can operate as the editorial and architectural layer above agentic output. Those engineers are not common, and traditional hiring platforms were not built to find them. They were built to filter for years of experience with specific frameworks, which is nearly irrelevant in a world where framework-specific code gets generated on demand. What matters now is judgment, system thinking, and the ability to evaluate and integrate code you did not write, at high volume and under time pressure.

Finding that profile requires a fundamentally different signal than a keyword-matched resume screen. That is the gap Nextdev was built for: identifying the AI-native engineers who thrive in agentic environments, not just the engineers who have the most lines of code on their GitHub profile.

The Window to Build a Durable Advantage Is Now

The Jellyfish data makes clear that the gap between top-quartile and bottom-quartile AI adopters is already measurable and widening. Two times PR throughput, 24% faster cycle times, and more ambitious project scope are not theoretical future benefits. They are current operational realities for the teams that have done the governance work. The teams that will own this transition by the end of 2026 are the ones that treat the merge rate drop not as a reason to slow down AI adoption, but as the precise diagnostic that tells them where to invest next: more senior engineers in editor roles, formal AI enablement functions, and metrics that reward merged quality over raw volume. The throughput is already there. The organizational infrastructure to sustain it at quality is where the work is.

Nextdev