Jellyfish just published the most comprehensive benchmark on AI coding in production, and the headline is not what most people are reporting. Across 250,000 developers, 40 million data points, and roughly 1,000 enterprise companies, AI-assisted pull requests have doubled in volume while the merge rate for those PRs has dropped from ~80% to ~60%. That gap is where your AI strategy either works or quietly bleeds out. For engineering leaders, this is not a reason to slow down AI adoption. It is a precise diagnosis of where the investment is leaking, and it tells you exactly what to fix.
The Numbers You Actually Need to Know
Jellyfish's dataset is the largest of its kind in production engineering, and the signal is unusually clean. Here is what the data shows:
| Metric | Human-Authored PRs | AI-Assisted PRs |
|---|---|---|
| Merge rate | ~80% | ~60% |
| PR volume change | Baseline | ~2x increase |
| Cycle time | Baseline | ~24% faster |
| Avg PR size | Baseline | ~18% larger (net lines) |
| Bug-related PRs | 8-9% | 8-9% (flat) |
The 260% year-over-year increase in AI coding usage from June 2024 to May 2026 means this is no longer a pilot program phenomenon. The share of PRs with high AI involvement jumped from 14% to 51% in roughly 12 months. AI is now a structural input to how your team ships code, whether you have a formal strategy for it or not. The weekly active AI user rate across the 250,000-developer panel sits at a median of 71%, with the 90th percentile hitting ~90%. Your engineers are already using these tools heavily. The question is whether your review infrastructure, test coverage, and role design are built to handle what that actually produces.
The Merge Rate Drop Is a Signal, Not a Crisis
The most misread number in this dataset is the 20-point drop in AI PR merge rates. The reflexive take is that AI code is lower quality. That is the wrong frame. A more accurate read: AI is generating more candidate solutions faster, and your existing review and CI systems are correctly filtering a larger share. Bug-related PRs and rollback rates have stayed flat at 8-9% across all adoption levels. That means the quality that reaches production is holding. The PRs that are not merging are being caught before they matter.
But "being caught" has a cost. Every abandoned PR represents reviewer time, CI cycles, and cognitive load. If your team is generating 2x the PRs and only 60% are merging, your review capacity is absorbing a workload that is growing faster than your output. That is the real pressure point: not code quality, but review throughput.
The teams that will pull ahead are the ones that treat the 40% non-merge rate as a refinement signal. Better prompts, clearer coding standards, stronger automated gates, and more disciplined agent orchestration can move that 60% toward 75% or higher without throttling volume. That is compounding leverage.
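The review-throughput argument above can be made concrete with a small back-of-the-envelope model. This is a sketch under stated assumptions, not part of the Jellyfish analysis: the function name `review_load` is hypothetical, every opened PR is assumed to consume review and CI effort, and a single merge rate is applied to the whole PR stream.

```python
def review_load(volume_multiplier: float, merge_rate: float) -> dict:
    """PRs opened, merged, and abandoned per one baseline PR.

    Assumption: every opened PR costs review and CI effort, and one merge
    rate applies to the whole stream (a deliberate simplification).
    """
    merged = volume_multiplier * merge_rate
    abandoned = volume_multiplier - merged
    return {
        "opened": round(volume_multiplier, 2),
        "merged": round(merged, 2),
        "abandoned": round(abandoned, 2),
    }


# Figures taken from the benchmark summary: ~80% merge rate at baseline
# volume versus ~60% merge rate at roughly 2x volume.
before = review_load(volume_multiplier=1.0, merge_rate=0.80)
after = review_load(volume_multiplier=2.0, merge_rate=0.60)

merged_growth = after["merged"] / before["merged"]
abandoned_growth = after["abandoned"] / before["abandoned"]

print(before)
print(after)
print(f"merged x{merged_growth:.1f}, abandoned x{abandoned_growth:.1f}")
```

Under these assumptions, merged PRs grow by about 1.5x while abandoned PRs grow by about 4x, which is the sense in which review workload outpaces output.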
The Four-Agent Wall Is Your Org Design Problem
One of the most actionable findings in Jellyfish's agentic coding research is the coordination ceiling. Developers working with multiple AI agents hit a practical wall at roughly four concurrent agents, and even with a single agent, they spend approximately 80% of their time supervising and coordinating rather than writing code. This is not a model quality problem. It is a cognitive architecture problem, and it has direct implications for how you structure roles.
The current industry framing of AI benefits as "your engineers code faster" is incomplete. The more accurate frame is: AI agents generate code, and your engineers supervise, review, and orchestrate those agents. That is a fundamentally different job profile. The skills that make someone an excellent ticket completer with a low cycle time are not the same skills that make someone an excellent agent supervisor with a high merge-rate yield.
Jellyfish's data also points to an architectural dependency that most leaders are not accounting for: AI ROI is materially stronger in centralized or well-modularized codebases. Distributed, highly coupled architectures constrain how effectively agents can operate, because the amount of context a developer needs in order to supervise an agent's output grows with code ownership ambiguity. This means your architecture modernization roadmap is now also your AI productivity roadmap.
What This Means for Headcount and Hiring
The Jellyfish benchmark is the clearest data yet that AI is not reducing the value of strong engineers. It is concentrating value at the top of the distribution. The 2x PR volume means your existing engineers are doing more. But the 40% non-merge rate and the four-agent coordination ceiling both point to the same bottleneck: review capacity, system judgment, and architectural clarity. These are senior skills. They do not come from junior engineers with good Copilot prompts. The hiring implication is direct:
The marginal value of a senior engineer who can review AI-generated code rigorously and design modular systems that agents can navigate effectively is higher than it has ever been.
The marginal value of an engineer whose primary function is low-complexity ticket throughput is declining, because agents are replacing that function faster than any other.
Teams that add AI tooling without upgrading the reviewer-to-contributor ratio will hit the merge rate floor and stay there.
This is the Navy SEAL dynamic playing out at scale. Individual product teams shrink and become more elite, while engineering organizations as a whole expand to tackle more ambitious product surface area. The companies with fewer engineers in 2026 are the ones with small ambitions, not the ones with good AI tooling. Traditional hiring platforms were not built to surface this distinction. Filtering for "uses Copilot" or "has agentic experience" in a keyword search is not the same as identifying an engineer who can own a PR review queue at 2x volume and design for agent-friendly modularity. The signal you need is behavioral and contextual, not credential-based.
Where Your Budget Needs to Shift
Most teams that have rolled out GitHub Copilot or Cursor have made the tooling investment but not the adjacent investments that determine whether the tooling pays off. Jellyfish's data is a clear roadmap for where the rest of the budget needs to go:
Test automation and quality gates. A 2x PR stream cannot be absorbed by a review process that was designed for 1x volume. Automated test coverage, static analysis, and CI quality gates need to be upgraded in parallel with AI tool rollout. This is not optional infrastructure; it is the filter that makes the 60% merge rate tolerable and improvable.
Platform engineering. The teams seeing the strongest AI ROI are the ones with clean, well-modularized codebases. Platform engineering investment that simplifies internal APIs, clarifies code ownership, and reduces coupling is now directly correlated with AI productivity gains. Frame this to your board as AI enablement, not as technical debt.
Developer education on agent orchestration. The four-agent coordination wall is partly a tool design limitation, but it is also a skill gap. Engineers who know how to structure tasks for agents, write effective prompts, and review AI output efficiently will consistently outperform those who treat AI as an autocomplete. This is a training investment with a measurable return.
Review tooling. Products like Graphite, LinearB, and Jellyfish's own engineering analytics platform are designed to make high-volume PR environments manageable. If your review process is still a GitHub notification queue and a Slack channel, you are going to lose the throughput gains to coordination overhead.
The Competitive Clock Is Running
The 260% year-over-year increase in AI PR usage means your competitors are already navigating this. The teams that treat this data as a warning are behind. The teams that treat it as an optimization map are ahead. Here is the specific competitive risk: if a peer company in your category has invested in test automation, platform simplification, and senior engineering hiring alongside their AI tooling, they are compounding a 2x PR volume and a 75%+ merge rate. You might be compounding a 2x PR volume and a 55% merge rate. Over 12 months, that gap in net shipped output is significant. The Jellyfish data is also a useful counterargument to cost-cutting framed as AI efficiency. Replacing senior engineers with AI agents and expecting quality to hold is not supported by the data. The merge rate drop at scale happens precisely because human review capacity has not kept pace with AI generation capacity.
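To put a rough number on that gap, take merged PRs as a crude proxy for shipped output (an assumption; the benchmark summary does not define net shipped output) and let $V$ be pre-AI PR volume:

$$
\begin{aligned}
\text{peer:}\quad & 2V \times 0.75 = 1.5V \\
\text{you:}\quad & 2V \times 0.55 = 1.1V \\
\text{ratio:}\quad & \frac{1.5V}{1.1V} \approx 1.36
\end{aligned}
$$

On the same opened volume, the peer merges roughly 36% more PRs, while your reviewers handle $2V \times 0.45 = 0.9V$ abandoned PRs against the peer's $2V \times 0.25 = 0.5V$.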
Three Actions for Engineering Leaders This Month
Audit your current AI PR merge rate. If you are not tracking merge rates segmented by AI-assisted versus human-authored PRs, you have no visibility into where the productivity gains are leaking. Jellyfish, LinearB, or even a custom GitHub query can get you this data within a week. Set a baseline before you make any other decisions.
Reframe your next senior engineering hire around review leverage. Your next few hires should be evaluated explicitly on their capacity to review AI-generated code rigorously, identify architectural drift, and design systems that agents can navigate cleanly. These are distinct competencies from raw coding speed, and most job descriptions do not screen for them. Update your criteria before you post the role.
Tie your architecture modernization roadmap to your AI tooling budget. If you are spending on Copilot, Cursor, or agentic coding infrastructure, the ROI on that spend is partially determined by how modular and well-owned your codebase is. Present a joint plan to your board: AI tooling investment and platform engineering investment as a paired line item, with a shared productivity metric as the output target.
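For the first action, the segmented merge rate can be computed from any export of closed PRs. The following is a minimal sketch, assuming each record is a dictionary with the hypothetical fields `state`, `merged`, and `ai_assisted`; how you derive `ai_assisted` (a PR label, tool telemetry, or something else) depends on your own tooling and is not specified by the benchmark.

```python
from collections import defaultdict


def merge_rate_by_segment(prs):
    """Return merge rate per segment, counting only closed PRs.

    Each PR is a dict with hypothetical keys:
      state       -- "open" or "closed"
      merged      -- True if the PR was merged
      ai_assisted -- True if your own tooling flagged AI involvement
    """
    counts = defaultdict(lambda: {"closed": 0, "merged": 0})
    for pr in prs:
        if pr["state"] != "closed":
            continue  # open PRs have no outcome yet, so skip them
        segment = "ai_assisted" if pr["ai_assisted"] else "human_authored"
        counts[segment]["closed"] += 1
        if pr["merged"]:
            counts[segment]["merged"] += 1
    return {seg: c["merged"] / c["closed"] for seg, c in counts.items()}


# Illustrative sample data only; replace with your own export.
sample_prs = (
    [{"state": "closed", "merged": True, "ai_assisted": False}] * 4
    + [{"state": "closed", "merged": False, "ai_assisted": False}] * 1
    + [{"state": "closed", "merged": True, "ai_assisted": True}] * 3
    + [{"state": "closed", "merged": False, "ai_assisted": True}] * 2
    + [{"state": "open", "merged": False, "ai_assisted": True}] * 1
)

rates = merge_rate_by_segment(sample_prs)
for segment, rate in sorted(rates.items()):
    print(f"{segment}: {rate:.0%}")
```

Excluding open PRs keeps the baseline from being skewed by work still in review. Tracking the same two numbers weekly gives you the trend line to judge later changes against.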
The Jellyfish dataset is the most important empirical document on AI coding in production right now. The teams that read it as "AI is not working" will underinvest at exactly the wrong moment. The teams that read it as "here is precisely where to invest next" will look back on 2026 as the year they separated from the field.