Claude Opus 4.7 xhigh: What's New and What It Costs You

Apr 20, 2026 · 6 min read · By Nextdev AI Team

Anthropic pushed Claude Code 2.1.111 on April 20, 2026, and the headline feature is immediate: Claude Opus 4.7 is now available in Claude Code with a new `/effort` command that unlocks an xhigh effort level, and Auto mode is live for Max subscribers. This is the first time engineering teams can dial up reasoning intensity directly from the CLI without touching a single API parameter. For teams running agentic coding workflows, that's a meaningful control surface. But there's a catch in this release that deserves your full attention before you migrate.

What Actually Shipped in 2.1.111

Three things matter in this changelog:

  • Opus 4.7 is now the default available model in Claude Code, replacing Opus 4.6 for teams on standard tiers
  • `/effort xhigh` is a new command flag that instructs Opus 4.7 to allocate maximum reasoning budget before responding, specifically designed for complex multi-file refactors and architectural decisions
  • Auto mode is now available for Max subscribers, letting Claude dynamically select effort levels based on task complexity rather than requiring manual toggles

The `/effort` command is the workflow change you'll feel immediately. Previously, effort calibration happened implicitly through prompt engineering or model selection. Now you can invoke it explicitly:

```bash
/effort xhigh "Refactor the authentication module to support OAuth 2.1 and PKCE across all 14 service endpoints"
```

That's a cleaner interface than the workarounds teams were using before. Auto mode takes it further: Opus 4.7 reads task complexity and self-selects the appropriate effort tier, which reduces the cognitive overhead of deciding when to push the model hard.
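Anthropic hasn't documented how Auto mode decides, but the general idea, mapping task-complexity signals to an effort tier, can be sketched. Everything below is illustrative: the heuristic and its inputs are invented for this post, and only the tier names come from the release.

```python
# Hypothetical sketch of an Auto-mode-style effort selector. The tier names
# mirror the /effort levels; the scoring heuristic is invented for
# illustration and is NOT Anthropic's actual selection logic.

def select_effort(files_touched: int, prompt: str) -> str:
    """Pick an effort tier from rough task-complexity signals."""
    heavy_markers = ("refactor", "architecture", "migrate", "multi-file")
    score = files_touched + sum(2 for m in heavy_markers if m in prompt.lower())
    if score >= 10:
        return "xhigh"
    if score >= 5:
        return "high"
    if score >= 2:
        return "medium"
    return "low"

# A 14-endpoint refactor scores well into xhigh territory; a one-file typo
# fix stays at low, so you only pay for deep reasoning where it helps.
```

The point of the sketch is the shape of the decision, not the thresholds: whatever signals you trust (files in context, diff size, prompt keywords), an explicit mapping like this is what Auto mode saves you from maintaining by hand.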

Opus 4.7: The Benchmark Story

Released April 16, 2026, Opus 4.7 is Anthropic's production-tier model, sitting between Opus 4.6 and the restricted Claude Mythos Preview. The number engineering leaders should care about: a 10.9-point improvement over Opus 4.6 on SWE-bench Pro, a benchmark that measures real-world GitHub bug-fixing performance. That's not a marginal gain. For comparison, OpenAI's o3 sits at 8.7% on standard SWE-bench. Anthropic is claiming a genuine lead on the task that matters most for agentic coding pipelines: finding and fixing bugs in production code.

Other capability upgrades in 4.7:

  • Maximum image resolution increased from 1568px (1.15MP) to 2576px (3.75MP), which matters for teams doing visual regression testing, UI review automation, or diagram-to-code workflows
  • Task budgets for agentic workflows, giving you explicit control over how many iterations an agent loop can run before halting
  • Adaptive Reasoning replaces manual extended thinking toggles, letting the model self-calibrate

The vision upgrade is underrated. Teams running automated UI testing or feeding architecture diagrams into coding agents have been hitting resolution limits. The jump to 3.75MP opens up workflows that were previously impractical.
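To make the resolution jump concrete: a helper like the one below computes how much an image must shrink to fit under a long-edge cap. The 1568px and 2576px figures come from this release; the helper itself is a generic illustration, not part of any SDK.

```python
# Sketch: how much must an image shrink to fit under a long-edge limit?
# The 1568px (old) and 2576px (new) caps are the figures from the release;
# this function is illustrative, not an SDK call.

def downscale_factor(width: int, height: int, max_edge: int = 2576) -> float:
    """Return the scale factor (<= 1.0) needed to fit under max_edge."""
    long_edge = max(width, height)
    return min(1.0, max_edge / long_edge)

# A 4000x3000 UI screenshot had to shrink to ~39% of its size under the old
# 1568px cap, destroying small text; under 2576px it keeps ~64% of its size.
old_scale = downscale_factor(4000, 3000, max_edge=1568)  # 0.392
new_scale = downscale_factor(4000, 3000)                 # 0.644
```

That difference is why visual regression screenshots that were unreadable after downscaling become usable at the new cap.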

The Tokenizer Problem You Can't Ignore

Here's where this release gets complicated. Opus 4.7 ships with a new tokenizer that inflates token counts by up to 35% for code, JSON, CSV, and non-English text compared to Opus 4.6. Pricing is unchanged at $5 per million input tokens and $25 per million output tokens, but when your codebase suddenly tokenizes 35% heavier, that's an effective price increase that Anthropic didn't advertise.

The community noticed immediately. A Reddit post titled "Opus 4.7 is not an upgrade but a serious regression" collected 2,300 upvotes within 48 hours. An X post on the same theme hit 14,000 likes. That's not noise. That's a signal that developers running code-heavy workloads felt the cost change before they saw any capability benefit.

The backlash deserves context, though. ArtificialAnalysis data shows approximately 10% net cost savings from fewer output tokens in evals, meaning Opus 4.7 produces more concise responses. For teams running prose-heavy workflows, NLP pipelines, or documentation generation, the economics may actually improve. The pain is concentrated in code, JSON, and structured data inputs, which is exactly what most engineering teams are pushing through Claude Code.

This is the honest breakdown:

| Workflow Type | Token Impact | Net Cost Direction |
| --- | --- | --- |
| Code-heavy (refactors, reviews) | Up to +35% input tokens | More expensive |
| JSON/CSV data processing | Up to +35% input tokens | More expensive |
| Non-English text | Up to +35% input tokens | More expensive |
| Prose, documentation, NLP | Minimal input inflation | Potentially cheaper |
| Concise reasoning tasks | Fewer output tokens | Cheaper overall |

If your Claude Code usage skews toward large codebase context windows and multi-file operations, model this cost change before committing to Opus 4.7 as your default.
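A quick way to model it: plug your own token volumes into the published rates and the worst-case figures above. This is a back-of-envelope sketch using the article's numbers ($5/$25 per million tokens, +35% input, −10% output), not a billing calculation; your real inflation ratio will depend on your workload mix.

```python
# Back-of-envelope cost model using the figures in this post: $5/M input,
# $25/M output, worst-case +35% input tokens, roughly 10% fewer output
# tokens. Substitute your own measured volumes and ratios.

def monthly_cost(input_m: float, output_m: float,
                 input_rate: float = 5.0, output_rate: float = 25.0) -> float:
    """Dollar cost for token volumes given in millions of tokens."""
    return input_m * input_rate + output_m * output_rate

# Example: a code-heavy month with 100M input / 10M output tokens.
old = monthly_cost(100, 10)                # $750 on Opus 4.6
new = monthly_cost(100 * 1.35, 10 * 0.90)  # ~$900 under the new tokenizer
delta_pct = (new - old) / old * 100        # ~+20% effective increase
```

For this hypothetical mix, the concise-output savings claw back some, but nowhere near all, of the input inflation. An output-heavy documentation workload run through the same model tips the other way.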

Competitive Position: Where Anthropic Stands Right Now

The SWE-bench Pro lead is real, and it positions Claude as the strongest production option for agentic bug-fixing specifically. But the full landscape looks like this in April 2026:

  • Against OpenAI o3-pro: Anthropic leads on SWE-bench; o3-pro remains competitive on multi-file generation. For pure coding agent tasks, Opus 4.7 xhigh is the more defensible choice right now.
  • Against Google Gemini 2.5: Gemini holds advantages in very long context windows and multimodal throughput. For architecture-level reasoning across massive codebases, this is still a genuine comparison worth running internally.
  • Against Grok 4: The tokenizer inflation is Anthropic's most exploitable weakness here. If Grok 4 maintains cost stability on code-heavy inputs while delivering comparable benchmark performance, cost-sensitive teams have a real alternative. Monitor this closely over the next 30 days.
  • Against Claude Mythos Preview: Opus 4.7 lags Mythos Preview on broad capabilities and cyber tasks. If you have access to Mythos Preview and are running security tooling or complex multi-agent architectures, you shouldn't be switching down to 4.7 for those workflows.

The xhigh effort level is Anthropic's attempt to give teams a practical way to access Opus 4.7's ceiling capabilities without waiting for Mythos Preview to become generally available. It's a smart bridge strategy.

What Engineering Leaders Should Do Right Now

The 2.1.111 release has genuine value, but it requires deliberate evaluation rather than a blanket migration. Here's the priority order.

This week:

Benchmark Opus 4.7 on your actual internal bug-fixing and refactor workflows. The SWE-bench Pro gain is real, but your codebase is not SWE-bench. Run the comparison against your team's representative tasks.

Measure token inflation on your code-heavy prompts. Pull a sample of your last 30 days of Claude Code usage, re-run those prompts against Opus 4.7, and compare token counts. Quantify the actual cost delta before committing.
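Once you have paired token counts for the same prompt sample under both models (however you collect them), summarizing the inflation is plain arithmetic. This helper is a generic sketch, not part of any SDK:

```python
# Summarize tokenizer inflation across a sample of prompts, given old and
# new token counts for the same prompts. Pure arithmetic, no API calls;
# how you obtain the counts is up to your tooling.

from statistics import median

def inflation_summary(old_counts: list, new_counts: list) -> dict:
    """Per-prompt percent deltas, reduced to median and worst case."""
    deltas = [(new - old) / old * 100
              for old, new in zip(old_counts, new_counts)]
    return {"median_pct": median(deltas), "max_pct": max(deltas)}

# Two prompts that grew from 1000->1300 and 2000->2700 tokens:
# median inflation 32.5%, worst case 35%.
summary = inflation_summary([1000, 2000], [1300, 2700])
```

Median matters more than the mean here: a handful of pathological files can dominate an average and overstate the fleet-wide cost change.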

If you have Max subscribers on your team, allocate them to test Auto mode and xhigh effort on your most complex refactoring tasks immediately. These are the workflows where the capability improvements justify the cost.

Next 30 days:

Integrate task budgets into your agentic pipelines. The ability to cap agent loop iterations is underutilized infrastructure. Teams that implement tight task budgets now will have more predictable costs as they scale agentic workflows.
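A task budget is conceptually just an iteration cap around whatever your agent loop does per step. The sketch below is generic Python to show the control-flow shape; the names are illustrative, not a specific SDK API.

```python
# Generic budget-capped agent loop. `step` stands in for whatever your
# pipeline does per iteration (a model call, a tool run); `max_steps` is
# the task budget. Illustrative names, not a specific SDK API.

from typing import Callable, Optional

def run_with_budget(step: Callable[[int], Optional[str]],
                    max_steps: int) -> str:
    """Run step() until it returns a result or the budget is exhausted."""
    for i in range(max_steps):
        result = step(i)
        if result is not None:
            return result          # agent finished within budget
    return "budget_exhausted"      # hard stop: cost is now bounded

# Example: a task that "finishes" on iteration 3 under a 5-step budget.
outcome = run_with_budget(lambda i: "done" if i == 3 else None, max_steps=5)
```

The value is the hard upper bound: worst-case spend per task becomes `max_steps` times your per-iteration cost, which is what makes agentic pipelines forecastable at scale.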

Run a parallel evaluation against o3-pro or Gemini 2.5 on multi-file generation tasks. Opus 4.7 wins on bug-fixing; the comparison is less clear on greenfield generation at scale.

Watch Anthropic's response to the tokenizer backlash. The volume of community pushback suggests Anthropic will address this. Teams that wait 30 to 60 days may see a tokenizer revision that changes the economics significantly.

Hold pattern: If your primary workload is code review and refactoring on large codebases with significant JSON or structured data context, the tokenizer inflation may offset the capability gains in the short term. You're not missing a step-change by staying on Opus 4.6 for another few weeks while the cost picture clarifies.

The Bigger Signal Here

The `/effort xhigh` command is the architectural tell in this release. Anthropic is building explicit effort control directly into the CLI because the teams that will define the next generation of software engineering are running agents, not just autocomplete. They're assigning entire workflow phases to AI systems that need to reason deeply before acting.

The Navy SEAL framing applies here: a team of 5 engineers using Claude Code with xhigh effort on complex refactors can outproduce a team of 50 running brute-force manual code review. But those 5 engineers need the tooling to be right. The SWE-bench Pro lead Anthropic is claiming is evidence they understand what that team actually needs. The tokenizer inflation is evidence they still have operational kinks to work out.

The companies that win the next 24 months of software development won't be the ones that waited for perfect tooling. They'll be the ones that built internal evaluation discipline, hired engineers who know how to leverage agents effectively, and adapted their workflows fast enough to compound the capability gains before competitors did. Opus 4.7 with xhigh effort is a meaningful step in that direction. Measure the cost impact, run the benchmarks, and move.

Want to supercharge your dev team with vetted AI talent?

Join founders using Nextdev's AI vetting to build stronger teams, deliver faster, and stay ahead of the competition.
