Nextdev

Codex 26.616: AI Just Learned to Watch and Work

Jun 19, 2026 | 7 min read | By Nextdev AI Team

OpenAI shipped Codex app version 26.616 on June 18, 2026, and it is a more significant release than the version number suggests. The headline feature is Record & Replay, a macOS-only capability that lets you demonstrate a workflow once and have Codex turn it into a reusable, AI-driven automation skill powered by Computer Use. That single sentence does not do justice to what changed. This is not an incremental code-completion improvement. This is Codex stepping directly onto the turf occupied by UiPath, Microsoft Power Automate, and Zapier, but with a fundamentally different architecture: natural language learning from live demonstration instead of flowchart design or scripted step sequences.

Engineering leaders who treat this as just another desktop update are misreading the signal. Here is what shipped, what broke, and what you should do about it.

What Actually Shipped in 26.616

The core addition is Record & Replay. You perform a workflow on your Mac. Codex observes it via Computer Use. The agent encodes what it watched into a reusable skill that can be triggered, parameterized, and replayed without you touching the keyboard again.

This is robotic process automation (RPA) rebuilt from the model layer up. Traditional RPA tools require you to map a workflow explicitly: define the steps, identify the UI selectors, handle exceptions in a visual editor. Record & Replay skips that entirely. You show it, Codex figures out the abstraction, and the skill becomes callable. That is a genuinely different surface area.

A few important constraints at launch:

  • Record & Replay requires Computer Use to function. No Computer Use access, no feature.
  • It is currently unavailable in the European Economic Area, the United Kingdom, and Switzerland. This is a regulatory-driven limitation, not an oversight. OpenAI is managing data-processing exposure before shipping cross-app computer control into GDPR and UK GDPR jurisdictions.
  • The feature is macOS-only at launch. Windows teams cannot access it yet.

The geographic restriction is itself a signal. When a company geo-gates a feature at the regulatory border before launch rather than after, it means legal and policy teams reviewed Computer Use specifically as an automation runtime, not just a chat capability. That framing matters for how you deploy this internally.

The Breakage in 26.616 Is Real and Affects Multiple Layers

Record & Replay gets the press, but GitHub issue #28978 is what your platform team should be looking at right now. Multiple users report that after the auto-update to 26.616.30709 on June 18, starting any new conversation or new project fails immediately with an "Invalid request" toast error. The session creation flow is broken for affected users, which means the update that introduces the headline feature simultaneously breaks the most basic interaction in the app for a subset of deployments.

Two additional regressions compound this:

Third-party model providers are partially broken. Issue #28957 documents that the new desktop home view and session entry flow silently break session history display when `model_provider` is set to a non-OpenAI provider. If your team routes Codex through Azure OpenAI, Anthropic, or any other provider via the model-provider abstraction, your session history may not render correctly, and some configurations will block third-party models entirely.

Security tools are flagging automation behaviors. Issue #28960 captures multiple reports of Malwarebytes blocking what it classifies as remote control of a Windows PC, and Bitdefender repeatedly blocking a PowerShell command invoked by 26.616.30709. The security tooling is responding to new automation surface area in the update, not to malicious code, but the blocks are real and will affect teams with endpoint protection running on developer machines.

Taken together, these three issues paint a picture of a release that shipped quickly and touched more of the stack than a typical point release. That is not a reason to avoid 26.616, but it is a strong reason to control how and when you roll it out.

Competitive Positioning: Codex vs. RPA vs. Copilot

The release repositions Codex in the competitive landscape in a way that most coverage is missing. The comparison matrix has expanded.

Capability | GitHub Copilot | UiPath / Power Automate | Codex 26.616
Code generation in editor
Multi-step agentic tasks
Learn workflow from demo
No flowchart/scripting required
Cross-app computer control
Available in EEA/UK now
Windows support

GitHub Copilot remains the dominant coding assistant by install base, but it does not do cross-app computer control. Copilot's agents operate within the IDE boundary. Codex 26.616 steps outside that boundary and into the operating system layer.

UiPath and Power Automate dominate enterprise RPA, but their moat is the tooling ecosystem and enterprise contracts, not the underlying architecture. Both require users to design automations explicitly. Record & Replay removes that requirement. In developer-heavy organizations where engineers would rather write a prompt than drag nodes in a visual editor, this is a meaningful differentiator. UiPath's enterprise footprint gives it a long runway, but the architectural gap just widened.

The more interesting question is whether Zapier and similar iPaaS tools feel pressure here. Record & Replay, as it matures, could let a developer automate a SaaS workflow without building a Zapier integration at all. That is a longer-horizon threat, but it is directionally real.

This Is an Automation Runtime. Treat It Like One.

Here is the strategic point most coverage will bury under the demo video. Computer Use plus Record & Replay turns the Codex desktop client into a privileged automation runtime. It can observe your screen, click through applications, invoke PowerShell, and replay learned workflows across sessions. That is not a developer productivity tool in the traditional sense. That is an agent with access to everything your user account can touch.

The procurement and security implications follow directly:

  • Codex should be treated as a privileged process, not a browser extension. The same access controls you apply to CI/CD service accounts apply here.
  • Audit logging is not optional. If Codex is replaying workflows that touch internal systems, version control, or production tooling, you need a record of what it did and when.
  • Least-privilege scoping matters. Define explicitly what machines, repositories, and internal applications the agent is permitted to interact with. Do not let it inherit your full user-account permissions by default.
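The audit-logging and least-privilege points above can be sketched as a small gate that sits in front of any replayed skill. This is a minimal illustration, not a Codex API: `ReplayPolicy`, `request_replay`, and the field names in the log record are all hypothetical, and how such a gate would actually hook into Record & Replay is an assumption.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReplayPolicy:
    """Hypothetical allowlist of what a replayed skill may touch."""
    allowed_apps: set = field(default_factory=set)
    allowed_repos: set = field(default_factory=set)

    def permits(self, app: str, repo: Optional[str] = None) -> bool:
        # Deny by default: the app must be listed explicitly.
        if app not in self.allowed_apps:
            return False
        # A repository, when involved, must also be listed explicitly.
        return repo is None or repo in self.allowed_repos


def audit_record(skill: str, app: str, repo: Optional[str], allowed: bool) -> str:
    """Build one JSON line describing a replay attempt (allowed or denied)."""
    return json.dumps(
        {"ts": time.time(), "skill": skill, "app": app,
         "repo": repo, "allowed": allowed},
        sort_keys=True,
    )


def request_replay(policy: ReplayPolicy, skill: str, app: str,
                   repo: Optional[str] = None, log: Optional[list] = None) -> bool:
    """Check the policy and record the decision before anything runs."""
    allowed = policy.permits(app, repo)
    if log is not None:
        # In practice this would be an append-only file or log pipeline.
        log.append(audit_record(skill, app, repo, allowed))
    return allowed
```

The design choice worth noting is that denied attempts are logged too, so the audit trail shows what the agent tried to do, not only what it was permitted to do.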

The Bitdefender and Malwarebytes incidents in GitHub issue #28960 are an early indicator that enterprise security stacks have not caught up to this model yet. Your security team will flag these behaviors. Get ahead of that conversation by framing Codex as an automation runtime in your security review, not as a chat tool.

Recommendations: Stage This Rollout

The feature capability in 26.616 is real and worth piloting. The regressions are also real and require managed rollout. Here is the practical playbook:

Pin non-pilot machines at the previous version. Codex auto-updates by default, so lock the desktop client version for teams not in your pilot group. The "Invalid request" regression in new conversation creation is severe enough to block daily workflows.
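One way to keep track of the pinning decision is a small fleet check that compares each machine's installed version against the 26.616 line. The sketch below assumes you already have an inventory mapping machine names to version strings; the inventory format, the machine names, the action labels, and the older version number in the usage note are hypothetical.

```python
from typing import Dict, Set, Tuple

# Release line discussed in this article (major, minor).
AFFECTED_LINE = (26, 616)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '26.616.30709' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def rollout_actions(fleet: Dict[str, str], pilot: Set[str]) -> Dict[str, str]:
    """Label each machine: 'pilot', 'hold' (still pinned), or 'review'."""
    actions = {}
    for machine, version in fleet.items():
        if machine in pilot:
            actions[machine] = "pilot"
        elif parse_version(version)[:2] >= AFFECTED_LINE:
            # Non-pilot machine already auto-updated to 26.616 or later.
            actions[machine] = "review"
        else:
            actions[machine] = "hold"
    return actions
```

For example, `rollout_actions({"dev-1": "26.610.10000", "dev-2": "26.616.30709"}, pilot=set())` labels `dev-1` as `hold` and `dev-2` as `review`, which gives the platform team a list of machines that slipped past the pin.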

Audit your `model_provider` configuration before updating. If you are using non-OpenAI providers, verify your setup against issue #28957 before rolling out 26.616 broadly. The silent session history breakage means affected users may not realize their configuration is degraded.

Run a security-tooling compatibility pass. Before enabling Computer Use and Record & Replay in your environment, test against your endpoint protection stack. Whitelist the specific Codex behaviors with your security team proactively, rather than responding to incidents after the fact.

Pilot Record & Replay on low-risk, repetitive internal workflows first. The highest-value use cases for this feature are workflows that are tedious, well-understood, and isolated: populating an internal ticket from a design spec, transferring data between tools without an API, running a standard QA checklist in a web app. Start there. Do not pilot on workflows that touch production systems or customer data until you have established your audit and rollback controls.

EEA/UK/Switzerland teams should track the rollout timeline. The geographic restriction is regulatory, not technical. When OpenAI resolves the data-processing compliance requirements for these jurisdictions, the feature will ship there too. Prepare your pilot plans now so you can move quickly when access opens.

The Bigger Picture for Engineering Leaders

Record & Replay is version one of something that will get substantially more capable over the next 12 to 18 months. What OpenAI is building is an agent that learns by watching, then acts autonomously. The current implementation is constrained: macOS only, limited geographic availability, rough edges in the session layer. But the architectural direction is clear. The teams that will extract the most value from this are not the ones who pilot it the fastest. They are the ones who build the governance infrastructure now: access controls, audit logs, approved workflow libraries, and change management processes for agent-driven automation. When Record & Replay matures into something that can own a meaningfully complex cross-system workflow without human supervision, you want that governance layer already in place, not retrofitted.

This is also a hiring signal. The engineers who understand how to design, constrain, and audit agentic automation systems are increasingly rare and increasingly valuable. Building a team with that capability requires finding engineers who think about AI not just as a code-completion layer but as a runtime that needs to be governed. That is a different profile than most teams have hired for historically, and it is exactly the kind of AI-native engineering judgment that separates elite teams from everyone else in this environment.

26.616 is a rough release with a sharp feature at its core. Stage the rollout, fix the security friction, and start building the governance layer. The capability is coming regardless. The question is whether your organization is ready to use it safely when it arrives.
