AssemblyAI

AssemblyAI Streaming Gets `agent_context`: Build Smarter Voice Agents

Jun 13, 2026 · 6 min read · By AssemblyAI Blog

AssemblyAI just shipped a feature that quietly changes the calculus for engineering teams building real-time voice agents. The new `agent_context` parameter in AssemblyAI's Streaming API lets you set and update the model's instructions mid-session, without tearing down and recreating your WebSocket connection. It's available now in the Python SDK at v0.64.6+ and the Node SDK at v4.34.x+. If you're building voice bots, meeting copilots, or IVR replacements, this is the update you should have been waiting for. Here's why it matters more than the changelog summary suggests.

What Actually Shipped

The core mechanic is simple but powerful. You can now pass `agent_context` at two points in a streaming session:

  • At session initialization: set your baseline instructions when the WebSocket connection opens
  • Via mid-session `reconfigure` calls: update those instructions while audio is actively flowing

That second point is the real news. Before this, if you needed to change the model's behavior during an active call (say, a compliance disclaimer triggered by a keyword, or a new upsell offer activated by a backend event), you had two bad options: bake every possible instruction variant into a single monolithic prompt upfront, or kill and restart the connection and eat the latency and state loss that comes with it. `agent_context` plus `reconfigure` eliminates both workarounds. Here's what a mid-session context update looks like in Python:

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

def on_data(transcript):
    # Called for each partial and final transcript
    print(transcript.text)

def on_error(error):
    print("Streaming error:", error)

# Initialize with baseline agent context; RealtimeTranscriber requires the on_data/on_error callbacks
transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
    agent_context="You are a sales assistant for Acme Corp. Focus on our core product line.",
)

transcriber.connect()

# Later, triggered by a backend event: no reconnect needed
transcriber.reconfigure(
    agent_context="You are a sales assistant for Acme Corp. A new customer has indicated interest in enterprise pricing. Shift focus to the Enterprise tier and highlight volume discounts."
)
```

One long-lived WebSocket. One `reconfigure` call. The model's behavior updates without interrupting the audio stream or losing conversation state.

Why This Is an Architectural Shift, Not Just a Feature

Most coverage of this update will frame it as a prompt-tuning convenience. That framing undersells what's actually happening: `agent_context` plus `reconfigure` is a low-latency control plane for live conversations. At scale, you can push policy changes (updated legal language, new promotional scripts, regional compliance requirements) to thousands of active sessions at once, without reconnects, without code deploys, and without your ops team holding their breath during a call center's peak hours.

Think about what that means concretely for regulated industries. A financial services firm running voice agents across thousands of concurrent calls gets a new compliance requirement mid-day. Previously, it could either wait for the next connection cycle or trigger a mass reconnect that spikes error rates and drops calls. With `agent_context` and `reconfigure`, the orchestration layer sends the update to every active session, and each one picks it up in place.

This also changes how you approach A/B testing for voice agents. Want to test two variants of an upsell script against a live traffic split? You no longer need separate deployment environments or session-routing logic; you reconfigure subsets of sessions in real time from a single control surface.

AssemblyAI is explicitly positioning this for live sales and support assistants, meeting copilots, and voice agents that need to adapt mid-call to new product lines or compliance scripts. The architectural implication is that AssemblyAI's streaming stack is beginning to absorb orchestration logic that teams have historically hand-rolled on top of raw audio APIs.
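The control-plane idea fits in a few lines. This is a minimal sketch, not AssemblyAI code: it assumes your orchestration layer keeps a registry of live session objects exposing a `reconfigure(agent_context=...)` method shaped like the Python example earlier in this post. `SessionRegistry`, `FakeSession`, and `ab_bucket` are hypothetical names. It pushes a policy update either to every active session or to a deterministic A/B slice, without reconnecting anything.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Protocol


class ReconfigurableSession(Protocol):
    """Hypothetical interface: any live session with a reconfigure(agent_context=...) method."""

    def reconfigure(self, *, agent_context: str) -> None: ...


def ab_bucket(session_id: str, variant_b_share: float) -> str:
    """Deterministically assign a session to variant "A" or "B" by hashing its ID."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so position is always in [0, 1).
    position = int.from_bytes(digest[:6], "big") / 2**48
    return "B" if position < variant_b_share else "A"


class SessionRegistry:
    """Tracks live sessions so a policy change can be pushed without reconnecting."""

    def __init__(self) -> None:
        self._sessions: Dict[str, ReconfigurableSession] = {}

    def add(self, session_id: str, session: ReconfigurableSession) -> None:
        self._sessions[session_id] = session

    def remove(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)

    def push(self, agent_context: str,
             only: Optional[Callable[[str], bool]] = None) -> List[str]:
        """Reconfigure every session (or those matching `only`); return IDs that failed."""
        failed: List[str] = []
        # Iterate over a copy so the registry can change while updates are in progress.
        for session_id, session in list(self._sessions.items()):
            if only is not None and not only(session_id):
                continue
            try:
                session.reconfigure(agent_context=agent_context)
            except Exception:
                # A production control plane would retry and alert; here we only record it.
                failed.append(session_id)
        return failed


class FakeSession:
    """In-memory stand-in for a streaming session, for local testing only."""

    def __init__(self) -> None:
        self.agent_context = "baseline"

    def reconfigure(self, *, agent_context: str) -> None:
        self.agent_context = agent_context


# Usage: one compliance update for everyone, then an upsell script for variant B only.
registry = SessionRegistry()
for i in range(1000):
    registry.add(f"call-{i}", FakeSession())
registry.push("Read the updated rate disclosure before quoting prices.")
registry.push("Lead with the annual plan.", only=lambda sid: ab_bucket(sid, 0.5) == "B")
```

Hashing the session ID rather than drawing a random number keeps each call in the same variant for its whole lifetime, even if the push is retried.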

How This Compares to the Competition

Let's be direct about the competitive landscape. AssemblyAI is not the only player in real-time audio with LLM integration.

| Capability | AssemblyAI Streaming | Speechmatics Real-Time |
|---|---|---|
| Sub-300ms latency | | |
| Mid-session instruction update (no reconnect) | | |
| SDK-level support (Python + Node) | | |
| Structured agent_context parameter | | |
| Session reconfiguration API | | |

OpenAI's Realtime API is a serious competitor and has significant distribution advantages through the broader OpenAI ecosystem. It already lets a client change a session's instructions mid-conversation with a `session.update` event, though that applies to OpenAI's own speech-to-speech models. Deepgram's Aura targets low-latency voice synthesis and has strong traction in conversational AI pipelines. Speechmatics brings enterprise credibility, particularly in accuracy for challenging accents and noisy audio.

Among streaming speech-to-text APIs, however, a first-class, SDK-level mechanism to update model instructions mid-stream without a reconnect has been rare. That's the specific gap AssemblyAI is closing. For teams where mid-call behavior changes are central to product design (upsell triggers, escalation flows, compliance injections, handoff scripts), this isn't a minor convenience; it's a structural advantage in the architecture of the agent itself.

The deeper implication: by making `agent_context` a first-class, updatable construct, AssemblyAI is quietly moving up the stack. Specialized voice-agent orchestration platforms have differentiated largely on their ability to manage state and instructions across a live conversation, and AssemblyAI now handles part of that at the API layer. Teams that have been evaluating dedicated agent orchestration vendors alongside their audio infrastructure vendor should revisit whether they still need both.

Who Should Care Most Right Now

Not every team building on AssemblyAI's streaming API needs to act on this immediately. Here's how to triage.

Adopt now if:

  • Your voice agents need to switch behavior mid-call based on backend events (CRM lookups, escalation triggers, product catalog changes)
  • You operate in regulated industries where compliance scripts can change without notice
  • You're running A/B tests on conversational flows and want real-time traffic splitting without architectural complexity
  • You're currently managing mid-session context changes by killing and recreating WebSocket connections

Evaluate and plan if:

  • You're building meeting copilots or async transcription workflows where mid-session updates aren't critical yet
  • You're in early stages and your agent's instruction set is stable enough that reconfiguration isn't an immediate pain point

Don't wait if:

  • You're currently evaluating real-time audio vendors for a greenfield voice agent project. `agent_context` and `reconfigure` should now be on your requirements checklist. Any vendor evaluation that doesn't include "can I update model instructions mid-session without reconnecting" is missing a dimension that matters for production resilience.

How to Get Started: Read the Streaming Prompting Guide

AssemblyAI has published a streaming prompting guide alongside this release. If you're integrating `agent_context` for the first time, start there before writing a single line of code. The guide covers:

  • How `agent_context` interacts with per-message system prompts
  • Best practices for structuring instructions that work reliably across reconfiguration events
  • Patterns for orchestrating `reconfigure` calls from backend event systems

A few practical recommendations from the architecture patterns this feature enables:

Treat `agent_context` as your session policy, not your per-turn instruction. Use it for stable, session-level behavioral guidelines. Handle turn-level nuance in your message handling logic.
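One way to keep that separation honest is to build the session-level `agent_context` from a small structured policy object and keep per-turn data out of it entirely. This is a hypothetical pattern, not part of the AssemblyAI SDK: `SessionPolicy` and its fields are illustrative names, and the rendered string is simply what you would pass as `agent_context` at connect time or in a `reconfigure` call.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class SessionPolicy:
    """Stable, session-level guidance. Per-turn details do not belong here."""

    role: str
    focus: str
    compliance_notes: Tuple[str, ...] = ()

    def render(self) -> str:
        """Render the policy as the string passed as agent_context."""
        lines = [self.role, f"Focus: {self.focus}"]
        lines += [f"Compliance: {note}" for note in self.compliance_notes]
        return "\n".join(lines)


baseline = SessionPolicy(
    role="You are a sales assistant for Acme Corp.",
    focus="our core product line",
)

# A mid-call change produces a new immutable policy; the baseline is left untouched.
enterprise = replace(baseline, focus="the Enterprise tier and volume discounts")
```

Per-turn context, such as the caller's latest question, stays in your message-handling code instead of being folded into this string.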

Design your `reconfigure` triggers as backend events, not in-stream logic. Connect reconfiguration to your CRM, policy engine, or event bus rather than parsing audio transcripts mid-stream to decide when to update.
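A minimal sketch of that wiring, assuming a generic in-process event bus. The event names (`crm.enterprise_interest`, `policy.compliance_update`), the `EventRouter` and `RecordingSession` classes, and the session's `reconfigure(agent_context=...)` call shape are placeholders modeled on the example earlier in this post. In production, the events would come from your CRM webhooks, policy engine, or message broker.

```python
from typing import Callable, Dict, List, Optional


class EventRouter:
    """Maps backend events to agent_context updates for one session (hypothetical)."""

    def __init__(self, session, baseline_context: str) -> None:
        self.session = session
        self.current_context = baseline_context
        # Each handler turns (payload, current context) into a new context, or None to skip.
        self._handlers: Dict[str, Callable[[dict, str], Optional[str]]] = {}

    def on(self, event_type: str, handler: Callable[[dict, str], Optional[str]]) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, event_type: str, payload: dict) -> bool:
        """Apply an event; return True if the session was reconfigured."""
        handler = self._handlers.get(event_type)
        if handler is None:
            return False  # Unknown events never touch the live session.
        new_context = handler(payload, self.current_context)
        if new_context is None or new_context == self.current_context:
            return False  # Skip redundant reconfigure calls.
        self.session.reconfigure(agent_context=new_context)
        self.current_context = new_context
        return True


class RecordingSession:
    """Stand-in session that records reconfigure calls, for local testing only."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def reconfigure(self, *, agent_context: str) -> None:
        self.calls.append(agent_context)


# Example wiring (event names and payload fields are hypothetical)
BASE = "You are a sales assistant for Acme Corp. Focus on our core product line."
router = EventRouter(RecordingSession(), BASE)
router.on("crm.enterprise_interest",
          lambda payload, ctx: BASE + " Shift focus to the Enterprise tier and highlight volume discounts.")
router.on("policy.compliance_update",
          lambda payload, ctx: ctx + " " + payload["script"])
router.dispatch("crm.enterprise_interest", {"account_id": "hypothetical-123"})
```

Nothing here inspects transcripts: the decision to reconfigure is made entirely from backend events, which keeps it testable and auditable.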

Version your `agent_context` strings. When you push updates to active sessions, log which context version each session received. This becomes essential for debugging behavioral drift and for compliance audits.
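A content hash gives you a version identifier with no extra bookkeeping. The sketch below is one possible approach, not an AssemblyAI feature. It derives a short version ID from the exact `agent_context` text and writes one structured log record per session update using Python's standard `logging` and `json` modules. The `record_context_push` helper and its log field names are hypothetical.

```python
import hashlib
import json
import logging
import time

logger = logging.getLogger("agent_context_audit")


def context_version(agent_context: str) -> str:
    """Short, stable version ID derived from the exact context text."""
    return hashlib.sha256(agent_context.encode("utf-8")).hexdigest()[:12]


def record_context_push(session_id: str, agent_context: str, reason: str) -> dict:
    """Log which context version a session received, and why; return the record."""
    record = {
        "ts": time.time(),
        "session_id": session_id,
        "context_version": context_version(agent_context),
        "reason": reason,
    }
    logger.info(json.dumps(record, sort_keys=True))
    return record
```

Because the version comes from the text itself, two sessions reporting the same version really did receive byte-identical instructions. To let audits recover the full wording, store each distinct context string once, keyed by its version.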

Test reconfiguration under load before you need it in production. The mechanism is designed to be low-latency, but your orchestration layer's ability to push updates reliably at scale depends on how you've implemented the control plane, not just AssemblyAI's infrastructure.
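A rough way to exercise your own control plane is to fan out concurrent updates to many simulated sessions and check the tail latency. In this sketch, the hypothetical `SimulatedSession.reconfigure` coroutine only sleeps for a random interval as a stand-in for a network round trip. It is not AssemblyAI's client, and the numbers it prints say nothing about AssemblyAI's infrastructure. Swap in your real push path to measure your orchestration layer.

```python
import asyncio
import random
import statistics
import time
from typing import List


class SimulatedSession:
    """Stand-in for a live session; the random delay mimics a network round trip."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.agent_context = "baseline"

    async def reconfigure(self, *, agent_context: str) -> None:
        await asyncio.sleep(self.rng.uniform(0.001, 0.02))
        self.agent_context = agent_context


async def load_test(n_sessions: int, max_in_flight: int, seed: int = 0) -> List[float]:
    """Push one update to n_sessions with bounded concurrency; return per-call latencies (s)."""
    rng = random.Random(seed)
    sessions = [SimulatedSession(rng) for _ in range(n_sessions)]
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded(session: SimulatedSession) -> float:
        async with limit:
            # Timing starts after the semaphore is acquired, so queueing time is excluded.
            start = time.perf_counter()
            await session.reconfigure(agent_context="policy v2")
            return time.perf_counter() - start

    latencies = await asyncio.gather(*(bounded(s) for s in sessions))
    stale = sum(s.agent_context != "policy v2" for s in sessions)
    if stale:
        raise RuntimeError(f"{stale} sessions missed the update")
    return list(latencies)


if __name__ == "__main__":
    lat = asyncio.run(load_test(n_sessions=2000, max_in_flight=200))
    cuts = statistics.quantiles(lat, n=100)
    print(f"p50={cuts[49] * 1000:.1f} ms  p99={cuts[98] * 1000:.1f} ms")
```

The semaphore caps in-flight pushes so the fan-out doesn't overwhelm your own workers. If the total time for a fleet-wide update matters more than per-call latency, time the whole `gather` call instead.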

The Forward View

The release of `agent_context` with mid-session `reconfigure` support signals where AssemblyAI is taking its streaming platform: toward a stateful agent runtime, not just a speech-to-text pipe with LLM features bolted on.

For engineering teams, this creates a compounding advantage. Every feature in this direction reduces the amount of custom orchestration code you need to maintain. Less bespoke glue code means faster iteration, a smaller operational surface area, and fewer failure modes in production.

The voice agent space in 2026 is consolidating around one question: which infrastructure layer owns the session state and behavioral logic for live conversations? AssemblyAI is making a clear bet that the answer should be the audio streaming layer itself, tightly integrated with the model layer and addressable via clean SDK primitives. That's a bet worth watching, and if you're building voice agents today, it's a bet worth building on.

Get started with AssemblyAI

Want to start building with AssemblyAI? Here's a quickstart:

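```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

transcript = aai.Transcriber().transcribe("call.wav")
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(transcript.error)
print(transcript.text, transcript.confidence)
```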


© 2026 AssemblyAI. All rights reserved.