TL;DR
We rebuilt our voice agent from the ground up. Voice Agent v2 ships today with median first-token latency under 200ms (down from 600ms), native multi-language switching across 14 languages, and 40% lower per-minute cost. Existing customers will be migrated automatically over the next two weeks.
If you've ever felt voice AI was "almost there", v2 is the version that crosses the line.
Why a rewrite
Voice Agent v1 was good. Customers were happy. But we kept hitting the same three limits: latency bottomed out at 500-700ms, switching languages required reconfiguration, and per-minute cost couldn't go below $0.08.
Each of those limits had a different root cause. Latency was a chain-of-services problem. Multi-language was a model-selection problem. Cost was an architecture problem. Fixing them required rewriting the whole stack, which is what we did over the past 8 months.
Latency: 600ms → 180ms
The old pipeline ran ASR → LLM → TTS in series, with each stage waiting for the previous one to finish. The new pipeline streams partial results between every stage. ASR partials trigger LLM generation before the user finishes speaking, and LLM tokens stream into TTS while the rest of the response is still being generated.
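To make the streaming idea concrete, here is a minimal sketch in Python using asyncio queues. It is not our production code: the stage functions, the string payloads, and the None end-of-stream marker are all illustrative assumptions. What it shows is the shape of the design, where each stage forwards work as soon as it has any instead of waiting for the upstream stage to finish.

```python
import asyncio

# Hypothetical stand-ins for the three stages. Real ASR, LLM, and TTS calls
# would replace the string formatting; the names here are illustrative only.

async def asr(audio_chunks, out):
    # Emit a partial transcript per audio chunk rather than waiting for end of speech.
    for chunk in audio_chunks:
        await asyncio.sleep(0)  # yield control, standing in for recognition work
        await out.put(f"partial({chunk})")
    await out.put(None)  # end-of-stream marker (an assumption of this sketch)

async def llm(inp, out):
    # Start producing tokens as soon as the first partial arrives.
    while (partial := await inp.get()) is not None:
        await out.put(f"token<{partial}>")
    await out.put(None)

async def tts(inp, spoken):
    # Synthesize audio token by token while the LLM is still generating.
    while (token := await inp.get()) is not None:
        spoken.append(f"audio[{token}]")

async def run_pipeline(audio_chunks):
    partials = asyncio.Queue()
    tokens = asyncio.Queue()
    spoken = []
    # All three stages run concurrently, connected only by the queues.
    await asyncio.gather(
        asr(audio_chunks, partials),
        llm(partials, tokens),
        tts(tokens, spoken),
    )
    return spoken

if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["a0", "a1", "a2"])))
```

In a serial design, the first piece of audio cannot be produced until the whole utterance has passed through ASR and the whole response has been generated. With queues between stages, the first piece of audio depends only on the first partial and the first token.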
The result: median first-token-out latency of 180ms across our test suite, with P95 under 280ms. That is the range where a voice agent stops feeling like AI and starts feeling like a person.
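For readers who want to compute the same two statistics on their own calls, here is a small Python sketch. The sample values are invented for illustration and are not our measurements, and the nearest-rank percentile definition is an assumption; other percentile methods give slightly different numbers on small samples.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(latencies_ms):
    # Median describes the typical call; P95 describes the slow tail.
    return {
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
    }

# Made-up first-token latencies in milliseconds, for illustration only.
samples = [150, 160, 170, 175, 180, 180, 185, 190, 210, 270]

if __name__ == "__main__":
    print(summarize(samples))
```

Reporting both numbers matters because a good median can hide a slow tail, and the slow calls are the ones users remember.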
14 languages, switching mid-sentence
v1 required configuration per language. Customers serving multilingual markets had to maintain separate agents. v2 detects language per utterance, and even per word in code-switching scenarios.
Supported at launch: English, Ukrainian, Russian, Polish, Czech, Slovak, Romanian, Hungarian, German, French, Spanish, Italian, Portuguese, and Dutch. More are coming based on customer demand.
40% lower cost per minute
We replaced the heaviest model in the chain with a custom-trained one that's 4× smaller and matches quality on our voice benchmark. We also moved from per-token pricing to bulk inference, which works because we can batch across customers.
Pricing for end customers drops accordingly: voice minutes on the Launch plan go from $0.08 to $0.05. Existing contracts move to the lower rate automatically, with no action required.
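As a quick worked example of what the new rate means for a bill, the Python sketch below compares the two Launch plan rates. The 10,000 minutes per month is a hypothetical volume chosen for illustration. Note that the list price change from $0.08 to $0.05 works out to a 37.5% reduction, so the 40% figure is best read as describing per-minute cost rather than the exact price change.

```python
from decimal import Decimal

# Launch plan per-minute rates quoted in this post.
OLD_RATE = Decimal("0.08")
NEW_RATE = Decimal("0.05")

def percent_reduction(old, new):
    # Relative drop from old to new, expressed in percent.
    return (old - new) / old * 100

def monthly_bill(minutes, rate):
    # Decimal avoids binary floating-point rounding on currency amounts.
    return minutes * rate

if __name__ == "__main__":
    minutes = 10_000  # hypothetical monthly usage
    print(percent_reduction(OLD_RATE, NEW_RATE))
    print(monthly_bill(minutes, OLD_RATE), monthly_bill(minutes, NEW_RATE))
```

At that hypothetical volume, the bill drops from $800 to $500 per month.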
What's next
Voice Agent v2 is available as of today. Migration is automatic and zero-config: existing accounts are moved over during the next two weeks, and once yours is migrated, all you need to do is refresh your dashboard.
Coming in Q3: voice cloning for branded experiences, emotion-aware prosody, and a self-service voice latency analyzer.