Product 8 min read

Voice Agent v2: 3× faster, 40% cheaper, in 14 languages

A complete rewrite of our voice stack. Sub-200ms first-token latency, native multi-language switching, and dramatically lower per-minute cost.

Andriy Korolenko
Voice AI Lead
Apr 22, 2026

TL;DR

We rebuilt our voice agent from the ground up. Voice Agent v2 ships today with first-token latency under 200 milliseconds (down from 600ms), native multi-language switching across 14 languages, and 40% lower per-minute cost. Existing customers will be migrated automatically over the next two weeks.

If you've ever felt voice AI was "almost there", v2 is the version that crosses the line.

Why a rewrite

Voice Agent v1 was good. Customers were happy. But we kept hitting the same three limits: latency bottomed out at 500-700ms, language switching required reconfiguration, and per-minute cost couldn't go below $0.08.

Each of those limits had a different root cause. Latency was a chain-of-services problem. Multi-language was a model-selection problem. Cost was an architecture problem. Fixing them required rewriting the whole stack, which is what we did over the past 8 months.

Latency: 600ms → 180ms

The old pipeline ran ASR → LLM → TTS in series, each waiting for the previous to finish. The new pipeline streams partials between every stage. ASR partials trigger LLM tokens before the user finishes speaking. LLM tokens stream into TTS while still being generated.
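
To make the streaming hand-off concrete, here is a minimal sketch in Python. It is a toy model, not the production pipeline: the stage functions `asr`, `llm`, and `tts`, the uppercase "response", and the `events` log are all hypothetical stand-ins. Each stage is an async generator that passes along every item as soon as it has one, so downstream stages start working before upstream stages finish.

```python
import asyncio
from typing import AsyncIterator

events: list[str] = []  # records the order in which the stages do work


async def asr(words: list[str]) -> AsyncIterator[str]:
    """Hypothetical ASR stage: emits one recognized word at a time."""
    for word in words:
        await asyncio.sleep(0)  # stand-in for waiting on the next audio frame
        events.append(f"asr:{word}")
        yield word


async def llm(partials: AsyncIterator[str]) -> AsyncIterator[str]:
    """Hypothetical LLM stage: emits a token for each partial it receives."""
    async for partial in partials:
        token = partial.upper()  # toy "response", assumption for illustration
        events.append(f"llm:{token}")
        yield token


async def tts(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Hypothetical TTS stage: emits an audio frame for each token."""
    async for token in tokens:
        events.append(f"tts:{token}")
        yield f"<audio:{token}>"


async def run_pipeline(words: list[str]) -> list[str]:
    # Chaining the generators lets each item flow through all three stages
    # before the next item is even recognized.
    return [frame async for frame in tts(llm(asr(words)))]


audio = asyncio.run(run_pipeline(["hello", "world"]))
print(events)
```

In the printed event log, the TTS entry for the first word appears before the ASR entry for the second word. In a serial pipeline, every ASR entry would come first.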

Voice pipeline architecture diagram showing streaming partials between ASR, LLM, TTS

The result: a median first-token-out latency of 180ms across our test suite, with P95 under 280ms. At this speed, voice agents stop feeling like AI and start feeling like a person.
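
If you want to compute the same two statistics for your own calls, a sketch using Python's standard library could look like the following. The function name `latency_summary` and the sample values are made up for illustration, and the "inclusive" percentile method is an assumption. It is not necessarily how the numbers above were produced.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Return the median and 95th percentile of first-token latencies (ms)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"median": statistics.median(samples_ms), "p95": cuts[94]}


# Hypothetical measurements, in milliseconds.
example = latency_summary([150, 165, 172, 180, 181, 190, 205, 240, 260, 275])
print(example)
```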

14 languages, switching mid-sentence

v1 required configuration per language. Customers serving multilingual markets had to maintain separate agents. v2 detects language per utterance, and even per word in code-switching scenarios.

Supported at launch: English, Ukrainian, Russian, Polish, Czech, Slovak, Romanian, Hungarian, German, French, Spanish, Italian, Portuguese, Dutch. More are coming based on customer demand.
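
As an illustration of what per-word language detection makes possible, the sketch below groups already-tagged words into same-language runs, which a speech stage could then voice one run at a time. Everything here is an assumption for the example: the function `segment_by_language`, the fallback behavior, and the use of ISO 639-1 codes for the 14 launch languages. The language detection itself is not shown.

```python
from itertools import groupby

# ISO 639-1 codes for the 14 launch languages (the mapping is an assumption).
SUPPORTED = {
    "en", "uk", "ru", "pl", "cs", "sk", "ro",
    "hu", "de", "fr", "es", "it", "pt", "nl",
}


def segment_by_language(tagged_words, fallback="en"):
    """Group (word, language) pairs into consecutive same-language runs.

    Words tagged with an unsupported language are treated as `fallback`.
    """
    normalized = (
        (word, lang if lang in SUPPORTED else fallback)
        for word, lang in tagged_words
    )
    return [
        (lang, " ".join(word for word, _ in run))
        for lang, run in groupby(normalized, key=lambda pair: pair[1])
    ]


utterance = [("send", "en"), ("the", "en"), ("рахунок", "uk"), ("please", "en")]
segments = segment_by_language(utterance)
print(segments)
```

For the mixed English and Ukrainian utterance above, this yields three runs: English, Ukrainian, then English again.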

40% lower cost per minute

We replaced the heaviest model in the chain with a custom-trained one that's 4× smaller and matches its quality on our voice benchmark. We also moved from per-token pricing to bulk inference, which works because we can batch across customers.

Pricing for end customers drops accordingly: voice minutes on the Launch plan go from $0.08 to $0.05. Existing contracts move to the lower rate automatically, with no action needed.
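
To see what the Launch plan change means for a bill, here is a quick worked calculation. The two per-minute rates come from this post. The 10,000-minute monthly volume is a hypothetical figure chosen only for the example.

```python
OLD_RATE_CENTS = 8  # $0.08 per voice minute (Launch plan, before)
NEW_RATE_CENTS = 5  # $0.05 per voice minute (Launch plan, after)


def reduction_percent(old_cents: int, new_cents: int) -> float:
    """Percentage drop from the old rate to the new rate."""
    return (old_cents - new_cents) / old_cents * 100


def monthly_cost_dollars(minutes: int, rate_cents: int) -> float:
    """Monthly bill in dollars for a given number of voice minutes."""
    return minutes * rate_cents / 100


MINUTES = 10_000  # hypothetical monthly volume
before = monthly_cost_dollars(MINUTES, OLD_RATE_CENTS)
after = monthly_cost_dollars(MINUTES, NEW_RATE_CENTS)
print(
    f"{reduction_percent(OLD_RATE_CENTS, NEW_RATE_CENTS):.1f}% lower, "
    f"${before:.2f} -> ${after:.2f} per month"
)
```

On these list prices the per-minute drop works out to 37.5%, in line with the roughly 40% headline figure, and the hypothetical 10,000-minute customer goes from $800 to $500 a month.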

What's next

Voice Agent v2 is live as of today. Migration is automatic and zero-config: existing customers are moved over during the next two weeks, and once your account is migrated you only need to refresh your dashboard.

Coming in Q3: voice cloning for branded experiences, emotion-aware prosody, and a self-service voice latency analyzer.

#voice #release #performance
Andriy Korolenko
Voice AI Lead

Owns the voice infrastructure. PhD in audio ML, 8 years shipping production speech systems. Believes voice AI is finally ready for prime time.
