How we cut voice latency under 300ms

An engineering deep-dive on streaming pipelines, model selection per turn, and the production tricks that took us from 600ms to 180ms.

Yaroslav Demir
Principal Engineer
Apr 18, 2026

The latency problem

Voice AI has a brutal latency budget. Anything over 500ms feels like a delay. Over 1 second feels broken. Our v1 pipeline averaged 600ms: fine for a shipping product, but not the bar we wanted to hit.

Latency in voice pipelines is dominated by the chain of services: ASR → LLM → TTS, plus network round trips. In a sequential pipeline, each stage waits for the previous one to finish. With 200ms ASR + 250ms LLM + 200ms TTS, you're already at 650ms before any network overhead.
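To make the arithmetic concrete, here is a minimal sketch of the sequential budget. The stage numbers are the illustrative ones from the paragraph above; the function name and the `network_ms` parameter are assumptions added for the example.

```python
# Per-stage latencies in milliseconds, taken from the example in the text.
STAGE_MS = {"asr": 200, "llm": 250, "tts": 200}


def sequential_latency_ms(stages: dict[str, int], network_ms: int = 0) -> int:
    """Total latency when every stage waits for the previous one to finish."""
    return sum(stages.values()) + network_ms


print(sequential_latency_ms(STAGE_MS))  # 650
```

Any network round trip is added on top of that sum, which is why a strictly sequential chain cannot get far below the sum of its stages.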

Streaming partials between every stage

The biggest win came from never waiting for a stage to finish. ASR streams partial transcripts as the user speaks. The LLM gets fed partials and starts generating. TTS streams audio chunks back before the LLM is done.
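The idea of never waiting for a stage to finish can be sketched with chained Python generators. This is a toy under stated assumptions, not the production pipeline: the three stage functions are hypothetical stand-ins (the fake LLM emits one token per partial, which a real model would not), and a real system would use async I/O and actual models. What it does show is the ordering: the first audio chunk comes out after only the first partial has been consumed.

```python
from typing import Iterable, Iterator

log: list[str] = []  # records the order in which stages do work


def asr_partials(audio_frames: Iterable[str]) -> Iterator[str]:
    """Stand-in ASR: yields a growing partial transcript, one per frame."""
    words: list[str] = []
    for frame in audio_frames:
        words.append(frame)
        log.append(f"asr:{frame}")
        yield " ".join(words)


def llm_tokens(partials: Iterable[str]) -> Iterator[str]:
    """Stand-in LLM: yields one fake token for each partial it receives."""
    for partial in partials:
        token = f"tok{len(partial.split())}"
        log.append(f"llm:{token}")
        yield token


def tts_chunks(tokens: Iterable[str]) -> Iterator[bytes]:
    """Stand-in TTS: yields one fake audio chunk per token."""
    for token in tokens:
        log.append(f"tts:{token}")
        yield token.encode()


pipeline = tts_chunks(llm_tokens(asr_partials(["turn", "on", "the", "lights"])))
first_chunk = next(pipeline)  # pulls exactly one item through every stage
print(log)  # ['asr:turn', 'llm:tok1', 'tts:tok1']
```

Because each stage pulls lazily from the one before it, no stage ever sees the full input before it starts producing output.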

This has tradeoffs. ASR partials are noisy: they get revised as more audio comes in, and the LLM has to handle that gracefully. We added a thin layer that buffers the last 150ms of partials before flushing to the LLM, which absorbs most revisions.
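One way to read that buffering layer is as a fixed-window coalescer: hold the newest partial for 150ms, let later revisions overwrite it, then forward only what survives. The sketch below assumes that interpretation. The class name, method names, and explicit `now_ms` timestamps are hypothetical, chosen so the example is deterministic.

```python
from typing import Optional


class PartialBuffer:
    """Holds the newest ASR partial for a fixed window, then flushes it."""

    def __init__(self, window_ms: int = 150) -> None:
        self.window_ms = window_ms
        self._pending: Optional[str] = None
        self._window_start_ms: Optional[int] = None

    def poll(self, now_ms: int) -> Optional[str]:
        """Return the pending partial if its window has elapsed, else None."""
        if self._pending is not None and now_ms - self._window_start_ms >= self.window_ms:
            flushed, self._pending = self._pending, None
            return flushed
        return None

    def push(self, partial: str, now_ms: int) -> Optional[str]:
        """Store a partial; return an older one if its window just closed."""
        flushed = self.poll(now_ms)
        if self._pending is None:
            self._window_start_ms = now_ms  # a new window opens here
        self._pending = partial  # a revision overwrites the earlier partial
        return flushed


buf = PartialBuffer()
buf.push("turn of", now_ms=0)        # noisy first guess
buf.push("turn on", now_ms=60)       # revision inside the window
buf.push("turn on the", now_ms=120)  # still inside the window
print(buf.poll(now_ms=150))          # turn on the
```

The misheard "turn of" never reaches the LLM, at the cost of up to one window of added delay before each flush.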

Figure: Streaming pipeline timing diagram

Picking the right model per turn

Not every turn needs a 400B-param model. Most don't. We classify incoming turns into four buckets (small-talk, factual lookup, reasoning, and complex multi-step) and route each to a model sized for it.

Small-talk goes to a 7B model with 80ms inference. Reasoning goes to a 70B model. Complex multi-step gets the big stuff. Our classifier itself is 200M params and runs in under 10ms.
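The classify-then-route step might look roughly like the sketch below. Several things here are assumptions: the tier labels are illustrative, not real model identifiers; the post does not say where factual lookups are routed, so the small tier is only a placeholder; and `toy_classifier` is a keyword stand-in for the real 200M-param classifier.

```python
from enum import Enum
from typing import Callable


class Bucket(Enum):
    SMALL_TALK = "small_talk"
    FACTUAL_LOOKUP = "factual_lookup"
    REASONING = "reasoning"
    COMPLEX_MULTI_STEP = "complex_multi_step"


# Illustrative tier labels, not real model names.
ROUTES: dict[Bucket, str] = {
    Bucket.SMALL_TALK: "small-7b",
    Bucket.FACTUAL_LOOKUP: "small-7b",  # assumption: not stated in the post
    Bucket.REASONING: "medium-70b",
    Bucket.COMPLEX_MULTI_STEP: "large",
}


def route(turn: str, classify: Callable[[str], Bucket]) -> str:
    """Pick a model tier for a turn using the supplied classifier."""
    return ROUTES[classify(turn)]


def toy_classifier(turn: str) -> Bucket:
    """Keyword heuristic standing in for a learned classifier (demo only)."""
    text = turn.lower()
    if " then " in text:
        return Bucket.COMPLEX_MULTI_STEP
    if text.startswith(("why", "how")):
        return Bucket.REASONING
    if text.startswith(("what", "when", "where", "who")):
        return Bucket.FACTUAL_LOOKUP
    return Bucket.SMALL_TALK


print(route("Hi there!", toy_classifier))                         # small-7b
print(route("Why is my bill higher this month?", toy_classifier))  # medium-70b
```

Passing the classifier in as a callable keeps the routing table independent of how turns get classified, so the heuristic can be swapped for a model without touching `route`.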

End result

Median first-token-out latency: 180ms. P95: 280ms. P99: 420ms. Even at P99, latency stays under the 500ms mark where users start to perceive a delay.
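For readers less familiar with these metrics, the sketch below computes percentiles from raw latency samples using the nearest-rank method. The post does not say which percentile method was used, so that choice is an assumption, and the sample values are made up purely to exercise the function.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up latencies in milliseconds, for illustration only.
latencies_ms = [120, 140, 150, 160, 180, 200, 220, 260, 300, 450]

print(percentile(latencies_ms, 50))  # 180
print(percentile(latencies_ms, 90))  # 300
print(percentile(latencies_ms, 99))  # 450
```

With few samples the high percentiles collapse onto the maximum, so tail metrics like P99 only become meaningful over a large number of turns.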

Cost dropped too: we use the small model 70% of the time, the medium model 25%, and the big one only 5%.
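The cost effect of that traffic mix is a weighted average. In the sketch below, only the 70/25/5 split comes from the post. The per-tier relative costs are placeholders invented for the example, not measured prices, so the resulting number shows the shape of the calculation and nothing more.

```python
def blended_cost(mix: dict[str, float], unit_cost: dict[str, float]) -> float:
    """Expected cost per turn: sum of (share of turns) x (cost per turn)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * unit_cost[tier] for tier, share in mix.items())


MIX = {"small": 0.70, "medium": 0.25, "large": 0.05}  # split from the post

# Placeholder relative costs per turn (hypothetical, not real pricing).
RELATIVE_COST = {"small": 1.0, "medium": 10.0, "large": 50.0}

routed = blended_cost(MIX, RELATIVE_COST)
all_large = blended_cost({"large": 1.0}, RELATIVE_COST)
print(round(routed, 2), all_large)  # 5.7 50.0
```

Under these placeholder costs, the large tier still accounts for a big share of spend despite handling only 5% of turns, which is the usual reason to keep that share small.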

#voice #performance #engineering
Yaroslav Demir
Principal Engineer

Owns platform reliability. 10+ years building high-throughput systems. Will defend Go in any thread.
