About the role

Our agent orchestration layer routes millions of conversations a month across LLMs, voice models and custom skills. You'll own its performance, reliability and cost.

We optimize for sub-300ms response times in voice and sub-2s in chat. We pick models per request based on quality vs cost tradeoffs. We A/B test prompts in production. You'll do all of this and more.

This is a hands-on engineering role with significant research components. You'll write code, read papers, run experiments, and ship to production.

What you'll do

Own the model serving and orchestration layer
Optimize for latency, cost and quality across providers (OpenAI, Anthropic, in-house)
Build evaluation harnesses and A/B testing infrastructure
Research and ship improvements to RAG, tool-use and multi-step reasoning
Mentor other engineers on ML best practices

What we're looking for

5+ years ML engineering, 2+ years production LLM systems
Deep Python; experience with vLLM, Triton or similar inference servers
Strong systems thinking: you've debugged production latency or cost issues
Comfortable reading research papers and prototyping their ideas
Bonus: distributed systems experience

Perks & benefits

Remote-first
Equity package
Health insurance
$2k learning budget
Compute budget for experiments

Apply for Senior ML Engineer