Why voice is the hardest surface
Text forgives a slow, wordy answer. Voice does not. On a call, every extra token is dead air, every wrong turn is awkward silence, and there is no scrolling back. A flat prompt that works in chat falls apart on the phone. Voice architecture has to optimize for latency and turn-taking, not just correctness.
The four layers
MyChatBot agents stack in four layers. System sets the voice persona, pace and guardrails. Role narrows the job: reminders, qualification or outbound. Memory is the Knowledge Base plus CRM history, so the agent knows both the offer and the caller. Tools is where the agent acts: booking, CRM writes, transfers. On the Calls SDK, these layers are tuned to keep responses short and fast.
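To make the stacking concrete, here is a minimal sketch in plain Python of how four such layers might be composed for one turn. It is not the MyChatBot or Calls SDK API. `AgentLayers`, `build_context`, `act` and all the sample data are hypothetical names chosen for illustration, and the CRM is modeled as a simple dict keyed by caller ID.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical model of the four layers; not the MyChatBot API.
@dataclass
class AgentLayers:
    system: str                       # voice persona, pace, guardrails
    role: str                         # the narrowed job, e.g. reminders
    knowledge: list[str]              # Knowledge Base facts about the offer
    crm: dict[str, str]               # CRM history, keyed by caller ID
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def build_context(self, caller_id: str) -> str:
        """Stack the text layers in order: system, role, then memory."""
        caller = self.crm.get(caller_id, "new caller")
        return "\n".join([self.system, self.role, *self.knowledge, f"Caller: {caller}"])

    def act(self, tool_name: str, **kwargs) -> str:
        """Dispatch to a tool layer action such as booking or a CRM write."""
        if tool_name not in self.tools:
            raise KeyError(f"no such tool: {tool_name}")
        return self.tools[tool_name](**kwargs)


agent = AgentLayers(
    system="Warm and brief. One or two sentences per turn.",
    role="Confirm tomorrow's appointment.",
    knowledge=["Clinic opens at 8:00."],
    crm={"+15550100": "Dana, cleaning at 10:00"},
    tools={"book": lambda slot: f"booked {slot}"},
)
context = agent.build_context("+15550100")
result = agent.act("book", slot="11:00")  # "booked 11:00"
```

Keeping the system and role text to a sentence or two matters here, because whatever `build_context` returns is resent on every turn.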
Scripts vs free speech
The art is balancing a script with real conversation. Too scripted, and it sounds like a robot; too open, and it rambles and burns latency. The role layer defines the call's spine (the beats every call must hit), while Agentic Search fills in specifics on demand. If the caller interrupts (barge-in), the agent adapts without losing its place.
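One way to picture "adapting without losing its place" is a small tracker for the spine's beats. The sketch below is an assumption about the idea, not the product's implementation: `CallSpine` is a hypothetical name, and the `lookup` callable is a stand-in for an on-demand retrieval step such as Agentic Search.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class CallSpine:
    """The must-hit beats of a call, in order, plus which ones are done."""
    beats: list[str]
    done: set[str] = field(default_factory=set)

    def next_beat(self) -> Optional[str]:
        """First beat not yet covered, or None once the spine is complete."""
        for beat in self.beats:
            if beat not in self.done:
                return beat
        return None

    def complete(self, beat: str) -> None:
        self.done.add(beat)

    def on_barge_in(self, question: str,
                    lookup: Callable[[str], str]) -> Tuple[str, Optional[str]]:
        """Answer the interruption via an on-demand lookup, then report where
        to resume. The spine itself is not modified by the interruption."""
        return lookup(question), self.next_beat()


spine = CallSpine(["confirm the date", "confirm the address", "offer a text reminder"])
spine.complete("confirm the date")
faq = {"do you take cards?": "Yes, all major cards."}
answer, resume = spine.on_barge_in("do you take cards?",
                                   lambda q: faq.get(q, "Let me check on that."))
# answer == "Yes, all major cards.", resume == "confirm the address"
```

The design choice is that an interruption never edits the spine; it only answers and then points back at the next unfinished beat.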
What silently burns credits
On voice, the budget killers are long-winded replies, re-fetching context mid-call, and verbose system text resent on every turn. Each one also hurts the experience, because it shows up as latency. Fix them with tight system layers, narrow retrieval, and CRM memory so the caller is recognized instantly. On voice, lean architecture is both cheaper and better.
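Two of those fixes can be sketched in a few lines, assuming a CRM lookup you can call as a function: fetch the caller's context once per call and reuse it, and cap how long a spoken reply may run. `CallSession`, `trim_reply` and `fake_crm` are hypothetical names, and the sentence split is a naive punctuation heuristic rather than anything the platform documents.

```python
import re
from typing import Callable, Optional


class CallSession:
    """Per-call state: look the caller up once, then reuse it every turn."""

    def __init__(self, caller_id: str, fetch_crm: Callable[[str], str]) -> None:
        self.caller_id = caller_id
        self._fetch_crm = fetch_crm              # hypothetical CRM lookup
        self._crm_cache: Optional[str] = None

    def caller_context(self) -> str:
        # Only the first turn pays for the lookup; later turns hit the cache.
        if self._crm_cache is None:
            self._crm_cache = self._fetch_crm(self.caller_id)
        return self._crm_cache


def trim_reply(text: str, max_sentences: int = 2) -> str:
    """Cap a spoken reply at `max_sentences` sentences (naive punctuation split)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


lookups = []

def fake_crm(caller_id: str) -> str:
    lookups.append(caller_id)                    # record each real fetch
    return "Dana, cleaning at 10:00"

session = CallSession("+15550100", fake_crm)
for _turn in range(3):
    session.caller_context()
# three turns, one fetch: len(lookups) == 1
short = trim_reply("Sure. Your visit is at ten. Parking is on Elm. Anything else?")
# short == "Sure. Your visit is at ten."
```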
Designing the live handoff
A voice handoff has to be seamless; a cold transfer kills trust. Hand-off Control transfers the live call, or schedules a callback, the moment the conversation hits a threshold or a stop-phrase, while Flight Control alerts your team in Telegram. Then ship through the Configuration Wizard, which battle-tests the script and versions every change, so you can tune pace, scripts and transfers without ever starting over.
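The trigger logic behind a handoff like this can be sketched as a single decision function. Everything here is an assumption for illustration: the numeric `score` (say, a per-turn frustration or intent score), the sample stop-phrases, the rule that picks a transfer when a person is free and a callback otherwise, and the `notify` callable, which stands in for a team alert and does not model Flight Control's or Telegram's actual API.

```python
from typing import Callable, Iterable, Optional

# Hypothetical example stop-phrases; a real deployment would configure its own.
STOP_PHRASES = ("speak to a human", "talk to someone", "real person")


def handoff_action(utterance: str, score: float, threshold: float,
                   humans_available: bool,
                   notify: Callable[[str], None],
                   stop_phrases: Iterable[str] = STOP_PHRASES) -> Optional[str]:
    """Return "transfer", "callback", or None (the agent stays on the call)."""
    text = utterance.lower()
    triggered = score >= threshold or any(p in text for p in stop_phrases)
    if not triggered:
        return None
    # Transfer the live call if a person can pick up now; otherwise schedule a callback.
    action = "transfer" if humans_available else "callback"
    notify(f"{action}: caller said {utterance!r}")   # alert the team either way
    return action


alerts = []
first = handoff_action("Can I speak to a human?", 0.1, 0.8, True, alerts.append)
second = handoff_action("ok, thanks", 0.9, 0.8, False, alerts.append)
third = handoff_action("ok, thanks", 0.2, 0.8, True, alerts.append)
# first == "transfer", second == "callback", third is None, len(alerts) == 2
```

Checking the stop-phrase and the threshold in one place keeps the handoff rule auditable, which is what makes it practical to version and tune alongside the script.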