Voice agents that actually work: a production playbook
After shipping a dozen voice agents into production, we've converged on a build pattern that handles the edge cases demos never show.
Voice agents demo well. The first 30 seconds of any voice agent conversation are magic. The next 30 seconds are where the wheels come off, because that's when real humans ask questions the demo script didn't cover.
Every voice agent we've shipped into production has survived contact with reality by handling four things the demos glossed over: interruptions, edge-case intents, calendar collisions, and handoff.
Interruptions. Humans talk over the agent. They change their mind mid-sentence. They cough and restart. A voice agent that only handles clean turns will sound robotic the moment someone interrupts it. We use a short endpoint-detection window (300–500 ms) plus an explicit 'interruption-allowed' mode during specific moments: answers, confirmations, and any time the agent is asking a yes/no question.
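A minimal sketch of how that policy might look in code. The state names, the 400 ms default, and the non-interruptible READING_DISCLOSURE state are illustrative assumptions, not part of any telephony SDK; only the 300–500 ms window and the three interruptible moments come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class AgentState(Enum):
    ANSWERING = "answering"
    CONFIRMING = "confirming"
    ASKING_YES_NO = "asking_yes_no"
    READING_DISCLOSURE = "reading_disclosure"  # hypothetical non-interruptible state


# Moments where caller speech is allowed to cut the agent off.
INTERRUPTIBLE = {
    AgentState.ANSWERING,
    AgentState.CONFIRMING,
    AgentState.ASKING_YES_NO,
}


@dataclass
class TurnPolicy:
    # Assumed default, chosen from inside the 300-500 ms window.
    endpoint_silence_ms: int = 400

    def caller_turn_ended(self, trailing_silence_ms: int) -> bool:
        """Treat the caller's turn as finished once silence reaches the window."""
        return trailing_silence_ms >= self.endpoint_silence_ms

    def should_stop_speaking(self, state: AgentState, caller_is_speaking: bool) -> bool:
        """Stop agent audio only when the caller speaks during an interruptible state."""
        return caller_is_speaking and state in INTERRUPTIBLE
```

In this sketch a cough during a yes/no question stops the agent's audio, while the same noise during a hypothetical disclosure does not.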
Edge-case intents. Every voice agent we've built ends up with a long tail of intents we didn't anticipate. 'What's your address?' 'Do you take insurance?' 'Is my appointment still on?' 'Can I talk to a human?' The solution isn't to handle every intent — it's to handle the request to switch context gracefully. Every agent gets a 'handoff' intent that routes to voicemail, a human, or an SMS callback, depending on time of day.
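The time-of-day routing can be sketched as a single function. The business hours (09:00 to 17:00), the SMS cutoff (21:00), and the mapping of each window to a target are assumptions for illustration; a real deployment would take them from the client's own schedule.

```python
from datetime import time
from enum import Enum


class HandoffTarget(Enum):
    HUMAN = "human"
    SMS_CALLBACK = "sms_callback"
    VOICEMAIL = "voicemail"


def route_handoff(
    local_time: time,
    open_at: time = time(9, 0),     # assumed opening time
    close_at: time = time(17, 0),   # assumed closing time
    sms_until: time = time(21, 0),  # assumed cutoff for offering an SMS callback
) -> HandoffTarget:
    """Pick a handoff target from the caller's local time of day."""
    if open_at <= local_time < close_at:
        return HandoffTarget.HUMAN
    if close_at <= local_time < sms_until:
        return HandoffTarget.SMS_CALLBACK
    return HandoffTarget.VOICEMAIL
```

For example, route_handoff(time(18, 0)) falls in the evening window and returns the SMS callback target under these assumed hours.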
Calendar collisions. The easiest bug to ship in a voice agent: it books an appointment at a time the calendar shows as free but the human operator considers taken. Always write to the live calendar, not a cache. Always double-confirm the booking with the caller after the API round-trip succeeds. Always include a short 'would you like a reminder?' step so the caller has another chance to correct a mistake.
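One way to sketch that booking flow. The Calendar interface and the InMemoryCalendar stand-in are hypothetical, used here so the example runs on its own; a production build would put a real calendar provider's client behind the same three calls.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional, Protocol


class Calendar(Protocol):
    """Hypothetical interface over the live calendar (not a cache)."""

    def is_free(self, start: datetime) -> bool: ...
    def create_event(self, start: datetime, caller: str) -> str: ...
    def get_event(self, event_id: str) -> Optional[datetime]: ...


@dataclass
class InMemoryCalendar:
    """Stand-in for a real provider, for illustration only."""

    events: Dict[str, datetime] = field(default_factory=dict)

    def is_free(self, start: datetime) -> bool:
        return start not in self.events.values()

    def create_event(self, start: datetime, caller: str) -> str:
        event_id = f"evt-{len(self.events) + 1}"
        self.events[event_id] = start
        return event_id

    def get_event(self, event_id: str) -> Optional[datetime]:
        return self.events.get(event_id)


def book(calendar: Calendar, start: datetime, caller: str) -> Optional[str]:
    """Return the confirmation line to speak, or None if the slot is unavailable."""
    if not calendar.is_free(start):
        return None
    event_id = calendar.create_event(start, caller)
    # Read the event back after the round trip before confirming anything aloud.
    stored = calendar.get_event(event_id)
    if stored != start:
        return None
    return f"You're booked for {stored:%A at %H:%M}. Would you like a reminder?"
```

The agent speaks the confirmation only from what the calendar returns, never from the time it intended to book, and a None result is a natural trigger for offering another slot or handing off.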
Handoff. Every agent conversation that ends in ambiguity needs a clean path to a human. The best voice agents aren't the ones that handle 100% of calls — they're the ones that handle 70% cleanly and hand off the other 30% with full context. Our builds ship with a Slack or email summary of every handed-off call, so the human picking up the thread doesn't start cold.
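A sketch of what such a summary could contain, rendered as plain text that would fit in either a Slack message or an email body. The field names and the four-turn transcript limit are assumptions; delivery is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List

MAX_TURNS = 4  # assumed number of trailing transcript turns to include


@dataclass
class HandoffSummary:
    caller_number: str
    detected_intent: str
    reason: str
    transcript_tail: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Build a plain-text summary for the human picking up the call."""
        lines = [
            f"Handoff from {self.caller_number}",
            f"Intent: {self.detected_intent}",
            f"Reason: {self.reason}",
            "Last turns:",
        ]
        # Keep only the most recent turns so the summary stays scannable.
        lines.extend(f"  {turn}" for turn in self.transcript_tail[-MAX_TURNS:])
        return "\n".join(lines)
```

Keeping the reason for the handoff next to the last few turns lets the person picking up see why the agent gave up, not just what was said.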
The stack we've converged on: Telnyx for telephony, OpenAI or Anthropic for the language model, a small custom router for intent classification, and a direct calendar integration (never a Zap). Total build time for a production voice agent: 10–14 days.
The measurement that matters most post-launch: handoff quality, not handoff rate. A voice agent that hands off 40% of calls with rich context is better than one that keeps 80% of calls and handles them poorly.
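Handoff quality is not defined precisely here, so the following is only one possible proxy: the share of handed-off calls that arrive with complete context. The three context fields are assumptions chosen for illustration, and handoff rate is computed alongside for comparison.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Handoff:
    # Assumed context fields; substitute whatever the human actually needs.
    has_transcript: bool
    has_intent: bool
    has_caller_details: bool


def handoff_rate(total_calls: int, handoffs: List[Handoff]) -> float:
    """Fraction of all calls that were handed off."""
    return len(handoffs) / total_calls if total_calls else 0.0


def context_completeness(handoffs: List[Handoff]) -> float:
    """Fraction of handoffs that carried every context field (a quality proxy)."""
    if not handoffs:
        return 0.0
    complete = sum(
        1
        for h in handoffs
        if h.has_transcript and h.has_intent and h.has_caller_details
    )
    return complete / len(handoffs)
```

Tracking the two numbers separately makes it visible when a falling handoff rate is being bought with thinner context.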