Benchmarking sub-200ms response times for real-time voice agents — the threshold where callers stop noticing they are talking to AI.
Built on Vapi for telephony orchestration with Claude as the reasoning engine. The agent runs on Vercel Edge Functions to minimise cold-start latency. We stream token-by-token responses back to Vapi so the caller hears the first word within 200ms, not after the full response has finished generating.
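The streaming path can be sketched as a thin wrapper that forwards each token downstream the moment it arrives while recording time-to-first-token. This is an illustrative sketch, not the production handler: `streamWithTtft` and the mock token source are hypothetical names, and in the real pipeline the async iterable would be a Claude streaming response relayed on to Vapi.

```typescript
// Hypothetical sketch: forward tokens downstream as they arrive,
// recording time-to-first-token (TTFT). In production the token source
// would be a Claude streaming response and the consumer would be Vapi.
async function* streamWithTtft(
  tokenSource: AsyncIterable<string>,
  onFirstToken: (ttftMs: number) => void,
): AsyncGenerator<string> {
  const start = Date.now();
  let first = true;
  for await (const token of tokenSource) {
    if (first) {
      onFirstToken(Date.now() - start); // the metric benchmarked per region/model
      first = false;
    }
    yield token; // never buffer the full response
  }
}

// Mock token source standing in for the model stream.
async function* mockTokens(): AsyncGenerator<string> {
  for (const t of ["Hello", ", ", "caller", "!"]) yield t;
}

(async () => {
  let ttft = -1;
  const heard: string[] = [];
  for await (const tok of streamWithTtft(mockTokens(), (ms) => (ttft = ms))) {
    heard.push(tok);
  }
  console.log(heard.join(""), `TTFT ${ttft}ms`);
})();
```

The key design point is that nothing in the wrapper awaits the complete response: the first word reaches the caller as soon as the model emits it.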
Currently testing three variables: cold-start time across Vercel edge regions (London, Frankfurt, US-East), streaming token delivery speed with different Claude model sizes (Haiku for speed vs Sonnet for depth), and interruption handling — can the agent stop mid-sentence when the caller speaks? Current best: 180ms first-token latency in the London region with Haiku, 340ms with Sonnet.
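The interruption test can be sketched with an `AbortController`: the moment caller speech is detected, the in-flight generation is cancelled so playback stops mid-sentence. A minimal sketch under assumed names (`speakUntilInterrupted` and `slowTokens` are illustrative, not a real Vapi or Claude API):

```typescript
// Hypothetical barge-in sketch: stop consuming tokens the moment the
// abort signal fires, so the agent goes quiet when the caller speaks.
async function speakUntilInterrupted(
  tokenSource: AsyncIterable<string>,
  signal: AbortSignal,
): Promise<string> {
  let spoken = "";
  for await (const token of tokenSource) {
    if (signal.aborted) break; // caller spoke: stop immediately
    spoken += token;
  }
  return spoken;
}

// Simulated slow model stream, one token every 10ms.
async function* slowTokens(): AsyncGenerator<string> {
  for (const t of ["I", " can", " book", " that", " table"]) {
    await new Promise((r) => setTimeout(r, 10));
    yield t;
  }
}

const ctrl = new AbortController();
setTimeout(() => ctrl.abort(), 25); // simulate barge-in ~25ms in
speakUntilInterrupted(slowTokens(), ctrl.signal).then((spoken) =>
  console.log(`stopped after: "${spoken}"`),
);
```

In the real agent the abort signal would be wired to Vapi's speech-detection event rather than a timer; the timer here only simulates the caller interrupting.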
This research directly feeds into the Voice Agent integration offered in BRVO's Ignite package. Every voice agent BRVO ships to clients uses the optimised configuration discovered through this testing.
Three use cases shape the work.

A restaurant that misses 30% of calls during service. The voice agent answers, books tables, handles dietary questions, and sends confirmation texts — all under 200ms response time so callers don't hang up.
A sole trader who loses leads while on a job. The agent qualifies the enquiry, checks calendar availability, and books the appointment. The tradesperson gets a notification with all the details.
A GP surgery handling 200+ calls daily. The agent triages urgency, books appropriate appointments, and handles prescription repeat requests — freeing reception staff for in-person patients.