Benchmarking sub-200ms response times for real-time voice agents — the threshold where callers stop noticing they are talking to AI.
Built on Vapi for telephony orchestration with Claude as the reasoning engine. The agent runs on Vercel Edge Functions to minimise cold-start latency. We stream token-by-token responses back to Vapi so the caller hears the first word within 200ms, not after the full response has finished generating.
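The streaming path can be sketched as a thin wrapper that forwards each token downstream the moment it arrives while recording time-to-first-token. This is an illustrative sketch, not the production handler: `streamWithTtft` and the mock token source are hypothetical names, and in the real pipeline the async iterable would be a Claude streaming response relayed on to Vapi.

```typescript
// Hypothetical sketch: forward tokens downstream as they arrive,
// recording time-to-first-token (TTFT). In production the token source
// would be a Claude streaming response and the consumer would be Vapi.
async function* streamWithTtft(
  tokenSource: AsyncIterable<string>,
  onFirstToken: (ttftMs: number) => void,
): AsyncGenerator<string> {
  const start = Date.now();
  let first = true;
  for await (const token of tokenSource) {
    if (first) {
      onFirstToken(Date.now() - start); // the metric benchmarked per region/model
      first = false;
    }
    yield token; // never buffer the full response
  }
}

// Mock token source standing in for the model stream.
async function* mockTokens(): AsyncGenerator<string> {
  for (const t of ["Hello", ", ", "caller", "!"]) yield t;
}

(async () => {
  let ttft = -1;
  const heard: string[] = [];
  for await (const tok of streamWithTtft(mockTokens(), (ms) => (ttft = ms))) {
    heard.push(tok);
  }
  console.log(heard.join(""), `TTFT ${ttft}ms`);
})();
```

The key design point is that nothing in the wrapper awaits the complete response: the first word reaches the caller as soon as the model emits it.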
Currently testing three variables: cold-start time across Vercel edge regions (London, Frankfurt, US-East), streaming token delivery speed with different Claude model sizes (Haiku for speed vs Sonnet for depth), and interruption handling — can the agent stop mid-sentence when the caller speaks? Current best: 180ms first-token latency in the London region with Haiku, 340ms with Sonnet.
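The interruption test can be sketched with an `AbortController`: the moment caller speech is detected, the in-flight generation is cancelled so playback stops mid-sentence. A minimal sketch under assumed names (`speakUntilInterrupted` and `slowTokens` are illustrative, not a real Vapi or Claude API):

```typescript
// Hypothetical barge-in sketch: stop consuming tokens the moment the
// abort signal fires, so the agent goes quiet when the caller speaks.
async function speakUntilInterrupted(
  tokenSource: AsyncIterable<string>,
  signal: AbortSignal,
): Promise<string> {
  let spoken = "";
  for await (const token of tokenSource) {
    if (signal.aborted) break; // caller spoke: stop immediately
    spoken += token;
  }
  return spoken;
}

// Simulated slow model stream, one token every 10ms.
async function* slowTokens(): AsyncGenerator<string> {
  for (const t of ["I", " can", " book", " that", " table"]) {
    await new Promise((r) => setTimeout(r, 10));
    yield t;
  }
}

const ctrl = new AbortController();
setTimeout(() => ctrl.abort(), 25); // simulate barge-in ~25ms in
speakUntilInterrupted(slowTokens(), ctrl.signal).then((spoken) =>
  console.log(`stopped after: "${spoken}"`),
);
```

In the real agent the abort signal would be wired to Vapi's speech-detection event rather than a timer; the timer here only simulates the caller interrupting.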
This research directly feeds into the Voice Agent integration offered in BRVO's Ignite package. Every voice agent BRVO ships to clients uses the optimised configuration discovered through this testing.
Three use cases shape the work.

A restaurant that misses 30% of calls during service. The voice agent answers, books tables, handles dietary questions, and sends confirmation texts — all under 200ms response time so callers don't hang up.
A sole trader who loses leads while on a job. The agent qualifies the enquiry, checks calendar availability, and books the appointment. The tradesperson gets a notification with all the details.
A GP surgery handling 200+ calls daily. The agent triages urgency, books appropriate appointments, and handles prescription repeat requests — freeing reception staff for in-person patients.