---
title: "I gave my receptionist's job to 4 AI voice agents for a week. Here's what broke."
description: "Four AI voice agents (Synthflow, Retell AI, Lindy, Bland AI) ran my front desk for a week across appointment booking, after-hours overflow, FAQ deflection, and complaint triage. Two were genuinely useful for one workflow each. One was good in the demo and broken on Monday. One I switched off after 19 minutes. The accents tripped two of them up. The call holds tripped three. The transfers tripped all four. Here's the unvarnished week-by-week, plus the £80-a-month stack I'd actually leave running."
canonical: https://richardbatt.com/blog/ai-voice-agents-tested-4-uk-receptionist
date: 2026-05-05
author: Richard Batt
tags: [AI Voice Agents, Customer Service, Tool Tests, SMB AI]
type: blog_post
---

# I gave my receptionist's job to 4 AI voice agents for a week. Here's what broke.

_Four AI voice agents (Synthflow, Retell AI, Lindy, Bland AI) ran my front desk for a week across appointment booking, after-hours overflow, FAQ deflection, and complaint triage. Two were genuinely useful for one workflow each. One was good in the demo and broken on Monday. One I switched off after 19 minutes. The accents tripped two of them up. The call holds tripped three. The transfers tripped all four. Here's the unvarnished week-by-week, plus the £80-a-month stack I'd actually leave running._

**Richard Batt** — AI implementation specialist. 120+ projects across 15+ industries, serving SMBs (5-200 employees) worldwide from Middlesbrough, UK (working globally). Contact: richard@richardbatt.com · https://richardbatt.com

For one working week last month I let four artificial intelligence (AI) voice agents handle the inbound calls that normally land at my front desk. I run a small consultancy, so the volume was light: 60 to 90 calls a day across appointment booking, after-hours overflow, supplier FAQ deflection, and the odd complaint. The four agents were Synthflow, Retell AI, Lindy and Bland AI. Each got a full day on each workflow, with my actual receptionist on standby and a recorded transcript of every call.

Two of the four did one thing well. One was impressive in the demo and broken by Monday lunchtime. And one I switched off after 19 minutes. After 120+ AI projects across 15+ industries, this is the most concentrated voice-agent test I've run, and it changed the recommendation I give clients.

**The short version**

- Synthflow won appointment booking outright. Cost about £55 a month at our test volume.
- Retell AI was the best at FAQ deflection but stumbled on transfers to a real person.
- Lindy did fine on after-hours overflow if calls were short and one-step. The moment a caller wanted two things, it lost the thread.
- Bland AI is the polished demo product I had the highest hopes for. It mishandled regional accents and tried to fake-empathise with a complaint about a broken laptop. The complainant was unimpressed.
- All four struggled with the same three things: holding a caller while looking up information, transferring a call mid-conversation, and a Teesside or Glaswegian accent under poor signal.

## How I tested

The setup was deliberately ordinary. Each agent got a real phone number routed to my main line. Each was briefed with the same one-page knowledge base: working hours, services list, the seven most common questions, the booking calendar, the complaint escalation path. I gave each agent a workflow per day, in the same order, so the volume mix was comparable.

The four workflows were the ones I see SMBs hand to AI voice tools first. Appointment booking covers the case where a caller wants to schedule a meeting and 90% of the slots in your diary are predictable. After-hours overflow catches the calls that come in once the office shuts and would otherwise hit voicemail. FAQ deflection answers the questions that don't need a person ("are you open Saturday?", "do you take card payments?", "where do I park?"). Complaint triage is the hardest of the four because the caller is annoyed and the AI's job is to capture the issue and route it without making things worse.

I logged five things per call: did the agent achieve the goal, did the caller sound annoyed by the end, did the transcript match what the caller actually said, did the agent escalate when it should, and how long the call took. Eight hours of audio in total. No vendor knew the calls were a test.
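If you want to replicate the scoring, the five-things-per-call log can be sketched as a small record plus one summary metric. This is my own illustrative structure, not any vendor's API; the field names are mine.

```python
from dataclasses import dataclass

@dataclass
class CallLog:
    """One row per call: the five things scored in the test."""
    goal_achieved: bool        # did the agent achieve the call's goal?
    caller_annoyed: bool       # did the caller sound annoyed by the end?
    transcript_accurate: bool  # did the transcript match what was said?
    escalated_correctly: bool  # did it escalate when it should have?
    duration_seconds: int      # how long the call took

def success_rate(calls: list[CallLog]) -> float:
    """Share of calls where the goal was met without annoying the caller."""
    if not calls:
        return 0.0
    good = sum(1 for c in calls if c.goal_achieved and not c.caller_annoyed)
    return good / len(calls)

# Example: two clean calls and one where the caller got annoyed
logs = [
    CallLog(True, False, True, True, 95),
    CallLog(True, False, True, True, 120),
    CallLog(True, True, False, True, 260),
]
print(round(success_rate(logs), 2))  # 0.67
```

Tracking "goal achieved" and "caller annoyed" separately matters: several calls in the test hit the goal while leaving the caller irritated, and a single pass/fail column would have hidden that.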

## Day one to day three: appointment booking

Synthflow was the standout. Out of 47 booking calls across two days, 41 ended with a meeting on my calendar at the right time, with the right person, on the right service. Three were escalated to a human (correctly, because the caller wanted something custom). Two were dropped (the agent didn't recognise the booking intent and routed to general FAQ). One was wrong (booked the meeting at 3pm UK when the caller said 3pm New York).
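That 3pm UK versus 3pm New York slip is preventable at the integration layer: interpret the caller's stated time in *their* timezone, then convert before writing the booking. A minimal sketch using Python's standard `zoneinfo` (the function is illustrative, not Synthflow's API):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def to_calendar_time(local_str: str, caller_tz: str,
                     calendar_tz: str = "Europe/London") -> datetime:
    """Interpret the caller's stated time in their own timezone,
    then convert it to the calendar's timezone before booking."""
    naive = datetime.strptime(local_str, "%Y-%m-%d %H:%M")
    caller_time = naive.replace(tzinfo=ZoneInfo(caller_tz))
    return caller_time.astimezone(ZoneInfo(calendar_tz))

# "3pm New York" on 5 May is 8pm UK time (EDT is UTC-4, BST is UTC+1)
slot = to_calendar_time("2026-05-05 15:00", "America/New_York")
print(slot.strftime("%Y-%m-%d %H:%M"))  # 2026-05-05 20:00
```

The fix only works if the agent actually asks which timezone the caller means, which is itself a prompt-design decision worth testing.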

Retell AI handled appointment booking but with more friction. The model was slower to confirm the date back to the caller, which led to two callers re-stating the date three times before the agent locked it in. The bookings landed correctly when the call finished, but the experience was clunky. If those callers had been paying customers, I'd expect a measurable drop-off rate.

Lindy and Bland AI both struggled here, for opposite reasons. Lindy's calendar integration was shallow, so the agent could read availability but couldn't write a booking back to the calendar without an extra confirmation email. Two callers got annoyed and hung up at that step. Bland AI booked confidently and quickly, but its confidence was occasionally wrong: it once booked a 30-minute slot that didn't exist because two existing meetings overlapped in a way the calendar feed didn't expose.

Verdict for appointment booking: Synthflow if you want it working tomorrow, Retell AI if you have time to refine the prompts, and the other two if you don't mind your calendar lying.

So the morning briefing I gave my receptionist on day four was simple. Synthflow had the booking workflow.

## Day four: after-hours overflow

The pattern flipped here. Lindy was the best of the four at after-hours overflow because the workflow is shorter. Most after-hours calls in my business are "I missed you, please call me back," and the agent's job is to capture the name, the number, the rough reason, and a callback window. Lindy did this in 38 of 41 attempts. The other three each had a different failure mode.

Synthflow over-engineered the call, asking three follow-up questions when one would have done. After-hours callers are tired and just want to leave a message. Two of them hung up.

Retell AI got the message right but mispronounced names that included regional spellings (Siobhan, Aoife, two specific Teesside surnames). The transcripts captured the sound, not the spelling, which meant the callback list was unworkable until a human cleaned it up.

Bland AI tried to handle the after-hours call as if it were a daytime conversation, including offering to "see if anyone is available" when there demonstrably wasn't. One caller asked the agent if it was a real person. The agent said yes. That call ended my use of Bland AI for that workflow.

## Day five: FAQ deflection

Retell AI's day. FAQ deflection is the workflow that rewards a tight knowledge base and crisp prompt design, and Retell AI's structure made the small adjustments easier than the others. Out of 64 FAQ calls, 58 were resolved without escalation. The six escalations were genuinely outside the FAQ scope (custom pricing, an unusual integration question, a complaint that snuck into the FAQ line).

Synthflow was second. It deflected 47 of 64 but gave overly long answers to simple questions. A caller asking "are you open on Saturdays?" got a 22-second response when "no, Monday to Friday only" would have done. The deflection rate was fine; the call duration was higher than it needed to be, and call duration is a cost line on an FAQ workflow.

Lindy and Bland AI both had moments where they answered confidently and wrongly. Lindy told one caller that we offered a service we don't, because the agent was trained on a marketing page that included an aspirational list. Bland AI told another caller that we accepted a payment method we don't. Both errors are small and avoidable, but both required a refund call later in the week.

## Day six: complaint triage

This is the workflow where I'd hoped one of the four would be unambiguously good. None of them were. The best of the four was Synthflow because its escalation logic was the most conservative: any phrase the model classified as a complaint triggered a transfer to a human, no questions asked. That meant some non-complaints (the caller using the word "frustrated" while describing a different problem) were over-escalated, but no genuine complaint was mishandled.

Retell AI tried to triage in line. It collected the issue, the caller's number, the order or invoice reference, and committed to a callback within four hours. In three out of nine complaint calls the agent got the order reference wrong, which would have made the callback worse than no callback at all.

Lindy didn't really triage. It captured the issue and offered to call back, then routed to a generic inbox without flagging urgency. The complaint about a missed delivery sat in the inbox alongside a sales enquiry until the next morning. That's a £200 cost on what was a £40 product.

Bland AI is the one I switched off mid-workflow. The first complaint call of the day involved a customer whose laptop had stopped working the day before a presentation. The agent responded with a sympathetic-sounding sentence ("I completely understand how frustrating that must be") that, on the transcript, read as fake. The customer said as much. Eighteen minutes later the customer asked if they were speaking to a person. I switched the workflow back to my receptionist and called the customer myself.

## What broke for all four

Three failure patterns ran across every agent.

The first was holding the caller. When a question required the agent to look up information ("can you check if my order shipped?"), three of the four were poor at maintaining the line. Synthflow handled holds reasonably well by sending the caller to gentle hold music, but the lookup itself was slow because the integration had to round-trip through a webhook. The other three either filled the silence with awkward narration ("I'm just checking that for you now, one moment please, still checking") or held in dead air. Both feel weird on a phone call.

The second was transfers. None of the four agents were good at warm transfers. Cold transfers (here is the next number, please call them) all four could do. Warm transfers (let me put you through to Sarah, who can help) needed the agent to summarise the call to Sarah, which all four faked or skipped. Three of nine warm transfers landed Sarah on the line with a customer who had already explained the problem to the AI and now had to explain it all again.
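The piece all four skipped is the handover summary: a warm transfer is only warm if the human receives context before the caller does. A hypothetical handoff payload might look like the following; the field names are mine, not any vendor's schema.

```python
def build_handoff(caller_name: str, reason: str,
                  details: str, agent_attempted: list[str]) -> dict:
    """Context the receiving human needs so the caller
    doesn't have to repeat themselves."""
    return {
        "caller": caller_name,
        "reason": reason,                  # one-line intent, e.g. "complaint"
        "details": details,                # what the caller already explained
        "agent_attempted": agent_attempted,  # steps the AI already took
    }

summary = build_handoff(
    "J. Smith",
    "complaint",
    "Laptop failed the day before a presentation; wants repair or replacement",
    ["captured issue", "offered callback"],
)
print(summary["reason"])  # complaint
```

Whether a given platform can deliver this payload to the human's screen before connecting the call is exactly the question to put to the vendor before signing.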

The third was accent under signal. I am from Teesside, my receptionist is from Newcastle, and we have a customer base that includes Glasgow, Belfast and the Welsh valleys. Bland AI mishandled accents under poor mobile signal more than once. Lindy was second-worst. Synthflow and Retell AI handled the accents fine on a clear line, but degraded fast when the caller was on motorway hands-free. If you have a UK customer base, factor a 5 to 10% accent-related failure rate into your pilot results, and weight it more heavily if your callers are in vehicles or warehouses.

## The £80 stack I'd actually leave running

If I were standing up the voice-agent layer for a 10 to 30 person UK business this week, the stack I'd actually leave running across these four workflows is two tools, not one.

Synthflow handles appointment booking and complaint triage. The conservative escalation logic on complaints is the right tradeoff. The cost at our volume was about £55 a month plus the call time on a UK number.

Retell AI handles FAQ deflection. The prompt-tuning is the strength here, and an FAQ workflow rewards a tight prompt over a slick demo.

I'd leave a human on after-hours overflow until volume justified an agent there too. Below 20 calls a night, the AI value is small and the cost of a wrong handover is real. Above 20 calls a night, Lindy is the candidate to retest with the after-hours prompt rewritten to assume the caller wants out fast.

Total monthly cost for the two-tool stack at our volume: about £80, plus the per-minute call costs, which run another £30 to £60 a month. Cheaper than a part-time receptionist, more reliable than a single-tool stack, and the only one of the configurations I tested that didn't make a customer angry inside a week.

## Frequently asked questions

### Will an AI voice agent fully replace a receptionist?

Not in 2026 for most UK SMBs. The agents are good enough to handle a single workflow well, and a stack of two is good enough to handle three or four workflows acceptably. Your receptionist still owns the calls that don't fit a script. Examples include the regular customer who wants to chat, the supplier with a special request, and the call from a client whose tone you need to read. AI doesn't read tone reliably yet. Plan for AI to handle 60 to 80% of inbound volume on the workflows you script for it, and plan for a human to own everything else.

### How much does an AI voice agent cost in the UK?

The four tools I tested all sit in a similar price band: £30 to £150 a month at light SMB volume, plus per-minute call charges and a small per-number cost. A two-tool stack runs £80 to £200 a month. A typical UK part-time receptionist costs £900 to £1,500 a month. The economics work below 200 calls a day for almost any UK SMB, but only if you've matched the right tool to the right workflow.
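The break-even arithmetic is simple enough to sketch. The figures below are the ones from this post (£80 base for the two-tool stack, metered call time in the £30-£60 band, £900 as the low end for a part-time receptionist); the per-minute rate is an assumed example, not a quoted price.

```python
def monthly_stack_cost(base_fees: float, minutes: float,
                       rate_per_min: float) -> float:
    """Total monthly cost: fixed tool fees plus metered call time."""
    return base_fees + minutes * rate_per_min

# Assumed: ~1,000 call minutes a month at £0.045/min lands
# the metered portion inside the £30-£60 band from the post
stack = monthly_stack_cost(80, 1000, 0.045)
receptionist_low = 900

print(f"stack: £{stack:.0f}/mo")                       # stack: £125/mo
print(f"saving: £{receptionist_low - stack:.0f}/mo")   # saving: £775/mo
```

The saving evaporates quickly if a mishandled call costs you a customer, which is why the workflow-by-workflow matching matters more than the headline price.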

### What's the biggest mistake SMBs make when piloting voice AI?

Picking one tool and asking it to do every workflow. None of the four tools I tested was best across all four workflows, and most of the bad customer experiences I logged came from a tool being asked to do something it wasn't shaped for. Pick one workflow per tool, get each tool good at its workflow, and only widen the scope once a workflow is reliable.

## What I'd do this Monday morning

If you have a front desk that's overloaded and you're considering an AI voice agent, run a five-day version of the test above. Pick one workflow, one tool, and one quiet line to route to it. Listen to every call for the first 20 calls. The AI either gets the goal right or it doesn't, and you'll know inside two days, not two months.

The AI Ops Vault has the voice-agent prompts I use with clients, including the conservative complaint-triage prompt that made Synthflow safe to leave on. https://richardbatt.co.uk/vault

If you'd rather start with the 10-minute version, download the AI Quick-Wins Checklist. It includes the five-day pilot template I used for the test in this post. https://richardbatt.co.uk/quick-wins

The four agents I tested were impressive in places and broken in others, sometimes inside the same hour. So the lesson I'm taking out of the week is the one I keep relearning. Pick the workflow first, pick the tool second, and run the customer-facing pilot before you sign anything longer than month-to-month.

---

## More about Richard Batt

Richard Batt is an AI implementation specialist who helps businesses deploy working AI automation in days, not months. 120+ projects across 15+ industries.

### Key pages

- [Home](https://richardbatt.com/)
- [About Richard](https://richardbatt.com/about)
- [Blog](https://richardbatt.com/blog)
- [Contact](https://richardbatt.com/contact)
- [Subscribe](https://richardbatt.com/subscribe)

### Contact

- Email: richard@richardbatt.com
- Location: Middlesbrough, UK (working globally)
- Website: https://richardbatt.com