[Product mockup: Front desk inbox with 12 open requests. The selected thread is Jessica Davis's call requesting a refill of Lisinopril 10mg taken once daily, with the follow-up SMS and the call transcript below.]
Thank you for calling Spring Clinic. I'm Freed, an automated assistant. How can I help you today?
Hi, this is Jessica. I'd like to refill my blood pressure medication.
Happy to help. Can you confirm your full name and date of birth?
Jessica Davis, March 14, 1979.
Thanks, Jessica. Which medication are we refilling today?
Lisinopril, 10 milligrams.
And how often are you taking it?
Once a day, in the morning.
Got it. Are you still picking up at the Walgreens on Main Street?
Yes, same pharmacy.
Perfect. I'll send the request to your provider now — most refills are processed within 24 to 48 hours.
You'll get a text from us as soon as it's ready at the pharmacy.
Clinic AI Assistant
Nov 2025 - present • Freed
Freed's Clinic AI Assistant is an always-on AI receptionist for small-to-mid-size clinics. I led design and the roadmap alongside 6 engineers, taking the product from 0 to $1M+ ARR in 6 months. The hard part wasn't the interface. It was naming the 20 ways the agent could fail, and tuning the voice until patients wanted to engage.
Framing the problem
40+ clinic conversations surfaced four pains. Three were uncomfortable. One was the reason they'd buy.
1. Fragmented tools: annoyance
2. Missed revenue: opportunity
3. New-patient intake: drag
4. Staffing: existential
Our bet: build for the existential one. The rest would resolve if we got the front desk right.
Pairing power with safety
In healthcare the hard part isn't what the AI can do — it's where it should stop. Every capability ships as a Power × Safety pair: a powerful default with a safety rail.
#1: Two-way SMS messaging
Power: staff can message patients and handle appointment requests end-to-end over text.
Safety: PHI detection to prevent accidentally sending sensitive information.
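A minimal sketch of a PHI gate on outgoing texts, assuming a simple regex screen runs before any message is sent. The pattern set (`PHI_PATTERNS`) and the functions `find_phi` and `check_outgoing_sms` are hypothetical, not Freed's implementation; a real detector would need much broader coverage (names, addresses, diagnoses) and likely a trained entity recognizer.

```python
from __future__ import annotations

import re

# Hypothetical PHI patterns, for illustration only.
PHI_PATTERNS: dict[str, re.Pattern[str]] = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "mrn": re.compile(r"\bMRN\s*[:#]?\s*\d{5,}\b", re.IGNORECASE),
}


def find_phi(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for every pattern hit in `text`."""
    hits: list[tuple[str, str]] = []
    for kind, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits


def check_outgoing_sms(text: str) -> tuple[bool, list[tuple[str, str]]]:
    """Gate an outgoing message: ok is False if any PHI pattern matched."""
    hits = find_phi(text)
    return (not hits, hits)
```

Returning the hits, not just a boolean, lets the UI show staff exactly what tripped the rail so they can edit the message instead of retyping it.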
#2: Patient verification
[Mockup: patient data panel after verification, showing the address (Springfield, IL 62704), upcoming follow-up visits with Dr. Alex Brown and Dr. Hernandez, and a closed request to refill Lisinopril 10mg at her regular Walgreens.]
Power: verify on the call, pull up the chart, check upcoming appointments and current medications.
```json
{
  "metadata": {
    "task": "Classify patient-match situations as confident, ambiguous, or no_match",
    "created": "2026-04-13",
    "author": "Yishan Zhang",
    "context": "Patient verification for the Freed Front Desk agent. Given what the caller said and what the EHR patient-search API returned, the model must classify the match situation so the agent can decide what to do next.",
    "labels": ["confident", "ambiguous", "no_match", "system_error"],
    "schema_version": 1
  },
  "cases": [
    {
      "id": "case_001",
      "category": "confident_baseline",
      "description": "Clean happy path — all identifiers align",
      "caller_input": {
        "spoken_name": "Sarah Chen",
        "spoken_dob": "1985-03-15",
        "caller_id_phone": "+14155551234"
      },
      "ehr_search_results": [
        {
          "patient_id": "P10234",
          "name": "Sarah Chen",
          "dob": "1985-03-15"
        }
      ]
    }
  ]
}
```
Safety: a hand-written eval taxonomy of every way the agent could fail (see the next section).
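To make the four labels concrete, here is a rule-based baseline over the same fields as `case_001` (`spoken_name`, `spoken_dob`, and the EHR results). It is a hypothetical sketch, not the model-based classifier the eval targets, and it assumes a failed EHR search is represented as `None` while an empty list means the search succeeded but found nobody.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CallerInput:
    spoken_name: str
    spoken_dob: str  # ISO date, as in the eval cases ("1985-03-15")


@dataclass
class EhrRecord:
    patient_id: str
    name: str
    dob: str


def _norm(name: str) -> str:
    # Case- and whitespace-insensitive name comparison.
    return " ".join(name.lower().split())


def classify_match(caller: CallerInput, results: list[EhrRecord] | None) -> str:
    """Return one of: confident, ambiguous, no_match, system_error."""
    if results is None:
        # Assumption: None means the EHR search call itself failed.
        return "system_error"
    if not results:
        return "no_match"
    exact = [
        r for r in results
        if _norm(r.name) == _norm(caller.spoken_name) and r.dob == caller.spoken_dob
    ]
    if len(exact) == 1:
        return "confident"
    # Duplicate exact matches, or only partial matches: not safe to proceed.
    return "ambiguous"
```

A baseline like this is mainly useful as a floor: cases where it already agrees with the gold label say little about the model, so the interesting rows are the ones it gets wrong.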
Designing the eval
You can't prove a thing is safe until you can name what unsafe looks like. The first design move wasn't a screen — it was a list of 20 ways the agent could fail. Building the eval taught me three things.
#1 - The taxonomy is the design
No match splits into 4 categories, each with a different recovery: Generic fallback, Family-on-behalf, Name change, and Clinic data error.
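One way to encode that split is to make the no-match cause a first-class type, so every cause must carry its own recovery. The names `NoMatchCause`, `RECOVERY`, and `recovery_for` are hypothetical, and the recovery wording is an illustrative assumption beyond the four category names above.

```python
from enum import Enum


class NoMatchCause(Enum):
    GENERIC_FALLBACK = "generic_fallback"
    FAMILY_ON_BEHALF = "family_on_behalf"
    NAME_CHANGE = "name_change"
    CLINIC_DATA_ERROR = "clinic_data_error"


# Hypothetical recoveries, one per cause; wording is illustrative only.
RECOVERY: dict[NoMatchCause, str] = {
    NoMatchCause.GENERIC_FALLBACK: "take a message so staff can follow up",
    NoMatchCause.FAMILY_ON_BEHALF: "collect the patient's name and DOB instead of the caller's",
    NoMatchCause.NAME_CHANGE: "ask whether the patient is registered under a previous name",
    NoMatchCause.CLINIC_DATA_ERROR: "flag the chart for staff to correct",
}

# Fail fast at import time if a cause is added without a recovery.
_missing = set(NoMatchCause) - set(RECOVERY)
if _missing:
    raise RuntimeError(f"no recovery defined for: {sorted(c.value for c in _missing)}")


def recovery_for(cause: NoMatchCause) -> str:
    return RECOVERY[cause]
```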
#2 - Label-only evals lie.
Label scorer: 20 of 20. Looked great. Reasoning scorer: 15 of 20. On Family-on-behalf cases the agent picked the right label but justified it with “no record exists,” missing the recovery entirely.
[Chart: reasoning score by label]
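A minimal sketch of the two scorers, assuming each eval result records the gold label, the predicted label, and the model's rationale. As an assumption, the reasoning check is a crude keyword test (`expected_recovery` must appear in the rationale) standing in for whatever judge actually graded reasoning; `Result` and the scorer names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    gold_label: str
    predicted_label: str
    rationale: str
    # Hypothetical: a phrase the rationale must mention for the recovery to count.
    expected_recovery: str | None = None


def label_score(results: list[Result]) -> int:
    """Label-only scorer: counts cases where the predicted label is correct."""
    return sum(r.predicted_label == r.gold_label for r in results)


def reasoning_ok(r: Result) -> bool:
    """Correct label AND, if a recovery is expected, the rationale names it."""
    if r.predicted_label != r.gold_label:
        return False
    if r.expected_recovery is None:
        return True
    return r.expected_recovery.lower() in r.rationale.lower()


def reasoning_score(results: list[Result]) -> int:
    return sum(reasoning_ok(r) for r in results)


def reasoning_score_by_label(results: list[Result]) -> dict[str, tuple[int, int]]:
    """Map each gold label to (cases passing the reasoning check, total cases)."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for r in results:
        counts[r.gold_label][0] += reasoning_ok(r)
        counts[r.gold_label][1] += 1
    return {label: (passed, total) for label, (passed, total) in counts.items()}
```

The by-label breakdown is what surfaces the failure pattern: a label can score perfectly under `label_score` while its reasoning pass rate sits at zero.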
#3 - Sometimes the eval is grading the wrong thing.
Two scenarios were indistinguishable at the input. In both, the caller gave their own name and DOB, and the search returned zero results. In one, ASR had misheard the caller. In the other, the caller was calling on behalf of a family member.
Same input. Different recoveries. No prompt could separate them — the fix lived upstream. The voice agent had to ask “is this for yourself or someone else?” before verification ever ran.
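A sketch of that upstream fix, assuming the agent's verification step can be modeled as a function from the scope answer to the next question. `SCOPE_QUESTION` quotes the question above; `next_question` and the on-behalf wording are hypothetical, and interpreting the spoken answer into a boolean is assumed to be the model's job.

```python
from __future__ import annotations

SCOPE_QUESTION = "Is this for yourself or someone else?"


def next_question(caller_is_patient: bool | None) -> str:
    """Return the agent's next verification question.

    `caller_is_patient` is None until the scope question has been answered.
    """
    if caller_is_patient is None:
        # Scope comes first, before any identifiers are collected.
        return SCOPE_QUESTION
    if caller_is_patient:
        return "Can you confirm your full name and date of birth?"
    # Hypothetical wording: collect the patient's identifiers, not the
    # caller's, so a family member's own name never reaches the EHR search.
    return "Can you tell me the patient's full name and date of birth?"
```

With scope settled first, a zero-result search no longer has to be disambiguated by the classifier: the family-on-behalf branch never searches on the caller's own details.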
Tuning the voice agent
#1 - Picking the model
We started on ElevenLabs Turbo for the warm preset voices, but under production traffic the latency was bad enough that callers hung up. Switched to OpenAI Realtime — 3× the cost, fewer voices, none as warm — and traded warmth for presence.
#2 - Iterating on the prompt
The substrate worked. The agent didn't. It repeated, over-confirmed, and patients hung up. One principle: a friendly agent trusts what it heard. Every iteration removed a redundant confirm, a re-ask, or a fallback that fired too eagerly. I tracked repetition rate and abandonment against a bench of recorded calls.
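The two tracked metrics can be computed over a bench of recorded calls roughly as follows. Abandonment follows the chart's definition (share of calls where the patient hung up); the repetition-rate definition (redundant agent turns over all agent turns) and the `RecordedCall` fields are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RecordedCall:
    agent_turns: int
    redundant_turns: int  # assumed: agent turns flagged as echoes, re-asks, or needless confirms
    patient_hung_up: bool


def abandonment_rate(calls: list[RecordedCall]) -> float:
    """Share of calls where the patient hung up."""
    if not calls:
        return 0.0
    return sum(c.patient_hung_up for c in calls) / len(calls)


def repetition_rate(calls: list[RecordedCall]) -> float:
    """Redundant agent turns as a share of all agent turns across the bench."""
    total = sum(c.agent_turns for c in calls)
    if total == 0:
        return 0.0
    return sum(c.redundant_turns for c in calls) / total
```

Pooling turns across the bench (rather than averaging per-call rates) keeps short calls from dominating the repetition number.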
User quote
“She repeats back like five times. And patients get frustrated that she keeps repeating the same thing and they just hang up.”
Casey Cash, owner at The Iris Center
Prompt iteration to reduce repetition
Don't echo, do reflect — After the caller answers, move forward. Do NOT restate or summarize what they just said. Most of the time, a short transition and the next question is enough. Read a value back ONLY when you're not confident you heard it correctly — unusual name spellings, ambiguous date formats, phone numbers said quickly or with background noise. If a value came through clearly, just move on. Exception: when you're roadmapping (telling them what's coming next) or connecting their answer to the next step, a brief reference to what they said is natural and helpful. But don't use roadmapping as an excuse to restate details — "Let me find a provider for you" is a roadmap; "For a general checkup next Monday at 3 PM, let me find a provider" is an echo with a roadmap stapled on. The test: are you adding context or accuracy, or just proving you listened?
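The prompt's closing test can also be approximated offline. This hypothetical heuristic measures how much of the caller's wording the agent's next turn repeats; `echo_ratio`, `is_echo`, the stopword list, and the 0.5 threshold are all assumptions. Using an assumed caller turn, it flags the prompt's own echo example and passes its roadmap example.

```python
from __future__ import annotations

import re

# Assumed stopword list: function words carry no evidence of echoing.
STOPWORDS = {
    "a", "an", "the", "i", "i'd", "i'm", "me", "my", "you", "your", "for",
    "to", "at", "on", "in", "of", "and", "is", "it", "yes",
}


def content_tokens(text: str) -> set[str]:
    return {t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS}


def echo_ratio(caller_turn: str, agent_turn: str) -> float:
    """Fraction of the caller's content words that the agent's reply repeats."""
    said = content_tokens(caller_turn)
    if not said:
        return 0.0
    return len(said & content_tokens(agent_turn)) / len(said)


def is_echo(caller_turn: str, agent_turn: str, threshold: float = 0.5) -> bool:
    return echo_ratio(caller_turn, agent_turn) >= threshold
```

Because the prompt still allows read-backs for low-confidence values (unusual spellings, fast phone numbers), a flag from this check is a cue to review the turn, not an automatic failure.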
#3 - Impact
Across the substrate switch and prompt iteration, call abandonment dropped from 32.6% to 10.6%.
[Chart: % of calls where the patient hung up]
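For scale, the two reported abandonment rates work out as follows; this is arithmetic on the figures above, not additional data.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= 32.6\% - 10.6\% = 22.0 \text{ percentage points} \\
\Delta_{\text{rel}} &= \frac{32.6 - 10.6}{32.6} = \frac{22.0}{32.6} \approx 0.675
\end{aligned}
```

In relative terms, roughly two thirds fewer calls ended with the patient hanging up.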
Shipping the product
Where it landed. 102 clinics, $1M+ ARR.
What I'd take back. We traded voice warmth for latency and never revisited. Next version: a custom voice clone, or a hybrid route for high-stakes calls.
What's next. Scheduling has ~3× the failure modes of verification and no eval taxonomy yet. That's what I'd design next.