I Replaced My $3,200/Month VA With an AI Agent — Here's What Actually Happened
Last Tuesday at 3 AM, I woke up in a cold sweat because my customer support agent had been hallucinating refund policies to 47 customers. Forty-seven.
I know the exact number because the OpenAI Agents SDK logs every single interaction with terrifying precision. Down to the millisecond. It's both beautiful and horrifying.
That's the thing about building production AI agents nobody talks about on Twitter — when they break, they break at scale. Not one customer getting dodgy info. Forty-seven. All at once. While you're sleeping.
I've spent the last four months rebuilding ReplyPilot's core automation engine using the new OpenAI Agents SDK (v0.1.6, if you're curious). Before this, I was burning £2,500/month on a virtual assistant team in the Philippines to handle customer onboarding and basic support tickets. They were great people — Maria especially, she'd been with me since the $2k MRR days. But scaling humans linearly with revenue isn't exactly the indie hacker dream Pieter Levels preaches.
Here's the unfiltered journey of shipping an agent that actually works in production. Not in a demo video. Not in a Twitter thread with 10k likes. Actually works.
The "Before" State (October 2024)
My stack was a Frankenstein monster. I'm not being dramatic:
- Custom Python scripts calling the OpenAI API directly (written at 2 AM over three different weekends)
- A janky state machine I built myself that I was weirdly proud of but also terrified to touch
- Redis for conversation memory that kept filling up because I forgot to set TTLs. Twice.
- 47% of conversations needed human handoff anyway
Metrics that hurt:
- Average response time: 14 minutes
- Customer satisfaction: 3.8/5
- Monthly cost: £2,500 fixed + £320 API credits
- My sanity: negative
I knew I needed to rebuild. Kept putting it off because "it works, mostly." Classic indie hacker trap. You know the one.
Why I Bet on the Agents SDK (Instead of LangChain)
I tried LangChain last year. Actually, wait—I should clarify that I tried LangChain and LangGraph. Two different attempts. Harrison Chase is brilliant, genuinely, but the abstraction layers made me feel like I was debugging someone else's magic tricks. When something broke, I had to understand LangChain's internals, not just my own logic. That's... not great at 11 PM on a Saturday.
The OpenAI Agents SDK clicked for me because it's thin. Almost too thin, honestly. It gives you:
- Agents with defined instructions and tools
- Handoffs between specialised agents
- Guardrails that check agent inputs and outputs
- Tracing that actually makes sense
No black box orchestration layers. Just primitives I can compose. Or break. Probably both.
Here's what my agent architecture looks like now:
Triage Agent (classifies intent)
├── Onboarding Agent (product setup, tutorials)
├── Support Agent (troubleshooting, bugs)
│ └── Refund Agent (only triggered for cancellation requests)
└── Sales Agent (pricing questions, upgrades)
Each agent has its own system prompt, tool set, and guardrails. The handoff mechanism means the Onboarding Agent never accidentally promises a refund, and the Refund Agent never tries to upsell someone who's angry. That last one was a real problem with the old system, by the way. Nothing like getting a "Have you considered our annual plan?" email right after you've demanded your money back.
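Here is a framework-free Python sketch of the same tree that makes the isolation concrete. It is not the Agents SDK's API. The keyword-matching `triage` function is a crude stand-in for the LLM-driven Triage Agent, and every agent key, tool name and keyword is hypothetical. What it shows is structural: the Refund Agent is only reachable through the Support Agent's handoff, and its tool set contains nothing that could upsell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical stand-in for one specialist agent's configuration."""
    name: str
    tools: frozenset
    handoffs: tuple = ()


# Hypothetical registry mirroring the tree above; tool names are invented.
AGENTS = {
    "onboarding": AgentSpec("Onboarding Agent", frozenset({"get_setup_guide"})),
    "support": AgentSpec("Support Agent", frozenset({"search_knowledge_base"}),
                         handoffs=("refund",)),
    # No pricing or upgrade tools here, so this agent cannot upsell.
    "refund": AgentSpec("Refund Agent", frozenset({"get_subscription", "issue_refund"})),
    "sales": AgentSpec("Sales Agent", frozenset({"get_pricing", "offer_upgrade"})),
}


def triage(message: str) -> list:
    """Keyword stand-in for the LLM Triage Agent; returns the handoff chain."""
    text = message.lower()
    if any(w in text for w in ("cancel", "refund")):
        # Refunds are only reachable through the Support Agent's handoff.
        return ["support", "refund"]
    if any(w in text for w in ("error", "bug", "broken")):
        return ["support"]
    if any(w in text for w in ("price", "pricing", "upgrade")):
        return ["sales"]
    return ["onboarding"]


def tools_for(message: str) -> frozenset:
    """Tools available to whichever agent ends up answering the message."""
    chain = triage(message)
    # Every step in the chain must be a declared handoff of the previous agent.
    for src, dst in zip(chain, chain[1:]):
        if dst not in AGENTS[src].handoffs:
            raise ValueError(f"{src} cannot hand off to {dst}")
    return AGENTS[chain[-1]].tools
```

For example, `tools_for("Cancel my subscription")` returns only the refund tools, so `offer_upgrade` never exists for that conversation. In the SDK itself, the equivalent wiring lives in the `tools=` and `handoffs=` lists on each `Agent`.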
The Build Timeline (With Real Stumbles)
Week 1-2: Proof of Concept
I migrated the simplest flow first: onboarding new users. The SDK's @function_tool decorator made it stupid easy to connect my existing database functions. Almost too easy. I kept waiting for the catch.
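In the Agents SDK, `@function_tool` derives a tool's name, description and parameter schema from the function's name, docstring and type hints. The toy below imitates that idea with `inspect` and `typing` only, to show what the model actually gets to read. It is a simplification, not the SDK's implementation, and `tool`, `TOOL_REGISTRY` and `get_user_setup_status` are hypothetical names.

```python
import inspect
from typing import get_type_hints

# Toy subset of Python-type to JSON Schema-type mappings.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

# Hypothetical registry of every function decorated with @tool.
TOOL_REGISTRY = {}


def tool(fn):
    """Toy decorator: record a JSON-schema-style description of fn."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    TOOL_REGISTRY[fn.__name__] = {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": {name: {"type": _JSON_TYPES[hints[name]]} for name in params},
            # Parameters without a default value are required.
            "required": [n for n, p in params.items() if p.default is inspect.Parameter.empty],
        },
    }
    return fn  # the function itself is returned unchanged and stays callable


@tool
def get_user_setup_status(user_id: str, include_steps: bool = False) -> dict:
    """Return how far a user has got through onboarding."""
    # Hypothetical: a real version would query the existing database.
    status = {"user_id": user_id, "gmail_connected": False}
    if include_steps:
        status["next_step"] = "connect_gmail"
    return status
```

The model only sees the name, docstring and parameter schema, which is why a clear docstring on an existing database function matters as much as the function body.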
First win: The agent correctly walked a user through connecting their Gmail account in 12 steps without getting lost. My old system would've given up at step 4. I literally screenshotted the conversation and sent it to my founder friends at 1 AM.
First failure: I didn't set max_turns, and one agent got stuck in a loop asking the customer "Can you clarify that?" for 23 minutes. Twenty-three. The customer thought it was performance art. I wish I were joking.
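The safety net is a hard cap on loop iterations. In the SDK that cap is the `max_turns` argument to `Runner.run`, and exceeding it raises `MaxTurnsExceeded`. The plain-Python loop below is a hypothetical stand-in for that mechanism. It also adds a repeated-reply check, which is an extra heuristic rather than an SDK feature. `step`, `run_bounded` and `TurnLimitExceeded` are invented names.

```python
class TurnLimitExceeded(RuntimeError):
    """Hypothetical error for a conversation loop that never finishes."""


def run_bounded(step, first_input, max_turns=8, max_repeats=2):
    """Call step(message) -> (reply, done) until done is True.

    Raises TurnLimitExceeded after max_turns turns, or as soon as the
    same reply has been produced more than max_repeats times.
    """
    repeats = {}
    message = first_input
    for turn in range(1, max_turns + 1):
        reply, done = step(message)
        if done:
            return reply, turn
        # Count identical replies so a "Can you clarify that?" loop dies early.
        repeats[reply] = repeats.get(reply, 0) + 1
        if repeats[reply] > max_repeats:
            raise TurnLimitExceeded(f"reply repeated {repeats[reply]} times: {reply!r}")
        message = reply  # toy: feed the reply back in as the next turn's input
    raise TurnLimitExceeded(f"no final answer after {max_turns} turns")
```

With a stub `step` that always answers "Can you clarify that?", `run_bounded` gives up on the third identical reply instead of after 23 minutes.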
Week 3-4: Guardrails Save My Arse
This is where the SDK earned its keep. I added output guardrails that check every agent response before it reaches the customer:
- No promising features that don't exist (checks against a features.json file I update manually)
- No hallucinated pricing (validates against the Stripe API in real time)
- No toxic language (standard content filter, nothing fancy)
- No PII leakage (regex + entity recognition, pretty basic stuff)
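Here is a stripped-down sketch of three of those checks (feature claims, pricing and PII) as one plain Python function. Everything specific is hypothetical: the features.json contents, the price table standing in for a live Stripe lookup, the feature vocabulary and the regexes. The toxicity filter is left out, and the PII check is a crude email regex only. In the Agents SDK you would call something like this from an `@output_guardrail` function and set `tripwire_triggered` when the list comes back non-empty.

```python
import json
import re
from decimal import Decimal

# Hypothetical features.json contents (the real file is updated by hand).
FEATURES_JSON = '{"features": ["gmail integration", "google login", "auto-replies"]}'
KNOWN_FEATURES = set(json.loads(FEATURES_JSON)["features"])

# Hypothetical vocabulary of feature names the checker looks for in replies.
FEATURE_TERMS = ("enterprise plan", "sso", "gmail integration", "google login")

# Hypothetical stand-in for live Stripe prices, in pence.
PRICES_PENCE = {"starter": 1900, "pro": 4900}

PRICE_RE = re.compile(r"£(\d+(?:\.\d{2})?)")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_response(text: str) -> list:
    """Return guardrail violations for a draft reply; empty means OK to send."""
    lowered = text.lower()
    violations = []
    # 1. Any feature the reply mentions must be listed in features.json.
    for term in FEATURE_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", lowered) and term not in KNOWN_FEATURES:
            violations.append(f"unknown feature: {term}")
    # 2. Every quoted price must match a known plan price.
    for amount in PRICE_RE.findall(text):
        if int(Decimal(amount) * 100) not in PRICES_PENCE.values():
            violations.append(f"unlisted price: £{amount}")
    # 3. Crude PII check: email addresses only.
    if EMAIL_RE.search(text):
        violations.append("possible PII: email address")
    return violations
```

A draft like "Our enterprise plan with SSO costs £99." trips all three of its problems at once, while "Pro is £49.00 a month and includes Google login." passes cleanly.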
The first day I turned these on, the guardrails blocked 12 responses. Twelve. Times. My agent would've said something wrong. One of them was about to tell a customer we have an "enterprise plan with SSO" — we don't. We're a $10k MRR bootstrapped SaaS. We have Google login and prayers.
Week 5-6: The Tracing Revelation
The SDK's tracing dashboard exposed something I never would've found otherwise. It's basically a tree of agent calls with latency for each node. Beautiful. Depressing.
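A minimal sketch of what that view lets you compute, assuming a hypothetical `Span` structure rather than the SDK's real trace objects: walk the tree, count calls by name, and subtract child durations to get each node's own time. Repeated tool names stand out immediately in the counts. The example trace and its timings are invented.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical trace node: an agent run, LLM call, or tool call."""
    name: str
    duration_ms: int
    children: list = field(default_factory=list)


def walk(span):
    """Yield every span in the tree, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def self_time_ms(span) -> int:
    # Assumes children run sequentially inside their parent (no overlap).
    return span.duration_ms - sum(c.duration_ms for c in span.children)


def call_counts(root) -> Counter:
    """How many times each span name appears in one trace."""
    return Counter(s.name for s in walk(root))


# Invented trace: one agent run with three back-to-back searches.
trace = Span("Support Agent", 4000, [
    Span("llm", 900),
    Span("search_knowledge_base", 700),
    Span("search_knowledge_base", 650),
    Span("search_knowledge_base", 700),
    Span("llm", 800),
])
```

Here `call_counts(trace)["search_knowledge_base"]` is 3, and `max(walk(trace), key=self_time_ms)` points at the slowest individual node.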
My Support Agent was calling the knowledge base search tool 3-4 times per query because my initial prompt said "search thoroughly." I thought I was being helpful. I was being expensive. It was spending $0.08 per conversation on knowledge base searches, most of them redundant. Across 2,000 conversations/month, that's $160 a month on searches alone.
Changed the prompt to "search once with the most specific query possible." Search cost dropped to $0.02 per conversation, about $40/month at the same volume.
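The arithmetic from this section, as a quick check. The per-search price is an assumption back-derived from the figures above ($0.08 for about four searches, so roughly $0.02 each); it is not a measured API price. `Decimal` avoids float rounding noise in money sums.

```python
from decimal import Decimal

CONVERSATIONS_PER_MONTH = 2000

# Assumption: about $0.02 per search, inferred from $0.08 for ~4 searches.
COST_PER_SEARCH = Decimal("0.02")


def monthly_search_cost(searches_per_conversation: int) -> Decimal:
    """Monthly spend on knowledge-base searches at a fixed conversation volume."""
    return searches_per_conversation * COST_PER_SEARCH * CONVERSATIONS_PER_MONTH


before = monthly_search_cost(4)  # prompt said "search thoroughly"
after = monthly_search_cost(1)   # prompt says "search once"
print(f"before ${before}, after ${after}, saved ${before - after}/month")
# before $160.00, after $40.00, saved $120.00/month
```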
Small prompt tweaks compound hard at scale. I think about this constantly now.
The Numbers After 60 Days in Production
Current metrics (December 2024):
| Metric | Before | After | Change |
|---|---|---|---|
| Monthly cost | £2,820 | £535 | -81% |
| Response time | 14 min | 22 sec | -97% |
| CSAT score | 3.8/5 | 4.4/5 | +16% |
| Human handoff rate | 47% | 12% | -74% |
| Refund requests | 23/mo | 18/mo | -22% |
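The Change column is a relative change, (after - before) / before, rounded to a whole percentage. A few lines of Python recompute it from the Before and After values; the only conversion needed is 14 minutes to 840 seconds. The handoff rate is itself a percentage, so its change is relative to 47%, not a difference in percentage points.

```python
def pct_change(before: float, after: float) -> int:
    """Relative change from before to after, as a whole-number percentage."""
    return round((after - before) / before * 100)


# Before/after values from the table; response time converted to seconds.
ROWS = {
    "Monthly cost (£)": (2820, 535),
    "Response time (s)": (14 * 60, 22),
    "CSAT (/5)": (3.8, 4.4),
    "Human handoff rate (%)": (47, 12),
    "Refund requests (/mo)": (23, 18),
}

for metric, (before, after) in ROWS.items():
    print(f"{metric}: {pct_change(before, after):+d}%")
```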
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.