AI Agents in Production: Bridging the Gap Between Demos and Reality
We see impressive AI agent demos every day: tasks completed automatically, with zero human intervention, seemingly ready to replace entire teams. But in real enterprise environments, the story is far more nuanced.
Jolie Ni, an engineer specializing in enterprise AI adoption, recently shared her hands-on experience building AI agents for companies and revealed the challenges that flashy demos never talk about.
AI Agents Are More Than Chatbots
AI agents have evolved through several stages:
- Simple Q&A: Asking an LLM basic questions and getting answers
- Knowledge retrieval: Attaching a knowledge base so the AI can look up relevant information
- Action execution: Letting AI perform tasks like sending emails or calling APIs
- Stateful agents: Giving AI its own memory that updates over time, so it can track progress and make decisions based on context
Today's AI agents combine all of these. They interpret what you want, figure out the next step, take action, and learn from the results. They keep iterating until the goal is met.
Two Common Approaches
Approach 1: Rule-based orchestration
- Routes are predefined (e.g., refund, cancel, escalate)
- AI classifies the user's intent and follows a fixed path
- Very predictable, very low risk
- Best for: workflows that already exist and have clear steps
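A minimal sketch of rule-based orchestration, assuming the three routes named above. The handler functions and the keyword-based `classify_intent` are hypothetical stand-ins (in practice the classifier would be an LLM call); none of this comes from Jolie's actual implementation.

```python
from typing import Callable, Dict

# Hypothetical handlers, one per predefined route.
def handle_refund(message: str) -> str:
    return "refund: started refund workflow"

def handle_cancel(message: str) -> str:
    return "cancel: started cancellation workflow"

def handle_escalate(message: str) -> str:
    return "escalate: handed off to a human"

# The full set of routes is fixed ahead of time.
ROUTES: Dict[str, Callable[[str], str]] = {
    "refund": handle_refund,
    "cancel": handle_cancel,
    "escalate": handle_escalate,
}

def classify_intent(message: str) -> str:
    """Stand-in for an LLM classifier; keyword matching keeps the sketch runnable."""
    text = message.lower()
    if "refund" in text or "money back" in text:
        return "refund"
    if "cancel" in text:
        return "cancel"
    return "unknown"

def route(message: str) -> str:
    intent = classify_intent(message)
    # Any intent outside the known routes falls back to escalation.
    handler = ROUTES.get(intent, handle_escalate)
    return handler(message)
```

The model only picks a label; everything that happens afterwards is ordinary code, which is where the predictability comes from.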
Approach 2: LLM-based orchestration
- AI decides the next step on its own from a set of available tools
- More flexible, handles unexpected situations better
- Less predictable, results can vary
- Best for: complex scenarios with multiple parties and changing requirements
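By contrast, LLM-based orchestration can be sketched as a loop in which the model picks the next tool at each step. In this illustration `choose_next_step` is a hard-coded stand-in for the LLM call, and the tool names are hypothetical; the step cap is an assumed safeguard, not something described in the talk.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]

# Hypothetical tools the agent may call.
def check_visa_status(arg: str) -> str:
    return f"visa for {arg}: approved"

def book_shipping(arg: str) -> str:
    return f"shipping booked for {arg}"

TOOLS: Dict[str, Tool] = {
    "check_visa_status": check_visa_status,
    "book_shipping": book_shipping,
}

def choose_next_step(
    goal: str, history: List[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Stand-in for an LLM call: returns (tool_name, argument), or None when done."""
    used = [name for name, _ in history]
    if "check_visa_status" not in used:
        return ("check_visa_status", goal)
    if "book_shipping" not in used:
        return ("book_shipping", goal)
    return None

def run_agent(goal: str, max_steps: int = 5) -> List[Tuple[str, str]]:
    history: List[Tuple[str, str]] = []
    # Cap the number of steps so a confused model cannot loop forever.
    for _ in range(max_steps):
        step = choose_next_step(goal, history)
        if step is None:
            break
        name, arg = step
        if name not in TOOLS:
            # Record the bad choice so the next decision can see it.
            history.append((name, "error: unknown tool"))
            continue
        history.append((name, TOOLS[name](arg)))
    return history
```

The path through the tools is not written down anywhere in advance, which is the source of both the flexibility and the variability.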
Real-World Case Studies
Case 1: Sales outreach agent (rule-based)
Jolie built this for a company that wanted to automate prospecting. The original human workflow was already well-defined: research leads, collect info, write personalized emails, track replies, log everything.
Challenges that came up in production:
- Web search results were inconsistent across different prospects
- Email quality varied because available information varied
- The AI could generate millions of emails per day, but each inbox has a daily sending limit
- The team had to decide where humans should approve emails before sending
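The last two challenges can be illustrated with a small outbox that holds unapproved drafts and defers anything beyond a daily cap. This is a sketch under assumptions: the `Draft` and `Outbox` names, the status strings, and the single shared limit are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Draft:
    recipient: str
    body: str
    approved: bool = False  # set to True by a human reviewer

@dataclass
class Outbox:
    daily_limit: int
    sent_today: int = 0
    sent: List[Draft] = field(default_factory=list)
    held: List[Draft] = field(default_factory=list)

    def submit(self, draft: Draft) -> str:
        # Human approval gate: nothing leaves without sign-off.
        if not draft.approved:
            self.held.append(draft)
            return "needs_approval"
        # Respect the inbox's daily sending limit.
        if self.sent_today >= self.daily_limit:
            self.held.append(draft)
            return "deferred"
        self.sent.append(draft)
        self.sent_today += 1
        return "sent"
```

A real system would also reset `sent_today` each day and retry deferred drafts; both are left out to keep the sketch short.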
Case 2: Relocation support agent (LLM-based)
This agent helped coordinate international moves: visas, shipping, short-term housing, bank accounts, and more. It had to communicate with customers, shipping vendors, and visa providers across different channels.
Challenges that came up in production:
- Handling sensitive personal data (passports, social security numbers) required strict memory isolation per customer
- The agent needed its own identity and permissions so every action could be tracked
- Multiple parties had dependencies on each other (e.g., don't start the move before the visa is approved)
- Conversations happened across email, web portals, and other platforms simultaneously
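Three of these requirements (per-customer memory isolation, a traceable agent identity, and dependency ordering) can be sketched together. The class, the dependency map, and the log format below are illustrative assumptions, not the design used in the project.

```python
from typing import Dict, List, Optional, Set

class CustomerMemory:
    """Keeps each customer's data in a separate namespace keyed by customer ID."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = {}

    def write(self, customer_id: str, key: str, value: str) -> None:
        self._store.setdefault(customer_id, {})[key] = value

    def read(self, customer_id: str, key: str) -> Optional[str]:
        # A lookup can only ever see the namespace of the ID it was given.
        return self._store.get(customer_id, {}).get(key)

# Hypothetical dependency map: a task may start only after its prerequisites.
DEPENDS_ON: Dict[str, Set[str]] = {
    "visa": set(),
    "shipping": {"visa"},
    "housing": {"visa"},
}

# Every attempted action is recorded against the agent's own identity.
AUDIT_LOG: List[Dict[str, str]] = []

def can_start(task: str, done: Set[str]) -> bool:
    return DEPENDS_ON[task] <= done  # all prerequisites are in the done set

def start_task(agent_id: str, customer_id: str, task: str, done: Set[str]) -> bool:
    allowed = can_start(task, done)
    AUDIT_LOG.append({
        "agent": agent_id,
        "customer": customer_id,
        "task": task,
        "outcome": "started" if allowed else "blocked",
    })
    return allowed
```

An in-memory dictionary is only a stand-in here; handling passports and social security numbers in production would need real access controls and encrypted storage.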
How to Measure Agent Performance
Jolie stressed that you need to define success before you start building. Her recommendations:
- Break evaluation into yes/no questions: "Does this email contain the correct name?" is much more reliable than "Is this email good enough?"
- Track human approval rates: If humans are rejecting a lot of the agent's outputs, something is wrong
- Build a golden dataset: Use real historical cases, including edge cases and failures, as your test suite
- Run regression tests regularly: Every time you change a prompt or upgrade a model, re-run your tests to make sure nothing broke
Her team saw the task completion rate climb substantially from a starting point of around 20%, simply by putting this evaluation framework in place and iterating on the data.
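The yes/no checks, golden dataset, and regression run described above might look like the following sketch. The check names, the two dataset rows, and `generate_email` are made up for illustration; a real golden dataset would come from historical cases, including edge cases and failures.

```python
from typing import Callable, Dict, List

Case = Dict[str, str]
Check = Callable[[str, Case], bool]

# Each check is a yes/no question about one output.
CHECKS: Dict[str, Check] = {
    "contains_correct_name": lambda out, case: case["name"] in out,
    "mentions_company": lambda out, case: case["company"] in out,
    "under_length_limit": lambda out, case: len(out.split()) <= 150,
}

# Illustrative placeholder rows, not real data.
GOLDEN_DATASET: List[Case] = [
    {"name": "Dana", "company": "Acme"},
    {"name": "Lee", "company": "Globex"},
]

def generate_email(case: Case) -> str:
    """Stand-in for the agent under test."""
    return f"Hi {case['name']}, I saw that {case['company']} is growing."

def run_regression(
    generate: Callable[[Case], str], dataset: List[Case]
) -> Dict[str, float]:
    """Return the pass rate of every check across the dataset."""
    passed = {name: 0 for name in CHECKS}
    for case in dataset:
        output = generate(case)
        for name, check in CHECKS.items():
            passed[name] += int(check(output, case))
    return {name: count / len(dataset) for name, count in passed.items()}
```

Re-running `run_regression` after every prompt change or model upgrade and comparing the pass rates to the previous run shows which specific check regressed, rather than just that quality dropped.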
Advice for Engineers
- Start small: Pick one repetitive workflow in your own life and automate it
- Keep the scope narrow: Agents work best when the task boundaries are clear
- Invest in evaluation early: Think of it like test-driven development
- Design for collaboration: The best agents amplify human decisions rather than replace them
Jolie's closing thought: AI is just a tool. What matters most is understanding who you're building for and what problem you're solving. The future isn't AI doing everything alone. It's humans and AI working together.
Based on Jolie Ni's tech talk "AI Agents in Production," hosted by the Ziliudi community.