
AI Agents in Production: Bridging the Gap Between Demos and Reality

Noted by Yun Fan · Feb 28, 2026 · 4 min read

We see impressive AI agent demos every day: tasks completed automatically with zero human intervention, seemingly ready to replace entire teams. But in real enterprise environments, the story is far more nuanced.

Jolie Ni, an engineer specializing in enterprise AI adoption, recently shared her hands-on experience building AI agents for companies and revealed the challenges that flashy demos never talk about.

AI Agents Are More Than Chatbots

AI agents have evolved through several stages:

Simple Q&A: Asking an LLM basic questions and getting answers
Knowledge retrieval: Attaching a knowledge base so the AI can look up relevant information
Action execution: Letting AI perform tasks like sending emails or calling APIs
Stateful agents: Giving AI its own memory that updates over time, so it can track progress and make decisions based on context

Today's AI agents combine all of these. They interpret what you want, figure out the next step, take action, and learn from the results. They keep iterating until the goal is met.

Two Common Approaches

Approach 1: Rule-based orchestration

Routes are predefined (e.g., refund, cancel, escalate)
AI classifies the user's intent and follows a fixed path
Very predictable, very low risk
Best for: workflows that already exist and have clear steps
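
To make the fixed-path idea concrete, here is a minimal Python sketch, not code from the talk. The keyword matcher stands in for an LLM intent classifier, and the handler names and return strings are hypothetical. The point is only that the model picks a label while the route itself is hard-coded.

```python
def classify_intent(message: str) -> str:
    """Keyword stand-in for an LLM intent classifier (an assumption)."""
    text = message.lower()
    if "refund" in text:
        return "refund"
    if "cancel" in text:
        return "cancel"
    # Anything unrecognized falls through to a human.
    return "escalate"


# Hypothetical handlers: each one represents a predefined workflow.
def handle_refund(message: str) -> str:
    return "refund workflow started"


def handle_cancel(message: str) -> str:
    return "cancellation workflow started"


def handle_escalate(message: str) -> str:
    return "handed off to a human agent"


ROUTES = {
    "refund": handle_refund,
    "cancel": handle_cancel,
    "escalate": handle_escalate,
}


def route(message: str) -> tuple[str, str]:
    """Classify the message, then follow the fixed path for that intent."""
    intent = classify_intent(message)
    # Unknown labels default to escalation instead of failing.
    handler = ROUTES.get(intent, handle_escalate)
    return intent, handler(message)
```

Because every possible path is listed in ROUTES, the behavior can be reviewed and tested route by route, which is where the predictability comes from.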

Approach 2: LLM-based orchestration

AI decides the next step on its own from a set of available tools
More flexible, handles unexpected situations better
Less predictable, results can vary
Best for: complex scenarios with multiple parties and changing requirements
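
By contrast, an LLM-orchestrated agent can be sketched as a loop that asks the model for the next step and runs the chosen tool. This is an illustrative sketch under stated assumptions: scripted_policy stands in for the real LLM call, and the tool names, arguments, and return strings are hypothetical. The step budget and the tool allow-list are two simple guards against the unpredictability noted above.

```python
from typing import Callable, Optional

# Hypothetical tools the agent is allowed to call.
def check_visa_status(customer_id: str) -> str:
    return f"visa for {customer_id}: approved"


def search_housing(city: str) -> str:
    return f"3 short-term rentals found in {city}"


TOOLS: dict[str, Callable[[str], str]] = {
    "check_visa_status": check_visa_status,
    "search_housing": search_housing,
}


def scripted_policy(goal: str, observations: list) -> Optional[tuple]:
    """Stand-in for the LLM: returns (tool_name, argument), or None when done."""
    if not observations:
        return ("check_visa_status", "cust-1")
    if len(observations) == 1:
        return ("search_housing", "Berlin")
    return None


def run(goal: str, choose_next_step, max_steps: int = 5) -> list:
    """Let the policy pick tools until it stops or the step budget runs out."""
    observations = []
    for _ in range(max_steps):
        step = choose_next_step(goal, observations)
        if step is None:
            break
        tool_name, argument = step
        if tool_name not in TOOLS:
            # Refuse anything outside the allow-list and tell the policy why.
            observations.append(f"error: unknown tool {tool_name}")
            continue
        observations.append(TOOLS[tool_name](argument))
    return observations
```

Swapping scripted_policy for a real model call changes which path is taken, but not the guards around it.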

Real-World Case Studies

Case 1: Sales outreach agent (rule-based)

Jolie built this for a company that wanted to automate prospecting. The original human workflow was already well-defined: research leads, collect info, write personalized emails, track replies, log everything.

Challenges that came up in production:

Web search results were inconsistent across different prospects
Email quality varied because available information varied
The agent could generate millions of emails per day, but email accounts have daily sending limits
The team had to decide where humans should approve emails before they were sent
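
The last two challenges can be sketched together as a small gate in front of the send step. This is a hypothetical illustration, not the team's implementation: the Draft fields, the daily_limit parameter, and the three output buckets are assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    recipient: str
    body: str
    approved: bool = False  # set to True by a human reviewer


def plan_sends(drafts, daily_limit, already_sent_today=0):
    """Split drafts into: send today, defer to a later day, awaiting review."""
    remaining = max(daily_limit - already_sent_today, 0)
    send_now, deferred, needs_review = [], [], []
    for draft in drafts:
        if not draft.approved:
            # Nothing leaves the system without human sign-off.
            needs_review.append(draft)
        elif len(send_now) < remaining:
            send_now.append(draft)
        else:
            # Approved, but today's quota is used up.
            deferred.append(draft)
    return send_now, deferred, needs_review
```

With a limit of 2 and three approved drafts, the third is deferred to a later day, so the agent's generation speed never exceeds what the mailbox is allowed to send.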

Case 2: Relocation support agent (LLM-based)

This agent helped coordinate international moves: visas, shipping, short-term housing, bank accounts, and more. It had to communicate with customers, shipping vendors, and visa providers across different channels.

Challenges that came up in production:

Handling sensitive personal data (passports, social security numbers) required strict memory isolation per customer
The agent needed its own identity and permissions so every action could be tracked
Multiple parties had dependencies on each other (e.g., don't start the move before the visa is approved)
Conversations happened across email, web portals, and other platforms simultaneously
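
Two of these constraints, per-customer memory isolation and cross-party dependencies, lend themselves to a short sketch. Everything here is an assumption made for illustration: the CustomerMemory class, the task names, and the dependency graph are hypothetical, and a production system would add encryption, access control, and audit logging on top.

```python
class CustomerMemory:
    """Keeps each customer's data in a separate bucket keyed by customer ID."""

    def __init__(self):
        self._store = {}

    def write(self, customer_id, key, value):
        self._store.setdefault(customer_id, {})[key] = value

    def read(self, customer_id, key):
        # Only the requesting customer's bucket is ever consulted.
        return self._store.get(customer_id, {}).get(key)


# Hypothetical task graph: each task maps to the tasks that must finish first.
DEPENDENCIES = {
    "visa": set(),
    "shipping": {"visa"},
    "housing": {"visa"},
    "bank_account": {"housing"},
}


def ready_tasks(completed):
    """Return unfinished tasks whose prerequisites are all in `completed` (a set)."""
    return sorted(
        task
        for task, prereqs in DEPENDENCIES.items()
        if task not in completed and prereqs <= completed
    )
```

Before the visa is approved, ready_tasks(set()) offers only "visa", so the agent cannot start shipping early no matter what the model proposes.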

How to Measure Agent Performance

Jolie stressed that you need to define success before you start building. Her recommendations:

Break evaluation into yes/no questions: "Does this email contain the correct name?" is much more reliable than "Is this email good enough?"
Track human approval rates: If humans are rejecting a lot of the agent's outputs, something is wrong
Build a golden dataset: Use real historical cases, including edge cases and failures, as your test suite
Run regression tests regularly: Every time you change a prompt or upgrade a model, re-run your tests to make sure nothing broke
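
These recommendations fit together into a small evaluation harness. The sketch below is one possible shape, with assumptions throughout: the two checks, the golden dataset entries, and the generate_email callable are hypothetical placeholders for real historical cases and a real agent.

```python
# Each check answers one yes/no question about a generated email.
def contains_correct_name(email, case):
    return case["name"] in email


def under_word_limit(email, case):
    return len(email.split()) <= case["max_words"]


CHECKS = [contains_correct_name, under_word_limit]

# Hypothetical golden dataset; in practice, use real historical cases.
GOLDEN_DATASET = [
    {"name": "Ada", "max_words": 50},
    {"name": "Grace", "max_words": 50},
]


def evaluate(generate_email, dataset=GOLDEN_DATASET, checks=CHECKS):
    """Return (share of cases passing every check, failed checks per case index)."""
    failures = {}
    for index, case in enumerate(dataset):
        email = generate_email(case)
        failed = [check.__name__ for check in checks if not check(email, case)]
        if failed:
            failures[index] = failed
    pass_rate = (len(dataset) - len(failures)) / len(dataset)
    return pass_rate, failures


def approval_rate(decisions):
    """Share of agent outputs a human approved; `decisions` is a list of booleans."""
    return sum(decisions) / len(decisions) if decisions else 0.0
```

Re-running evaluate after every prompt change or model upgrade is the regression test: a drop in the pass rate, or a new entry in the failures, shows exactly which case and which yes/no question broke.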

With this evaluation framework in place, and by iterating on the data it produced, her team saw task completion climb well above its starting point of around 20%.

Advice for Engineers

Start small: Pick one repetitive workflow in your own life and automate it
Keep the scope narrow: Agents work best when the task boundaries are clear
Invest in evaluation early: Think of it like test-driven development
Design for collaboration: The best agents amplify human decisions rather than replace them

Jolie's closing thought: AI is just a tool. What matters most is understanding who you're building for and what problem you're solving. The future isn't AI doing everything alone. It's humans and AI working together.

Based on Jolie Ni's tech talk "AI Agents in Production," hosted by the Ziliudi community.