Building Production AI Agents with LangChain: A Practical Guide
Feb 25, 2026
7 min read
AI agents go beyond chatbots — they take action. They can search databases, call APIs, make decisions, and execute multi-step workflows without human intervention. But moving from prototype to production means handling errors gracefully, adding observability, and ensuring reliability when the LLM makes unexpected choices.
This guide covers building production-ready AI agents with LangChain, from basic tool integration to deployment patterns.
How Agents Work
Agents run a reason-act loop: the LLM picks a tool, executes it, reads the result, and repeats until the task is done:
- Reason: Decide which tool (if any) to call next
- Act: Execute the chosen tool with generated arguments
- Observe: Process tool results and decide the next action
- Iterate: Continue until the task is complete
Example flow: User asks "What's my account balance and should I invest more this month?"
1. Agent calls the get_account_balance() tool
2. Receives result: $5,200
3. Agent calls the get_monthly_expenses() tool
4. Receives result: $3,800
5. Agent reasons: surplus of $1,400, safe to invest
6. Returns a recommendation backed by the data
LangChain Agent Types

| Agent Type | When to Use | Tool Support |
|---|---|---|
| ReAct | General-purpose reasoning | Any tool |
| OpenAI Functions | Structured tool calling | JSON schema tools |
| Plan-and-Execute | Multi-step complex tasks | Any tool |
| Conversational | Stateful multi-turn chat | Memory + tools |
Recommendation: Use OpenAI Functions agent for production — it's the most reliable and cheapest (uses function calling API, not prompt engineering).
Building a Basic Agent
Step 1: Define Tools
```python
from langchain.tools import tool
import requests

@tool
def search_company_data(query: str) -> str:
    """Search internal company database. Use for product info, pricing, policies."""
    # Call your internal search API
    result = requests.post("https://api.internal.com/search", json={"query": query})
    return result.json()["answer"]

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send email to a user. Use when user requests notification or update."""
    # Call email service (request payload elided)
    requests.post("https://api.sendgrid.com/v3/mail/send", json={...})
    return f"Email sent to {to}"

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    response = requests.get(f"https://api.weather.com/v1/current?location={location}")
    return response.json()["summary"]
```
Tool design rules:
- Clear, verb-based names (search_company_data, not company_searcher)
- Detailed docstrings: the LLM uses these to decide when to call each tool
- Type hints for parameters
- Return strings (LLMs consume text best)
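For example, a docstring too thin to route on versus one the LLM can act on (the tool names here are illustrative, not from the examples above):

```python
from langchain.tools import tool

# Too vague: gives the LLM no signal about when to pick this tool
@tool
def lookup(q: str) -> str:
    """Look up data."""
    ...

# Better: says what it covers, when to use it, and what it returns
@tool
def search_support_tickets(query: str) -> str:
    """Search open customer support tickets by keyword.
    Use when the user asks about ticket status, complaints, or bug reports.
    Returns a short text summary of matching tickets."""
    ...
```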
Step 2: Create the Agent
```python
from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI

tools = [search_company_data, send_email, get_weather]

# OPENAI_FUNCTIONS requires a chat model, not the completion-style OpenAI LLM
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    max_iterations=5,
    early_stopping_method="generate"
)

# Run the agent
response = agent.run("What's the weather in London and email me a summary?")
```
Production-Ready Patterns
1. Error Handling and Retries
LLMs are non-deterministic: agents can fail, call the wrong tools, or get stuck in loops. Retry transient failures, then fall back to a plain LLM call:

```python
import logging
from tenacity import retry, stop_after_attempt, wait_exponential

logger = logging.getLogger(__name__)

# Retry the agent itself; keep the fallback outside so tenacity
# actually sees the exception and retries
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def run_agent(query: str):
    return agent.run(query)

def run_agent_with_retry(query: str):
    try:
        return run_agent(query)
    except Exception as e:
        # All retries exhausted; fall back to a simple LLM call without tools
        logger.error(f"Agent failed after retries: {e}")
        return llm.predict(query)
```
2. Guardrails and Validation
Prevent agents from doing dangerous things:
```python
from langchain.callbacks.base import BaseCallbackHandler

class SafetyCallback(BaseCallbackHandler):
    def __init__(self):
        self.call_count = 0

    # on_tool_start receives the serialized tool dict, not the tool object
    def on_tool_start(self, serialized, input_str, **kwargs):
        tool_name = serialized.get("name", "")

        # Block destructive operations
        if tool_name == "delete_database" and "production" in input_str:
            raise ValueError("Agent attempted to delete production database!")

        # Rate limit expensive tools
        if tool_name == "expensive_api":
            self.call_count += 1
            if self.call_count > 10:
                raise ValueError("Rate limit exceeded for expensive_api")

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    callbacks=[SafetyCallback()]
)
```
3. Observability and Logging
Track what agents are doing:
```python
import os

# Enable LangSmith tracing; once these env vars are set, all LangChain
# agent runs in this process are traced automatically
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "..."  # your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "customer_support"

result = agent.run("Help me with my order #12345")

# Log to your own metrics system; tools_called and iteration counts must be
# collected via a callback handler (AgentExecutor does not expose them directly)
logger.info("agent_execution %s", {
    "user_id": user_id,
    "query": query,
    "tools_called": tools_called,
    "iterations": iteration_count,
    "latency_ms": latency,
    "success": True,
})
```
4. Cost Control
Agents can burn through tokens fast with iterative tool calling:
- Set max_iterations: Cap at 3-5 to prevent runaway loops
- Use cheaper models: GPT-3.5-turbo for simple tools, GPT-4 only for complex reasoning
- Cache tool results: If a user asks "weather in London" twice, don't call the API twice (see the sketch after this list)
- Monitor per-user spend: Alert when a user exceeds $10/day in LLM costs
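A minimal sketch of the caching idea, assuming an in-process cache is acceptable (multi-instance deployments would want Redis or similar, plus a TTL so results expire):

```python
from functools import lru_cache

import requests
from langchain.tools import tool

@lru_cache(maxsize=256)  # repeat lookups for the same location skip the API
def _fetch_weather(location: str) -> str:
    response = requests.get(f"https://api.weather.com/v1/current?location={location}")
    return response.json()["summary"]

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    # Normalize the key so "London" and " london " share one cache entry
    return _fetch_weather(location.strip().lower())
```

Note that lru_cache never expires entries; for data that goes stale, add a timestamp check or a TTL cache.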
5. Human Approval for Sensitive Actions
For high-stakes operations like payments or invoices, have the tool queue the action for human review instead of executing it directly:

```python
@tool
def send_invoice(customer_id: str, amount: float) -> str:
    """Send invoice to customer. Requires human approval."""
    # Save to approval queue instead of sending immediately
    approval_id = save_to_queue({"customer_id": customer_id, "amount": amount})
    return f"Invoice queued for approval (ID: {approval_id}). Awaiting human review."
```
Deployment Options

| Platform | Best For | Cost |
|---|---|---|
| AWS Lambda | Serverless, low traffic | $0.20 per 1M requests |
| GCP Cloud Run | Containerized agents | $0.40 per 1M requests |
| Modal.com | GPU-heavy workloads | $0.30 per 1M requests |
| Kubernetes | High scale, full control | $200-500/month base |
FAQs
How much do production AI agents cost per request?
Expect $0.01-0.05 per agent execution with GPT-4, $0.001-0.01 with GPT-3.5-turbo. Multi-step agents (3-5 tool calls) can burn 5-10K tokens per run. Budget: $500-2000/month for 50K agent executions. Use GPT-3.5 for simple tasks, GPT-4 only when reasoning quality matters. Cache tool results aggressively.
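A back-of-envelope sketch of that math (the per-token prices below are placeholders, not current rates; check your provider's pricing page):

```python
# Placeholder GPT-4-class prices in USD per 1K tokens (assumptions, not quotes)
PRICE_IN_PER_1K = 0.01
PRICE_OUT_PER_1K = 0.03

def run_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_IN_PER_1K + (output_tokens / 1000) * PRICE_OUT_PER_1K

# A short 3-tool-call run: ~3K input tokens (the prompt is re-sent each step), ~500 output
print(f"${run_cost(3000, 500):.3f} per run")  # $0.045, inside the $0.01-0.05 range
```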
What causes agents to fail or loop infinitely?
Common failures: (1) unclear tool descriptions confuse the LLM, (2) tool returns unexpected format (LLM can't parse), (3) no termination condition (loops calling same tool). Fix: write detailed tool docstrings, validate tool outputs, set max_iterations=5, add early_stopping. Monitor iteration counts — if >3 regularly, your tools are poorly designed.
When should you use an agent vs a simple chain?
Use chains when the workflow is fixed (A → B → C always). Use agents when the LLM needs to decide which tools to call and in what order. Example: "Summarize this doc" = chain. "Research this topic and email me a summary" = agent (needs to decide: search → synthesize → send email). Agents cost 2-5x more due to planning overhead.
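For contrast, the fixed-workflow case as a plain chain (a minimal sketch; the prompt wording is illustrative):

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Fixed pipeline: prompt -> LLM -> answer; no tool selection, no planning loop
summarize = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template("Summarize this document:\n\n{doc}"),
)
summary = summarize.run(doc="<document text>")
```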
How do you test AI agents reliably?
Unit test each tool independently with mocked LLM responses. Integration test: run agent with deterministic queries (temperature=0) and assert expected tool call sequence. Use LangSmith evals: define test cases (query + expected tool usage + success criteria), run nightly, track pass rate. Target: 85%+ consistency before production. Regression test after every LLM provider upgrade.
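One way to assert the expected tool call sequence is a recording callback (a sketch; the recorder class and test are illustrative, not a LangSmith API):

```python
from langchain.callbacks.base import BaseCallbackHandler

class ToolRecorder(BaseCallbackHandler):
    """Records the name of each tool the agent invokes, in order."""
    def __init__(self):
        self.calls = []

    def on_tool_start(self, serialized, input_str, **kwargs):
        self.calls.append(serialized.get("name", ""))

def test_weather_email_flow():
    recorder = ToolRecorder()
    agent.run(
        "What's the weather in London and email me a summary?",
        callbacks=[recorder],
    )
    # temperature=0 keeps the sequence stable enough to assert on
    assert recorder.calls == ["get_weather", "send_email"]
```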
How do you reduce agent latency?
Agents are inherently slower (3-10s vs 1-2s for simple LLM calls). Optimize: (1) Parallel tool calls when possible (LangChain supports this with OpenAI Functions), (2) Use streaming responses to show progress, (3) Cache tool results, (4) Use faster models (GPT-3.5-turbo = 2x faster than GPT-4), (5) Pre-warm agent instances in serverless environments.