# Agent Memory in LangChain: Short-Term, Long-Term, and Episodic
An AI agent without memory is a very expensive stateless function. Every call starts from scratch — no context about who the user is, what was discussed previously, or what the agent has already tried. For transactional use cases, this is fine. For anything requiring multi-turn reasoning, personalization, or learning from past interactions, memory is not optional.
## Why Memory Architecture Matters
Before choosing a memory type, clarify what you need memory for:
- Conversation continuity: The agent should remember what was said earlier in this session.
- User personalization: The agent should remember facts about this user across sessions.
- Task state: The agent should remember what it's already tried, what worked, and what failed.
- Knowledge accumulation: The agent should store and retrieve information from external sources or past runs.
Different requirements call for different memory architectures. Conflating them leads to systems that are bloated, slow, or hallucinate recalled facts.
## Buffer Memory (ConversationBufferMemory)
Buffer memory is the simplest form: the entire conversation history is stored and passed as context to the LLM on every call.
```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

# llm = any LangChain chat/completion model, e.g. ChatOpenAI()
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)

chain.predict(input="My name is Alex")
chain.predict(input="What's my name?")  # Agent recalls 'Alex'
```
- Strengths: Zero configuration, perfect recall, no information loss.
- Weaknesses: Context window fills fast — a 30-turn conversation in GPT-4 Turbo can cost $0.30+ per call.
- Use when: Short-lived conversations (5-10 turns) where perfect recall matters more than cost.
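The cost problem is easy to quantify: because the full history rides along with every call, total tokens sent grow roughly quadratically with turn count. A back-of-the-envelope sketch (toy per-turn token count, not a pricing calculator):

```python
def buffer_memory_token_cost(turns: int, tokens_per_turn: int = 150) -> int:
    """Total prompt tokens sent across a conversation when the full
    history is re-sent on every call (buffer memory)."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # one new user+assistant exchange
        total += history            # the whole history rides along each call
    return total

# A 30-turn chat re-sends ~31x the tokens of a 5-turn chat, not 6x:
print(buffer_memory_token_cost(5))   # 2250
print(buffer_memory_token_cost(30))  # 69750
```

The quadratic growth is why buffer memory is only the right default for short sessions.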
## Summary Memory (ConversationSummaryMemory)
Summary memory periodically compresses conversation history into a summary, replacing the full history with a condensed version that fits in fewer tokens.
```python
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm)  # the llm also performs summarization
# Each new exchange is folded into a running summary as it's saved,
# so the context passed to the model stays roughly constant in size
```
- Strengths: Handles long conversations without exploding context costs. Scales to hundreds of turns.
- Weaknesses: Lossy — details dropped in summarization may become relevant later. Adds summarization latency and cost.
- Use when: Long multi-turn conversations, customer support agents, unbounded conversation length.
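Under the hood, the mechanism is a fold: each new exchange is merged into a running summary rather than appended to a transcript. Here's a minimal sketch of that loop, with a stub standing in for the LLM summarization call (illustrative only, not LangChain's implementation):

```python
def make_summary_memory(summarize):
    """Toy version of the summary-memory loop. `summarize(old_summary,
    new_lines)` stands in for the LLM call that condenses history."""
    state = {"summary": ""}

    def save_context(user_msg: str, ai_msg: str) -> None:
        new_lines = f"Human: {user_msg}\nAI: {ai_msg}"
        # Fold the new exchange into the running summary
        state["summary"] = summarize(state["summary"], new_lines)

    def load_memory() -> str:
        # Constant-size context, however long the conversation runs
        return state["summary"]

    return save_context, load_memory

# Stub summarizer just concatenates; a real LLM would compress and
# inevitably drop some details (the "lossy" weakness above)
save, load = make_summary_memory(lambda old, new: (old + " | " + new).strip(" |"))
save("My name is Alex", "Nice to meet you, Alex")
save("I live in Berlin", "Noted")
print(load())
```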
## Summary Buffer Memory (Hybrid)
A hybrid approach: keep the last N interactions verbatim for recent accuracy, and summarize everything older than that threshold. This is the pragmatic default for most conversational agents.
```python
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=500,  # recent messages kept verbatim up to 500 tokens, older ones summarized
)
```
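The split logic is worth seeing concretely. Below is a rough sketch of how a summary buffer decides what stays verbatim, using word count as a stand-in tokenizer (the helper name and shape are made up for illustration, not LangChain's internals):

```python
def split_for_summary(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Walk backwards keeping the most recent messages until the token
    budget is spent; everything older is handed off for summarization."""
    kept, budget = [], max_tokens
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    kept.reverse()
    to_summarize = messages[: len(messages) - len(kept)]
    return to_summarize, kept

older, recent = split_for_summary(
    ["hi there", "hello how can I help", "my order 123 is late", "checking on it now"],
    max_tokens=9,
)
print(older)   # ['hi there', 'hello how can I help'] -> summarized
print(recent)  # ['my order 123 is late', 'checking on it now'] -> kept verbatim
```

Recent turns keep their exact wording (order numbers, names), while older small talk gets compressed.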
## Vector Store Memory
Vector store memory stores conversations or facts as embeddings in a vector database. When the agent needs context, it retrieves the most semantically relevant memories — not just the most recent ones.
```python
from langchain.memory import VectorStoreRetrieverMemory
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})  # top-3 most relevant memories
memory = VectorStoreRetrieverMemory(retriever=retriever)
```
- Strengths: Scales to unlimited memory. Retrieves relevant context regardless of when it occurred.
- Weaknesses: May miss critical non-semantic context. Adds embedding and retrieval latency. Requires a vector database.
- Use when: Long-term user personalization, knowledge bases, large interaction histories.
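To make the retrieval step concrete, here's a toy version using bag-of-words vectors and cosine similarity in place of a real embedding model. The point it demonstrates: relevance, not recency, decides what comes back.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words vector. A real system would
    use a trained embedding model (e.g. OpenAIEmbeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, memories: list, k: int = 1) -> list:
    """Return the k memories most similar to the query."""
    q = embed(query)
    return sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]

memories = [
    "user prefers the aisle seat on flights",
    "user asked about refund policy last month",
    "user's favorite cuisine is thai food",
]
# The oldest memory wins because it's the most relevant one
print(retrieve("book a flight seat for the user", memories, k=1))
```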
## Memory in LangGraph Agents
In LangGraph (the recommended agent runtime for LangChain in 2025), memory is part of the graph state, not a separate memory object. State persists across invocations using checkpointers.
```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph

class AgentState(TypedDict):
    messages: list  # conversation history lives in graph state

def agent_node(state: AgentState) -> AgentState:
    ...  # call the LLM here; it reads and updates state

graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.set_entry_point("agent")
app = graph.compile(checkpointer=MemorySaver())

# Pass a thread_id to maintain conversation continuity per user/session
config = {"configurable": {"thread_id": user_id}}
app.invoke({"messages": [user_input]}, config=config)
```
For production, replace MemorySaver with PostgresSaver or RedisSaver to persist state across server restarts.
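The thread_id scoping deserves a closer look. The toy checkpointer below is not the real LangGraph interface, just the shape of the idea: state is persisted keyed by thread, so each conversation resumes its own history in isolation.

```python
class ToyCheckpointer:
    """Sketch of what a checkpointer does: persist graph state keyed by
    thread_id. Stands in for MemorySaver/PostgresSaver conceptually."""
    def __init__(self):
        self._store = {}

    def load(self, thread_id: str) -> dict:
        return self._store.get(thread_id, {"messages": []})

    def save(self, thread_id: str, state: dict) -> None:
        self._store[thread_id] = state

def invoke(checkpointer: ToyCheckpointer, thread_id: str, user_msg: str) -> dict:
    state = checkpointer.load(thread_id)  # resume prior state for this thread
    state["messages"] = state["messages"] + [user_msg]
    checkpointer.save(thread_id, state)   # persist for the next call
    return state

cp = ToyCheckpointer()
invoke(cp, "alice", "My name is Alex")
invoke(cp, "bob", "My name is Blake")
# Alice's thread resumes with only Alice's messages
print(invoke(cp, "alice", "What's my name?")["messages"])
```

Swapping the dict for a Postgres table is exactly the MemorySaver-to-PostgresSaver move described above: the keying by thread stays the same, only durability changes.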
## Memory Architecture Decision Table
| Use Case | Recommended Memory | Why |
|---|---|---|
| Short chat (5-10 turns) | Buffer | Perfect recall, simple |
| Long chat (50+ turns) | Summary Buffer | Accuracy + cost balance |
| User personalization | Vector Store | Semantic retrieval across sessions |
| Entity tracking | Entity Memory | Structured fact maintenance |
| Complex agent workflows | LangGraph State + Checkpointer | State is first-class, production-ready |
| Multi-session agents | Vector + PostgresSaver | Retrieval + persistence |
## Production Considerations
- Memory isolation: Always scope memory by user/session ID. Sharing a memory object across users leaks context — a serious data privacy issue.
- Memory size limits: Set explicit token or turn limits. Unbounded memory growth causes latency creep and cost overruns.
- Memory TTL: Implement time-to-live for cached memories. User preferences from 18 months ago may no longer be valid.
- Testing memory: Agents that work in single-turn testing often fail in multi-turn production because memory state wasn't accounted for in tests.
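Two of these rules, isolation and TTL, can be sketched together in a few lines. The class below is illustrative only, not a LangChain API: it keys memories by session (no cross-user leakage) and expires stale entries on read.

```python
import time

class ScopedMemoryStore:
    """Memories keyed by session_id (isolation), each carrying a
    timestamp so stale entries can be expired (TTL)."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def add(self, session_id: str, fact: str, now=None) -> None:
        ts = time.time() if now is None else now
        self._store.setdefault(session_id, []).append((ts, fact))

    def get(self, session_id: str, now=None) -> list:
        now = time.time() if now is None else now
        fresh = [(t, f) for t, f in self._store.get(session_id, [])
                 if now - t <= self.ttl]
        self._store[session_id] = fresh  # evict expired entries on read
        return [f for _, f in fresh]

store = ScopedMemoryStore(ttl_seconds=3600)
store.add("user-1", "prefers dark mode", now=0)
store.add("user-1", "lives in Berlin", now=4000)
store.add("user-2", "prefers light mode", now=4000)
print(store.get("user-1", now=4100))  # ['lives in Berlin'] (first fact expired)
print(store.get("user-2", now=4100))  # ['prefers light mode'] (other user untouched)
```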
Related: LangChain Memory Optimization for AI Workflows
## FAQs
### What is the difference between ConversationBufferMemory and ConversationSummaryMemory?
Buffer memory stores complete conversation history verbatim and passes it to the LLM on every call — perfect recall but context window fills quickly. Summary memory compresses older history into a summary — handles unlimited conversation length but may lose specific details in the summarization process.
### How does vector store memory retrieve relevant memories?
When the agent receives a new input, it's embedded using an embedding model, and the vector store is queried for the K nearest embeddings. The retrieved memories are included as context in the LLM prompt alongside the current input — surfacing relevant facts regardless of when they occurred.
### Can LangGraph agents maintain memory across server restarts?
Yes. Replace the default MemorySaver (in-memory, lost on restart) with PostgresSaver or RedisSaver. State is serialized and stored in the external database, surviving server restarts and horizontal scaling.
### How do you prevent memory from leaking between users?
Always use a unique thread_id or session_id when invoking the agent. In LangGraph, the thread_id in the config dict scopes the checkpointer to a specific conversation. Never share a single memory instance across concurrent requests.