How to Add Memory to an AI Agent: State Persistence Patterns for Production
Stateless agents — the kind that forget everything when the Worker exits — are fine for one-shot tasks. The moment an agent needs to remember what happened yesterday, recognize a returning customer, or build on previous output, you need memory. There are three patterns: working memory (in-flight context, lives in KV for the duration of a run), episodic memory (what happened and when, a log you can query), and semantic memory (what you know, retrieved via vector search or structured data). Wire the right pattern to the right job.
Every Wednesday. 28,400+ operators. Zero fluff.
Why stateless agents keep failing
A stateless agent is one that begins each run with only what you explicitly pass it: the system prompt, the user message, and whatever data you pull fresh at invocation time. It has no awareness of previous runs, previous users, or previous decisions.
For a one-shot classification task — read a comment, return a category — stateless is correct. It’s fast, cheap, and predictable.
The failure surface appears the moment you need continuity:
- A customer-facing agent that doesn’t recognize the customer’s history
- A content agent that recommends an article it already recommended last week
- A moderation agent that keeps re-escalating a resolved case
- A daily brief that surfaces the same stale alert indefinitely
All of these are symptoms of the same problem: the agent has no way to carry context across runs.
Three types of memory
The framing I find useful in production:
- Working memory — what the agent knows right now, during a single run. Held in KV or in-memory for the life of the invocation.
- Episodic memory — what happened and when. A structured log that the agent reads at the start of each run to orient itself.
- Semantic memory — what it knows about the world, customers, or a knowledge base. Retrieved via structured queries or vector search when relevant.
You don’t always need all three. Most agents I run need working + episodic. Semantic memory is the hardest to build and only earns its place when the knowledge base is large enough that you can’t fit it in the context window.
Working memory: in-flight context
Working memory is state that lives for the duration of one agent run. The simplest form is just variables in the function scope. The more interesting form is a shared KV key that sub-tasks within the same run read and write.
My social reply agent uses working memory to accumulate context as it processes a batch of comments in one queue message. It reads recent conversation history for each customer from KV at the start, adds new context as it processes, and writes back at the end.
// workers/social-reply.ts
// Note: `anthropic` is assumed to be an Anthropic SDK client created elsewhere
// in the Worker. Env, SocialCommentEvent, ConversationTurn, buildSystemPrompt,
// and postReply are also defined elsewhere.
async function processComment(
  comment: SocialCommentEvent,
  env: Env
): Promise<void> {
  // Load this customer's recent history from KV (working memory)
  const historyKey = `customer:${comment.userId}:history`;
  const rawHistory = await env.AGENT_KV.get(historyKey);
  const history: ConversationTurn[] = rawHistory
    ? JSON.parse(rawHistory)
    : [];

  // Build a context-aware system prompt from history
  const systemPrompt = buildSystemPrompt(history);
  const response = await anthropic.messages.create({
    model: "claude-opus-4-8",
    max_tokens: 512,
    system: systemPrompt,
    messages: [{ role: "user", content: comment.text }],
  });
  const reply =
    response.content[0].type === "text" ? response.content[0].text : "";

  // Update history: keep last 10 turns, TTL 30 days
  const updatedHistory: ConversationTurn[] = [
    ...history.slice(-9),
    { role: "assistant", content: reply, timestamp: comment.timestamp },
  ];
  await env.AGENT_KV.put(historyKey, JSON.stringify(updatedHistory), {
    expirationTtl: 60 * 60 * 24 * 30,
  });
  await postReply(comment, reply, env);
}

Two things to notice. The history is capped at 10 turns: inject a sliding window, don’t let it grow unbounded. And the TTL is 30 days: if a customer goes silent for a month, the history expires and the agent starts fresh. Both are intentional.
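The snippet above calls `buildSystemPrompt(history)` without showing it. A minimal sketch of what such a helper might look like, assuming a `ConversationTurn` with `role`, `content`, and an ISO-string `timestamp`. The base prompt text, the `maxTurns` parameter, and the line format are illustrative assumptions, not the helper used in production:

```typescript
// Hypothetical shape: the article only shows role, content, and timestamp.
interface ConversationTurn {
  role: "user" | "assistant";
  content: string;
  timestamp: string; // assumed to be an ISO-8601 string
}

// Illustrative base prompt, not taken from the article.
const BASE_PROMPT =
  "You reply to social media comments on behalf of the business.";

// Render the most recent turns into the system prompt, oldest first.
function buildSystemPrompt(
  turns: ConversationTurn[],
  maxTurns = 10
): string {
  // No history yet: fall back to the generic prompt.
  if (turns.length === 0) return BASE_PROMPT;

  // Sliding window: only the last `maxTurns` entries reach the model.
  const recent = turns.slice(-maxTurns);
  const lines = recent.map(
    (t) => `[${t.timestamp}] ${t.role}: ${t.content}`
  );

  return [
    BASE_PROMPT,
    "Recent conversation with this customer (oldest first):",
    ...lines,
    "Stay consistent with what was already said.",
  ].join("\n");
}
```

Capping again at render time means a stored value that somehow holds more than ten turns still produces a bounded prompt.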
Episodic memory: what happened and when
Episodic memory is the agent’s log. A structured record of past runs that the agent reads at the start of each new run to avoid repeating itself.
My daily brief agent was surfacing the same stale alerts every day because each run had no awareness of what had already been flagged. The fix: a structured log of past alerts that the agent reads before generating the brief.
// workers/daily-brief.ts
interface AlertLogEntry {
  id: string;
  surfacedAt: string; // ISO timestamp
  resolvedAt?: string;
  summary: string;
}

async function buildDailyBrief(env: Env): Promise<void> {
  const [emails, calendar, tasks] = await Promise.all([
    fetchOvernightEmails(env),
    fetchTodayCalendar(env),
    fetchTopTasks(env),
  ]);

  // Load episodic memory: what has already been flagged
  const rawLog = await env.AGENT_KV.get("brief:alert-log");
  const alertLog: AlertLogEntry[] = rawLog ? JSON.parse(rawLog) : [];

  // Filter to recent, unresolved alerts only
  const sevenDaysAgo = new Date(
    Date.now() - 7 * 24 * 60 * 60 * 1000
  ).toISOString();
  const recentAlerts = alertLog.filter(
    (e) => e.surfacedAt > sevenDaysAgo && !e.resolvedAt
  );

  const brief = await synthesizeBrief(
    { emails, calendar, tasks, recentAlerts },
    env
  );

  // Update the log with any new alerts flagged this run
  const newAlerts: AlertLogEntry[] = brief.newAlerts.map((a) => ({
    id: crypto.randomUUID(),
    surfacedAt: new Date().toISOString(),
    summary: a,
  }));
  const updatedLog = [...alertLog, ...newAlerts].slice(-100); // keep last 100
  await env.AGENT_KV.put("brief:alert-log", JSON.stringify(updatedLog));
  await writeToWorkspace(brief.content, env);
}

The agent now knows what it has already said. Duplicate alerts stay out of the brief until the underlying issue changes. When I mark an alert resolved, it drops off the active list.
This pattern generalizes: any agent that produces decisions, flags, or recommendations benefits from a log. The log is cheap (a few KB in KV), the payoff is high (no more redundant outputs).
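Marking an alert resolved is described above but not shown. One way it could work, as a sketch: `markResolved` and `activeAlerts` are hypothetical helper names, and the `now` and `windowDays` parameters are added here only to make the functions easy to test. Both are pure functions over the same `AlertLogEntry` shape, so the KV read and write stay at the call site:

```typescript
interface AlertLogEntry {
  id: string;
  surfacedAt: string; // ISO timestamp
  resolvedAt?: string;
  summary: string;
}

// Return a new log with the matching entry stamped as resolved.
// Entries that are already resolved keep their original timestamp.
function markResolved(
  log: AlertLogEntry[],
  alertId: string,
  now: Date = new Date()
): AlertLogEntry[] {
  return log.map((entry) =>
    entry.id === alertId && !entry.resolvedAt
      ? { ...entry, resolvedAt: now.toISOString() }
      : entry
  );
}

// Same filter the brief uses: surfaced inside the window and still unresolved.
function activeAlerts(
  log: AlertLogEntry[],
  now: Date = new Date(),
  windowDays = 7
): AlertLogEntry[] {
  const cutoff = new Date(
    now.getTime() - windowDays * 24 * 60 * 60 * 1000
  ).toISOString();
  // Strings produced by toISOString() sort lexicographically in time order.
  return log.filter((e) => e.surfacedAt > cutoff && !e.resolvedAt);
}
```

A resolve handler would read `brief:alert-log`, pass the parsed array through `markResolved`, and write the result back with `put`.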
Semantic memory: what you know
Semantic memory is the knowledge base. It answers “what do you know about X?” at query time, rather than cramming everything into the system prompt upfront.
The simplest form is a structured lookup in KV or a database. My Pickleland booking agent looks up customer profiles and court preferences before drafting confirmations:
// workers/booking-agent.ts
interface CustomerProfile {
  userId: string;
  preferredCourts: string[];
  experienceLevel: "beginner" | "intermediate" | "advanced";
  specialNotes: string;
}

async function draftConfirmation(
  booking: BookingEvent,
  env: Env
): Promise<string> {
  // Pull customer profile from KV (semantic memory: factual knowledge)
  const profileKey = `customer:${booking.userId}:profile`;
  const rawProfile = await env.AGENT_KV.get(profileKey);
  const profile: CustomerProfile | null = rawProfile
    ? JSON.parse(rawProfile)
    : null;

  // Phrase the experience level so it reads correctly for all three values
  const systemPrompt = profile
    ? `You draft personalized booking confirmations. This customer prefers ${profile.preferredCourts.join(", ")} and plays at the ${profile.experienceLevel} level. ${profile.specialNotes}`
    : "You draft booking confirmations for a pickleball facility.";

  const response = await anthropic.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 256,
    system: systemPrompt,
    messages: [
      {
        role: "user",
        content: `Draft a confirmation for: ${JSON.stringify(booking)}`,
      },
    ],
  });
  return response.content[0].type === "text" ? response.content[0].text : "";
}

For larger knowledge bases (product documentation, a support knowledge base, anything too big to fit in a context window) you need a vector store. The workflow is: embed the query, retrieve the top-k relevant chunks, inject them into the context. Cloudflare Vectorize handles the storage and retrieval steps natively if you’re already on Workers. For larger indexes I’ve used Upstash Vector. The choice depends on scale, not principle.
The honest note on semantic memory: it’s the hardest of the three to build and maintain. The index needs to stay current. Retrieval quality varies. Start with structured lookups — KV, a table in D1 — and only reach for vector search when the structured approach can’t cover the knowledge surface you need.
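To make the embed, retrieve, inject workflow concrete, here is a small in-memory sketch. It is a stand-in for a vector store, not the Vectorize or Upstash API: embeddings are assumed to be computed elsewhere, and `StoredChunk`, `retrieveTopK`, and `injectContext` are hypothetical names. A hosted index would replace the scoring loop with a single query call:

```typescript
// A chunk of the knowledge base with a precomputed embedding (assumed).
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 2 of the workflow: score every chunk and keep the k best.
function retrieveTopK(
  queryEmbedding: number[],
  chunks: StoredChunk[],
  k: number
): StoredChunk[] {
  return chunks
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Step 3: inject the retrieved chunks into the prompt.
function injectContext(question: string, retrieved: StoredChunk[]): string {
  const context = retrieved
    .map((c, i) => `[${i + 1}] ${c.text}`)
    .join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

The linear scan is fine for a few hundred chunks, which is also roughly the point where a structured lookup stops being enough and a real index starts to pay for itself.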
The memory decision framework
Before you add any memory to an agent, answer three questions:
- Does the agent need to remember across runs? If every invocation is genuinely independent (a translation, a classification, a one-off generation), skip memory. Stateless is simpler and cheaper.
- Is the agent repeating itself or acting blind to its own history? If yes, add episodic memory first. It’s the lowest-effort fix and covers most “the agent keeps doing X” complaints.
- Is the agent treating every user or entity identically when it shouldn’t? If yes, add working memory (customer history, user profile) or semantic memory (a lookup or retrieval system).
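The three questions can be written down as a small function. This is one reading of the framework, not a rule from any library: `AgentSymptoms` and `recommendMemory` are hypothetical names, and because the third question points at either working or semantic memory, the sketch returns both as candidates and leaves the final choice to you:

```typescript
type MemoryLayer = "working" | "episodic" | "semantic";

// One boolean per question in the framework (field names are illustrative).
interface AgentSymptoms {
  needsRecallAcrossRuns: boolean;
  repeatsItselfOrIgnoresHistory: boolean;
  treatsEveryEntityIdentically: boolean;
}

// Returns the layers worth considering, in the order to try them.
function recommendMemory(symptoms: AgentSymptoms): MemoryLayer[] {
  // Question 1: independent invocations need no memory at all.
  if (!symptoms.needsRecallAcrossRuns) return [];

  const layers: MemoryLayer[] = [];

  // Question 2: episodic memory comes first, it is the lowest-effort fix.
  if (symptoms.repeatsItselfOrIgnoresHistory) layers.push("episodic");

  // Question 3: either per-entity history or a lookup/retrieval system.
  if (symptoms.treatsEveryEntityIdentically) {
    layers.push("working", "semantic");
  }

  return layers;
}
```

An empty result for an agent that does need cross-run recall means neither symptom is present yet, which is a reasonable signal to hold off on adding anything.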
The mistake I see most: someone adds a massive knowledge base (semantic memory) to an agent that was actually failing because it had no episodic memory — no log of what it had already done. The complexity doesn’t match the problem.
What I actually use in production
Across 30+ agents:
- All of them have at least working memory — some form of state within a run, even if it’s just the context window itself.
- About half have episodic memory — a log of past runs, decisions, or flags. This is almost always worth adding.
- Three or four have real semantic memory backed by a vector store. These are the agents that answer questions against a large, dynamic knowledge base.
Cloudflare KV is my default store for working and episodic memory. It’s fast, cheap, and natively integrated into Workers: no extra client, no separate credential. The limitation: KV is eventually consistent (a write can take 60 seconds or more to show up in other locations) and accepts only one write per second to the same key, so it’s a poor fit for high-frequency writes. For agents that write state many times per second, I reach for Durable Objects or a D1 database instead.
For semantic memory backed by vectors, I use Cloudflare Vectorize for small-to-medium indexes (under ~100K vectors) and Upstash Vector for anything larger. Both have first-class JavaScript clients.
The operator’s bottom line
Add memory to an agent when and only when stateless behavior is causing real problems — repeated outputs, blind spots to customer history, ignorance of past decisions. Then pick the right layer: working memory for in-run context, episodic for what happened historically, semantic for what you know. Start with episodic if you’re not sure — it fixes the most common failure mode with the least complexity. Don’t reach for a vector database until you’ve exhausted structured lookups. The best memory system is the simplest one that makes the agent behave correctly.