How to Add Memory to an AI Agent: State Persistence Patterns for Production

Alejandro Rioja

June 29, 2026 · 8 min read

TL;DR

Stateless agents — the kind that forget everything when the Worker exits — are fine for one-shot tasks. The moment an agent needs to remember what happened yesterday, recognize a returning customer, or build on previous output, you need memory. There are three patterns: working memory (in-flight context, lives in KV for the duration of a run), episodic memory (what happened and when, a log you can query), and semantic memory (what you know, retrieved via vector search or structured data). Wire the right pattern to the right job.

Table of contents

Why stateless agents keep failing
Three types of memory
Working memory: in-flight context
Episodic memory: what happened and when
Semantic memory: what you know
The memory decision framework
What I actually use in production
The operator’s bottom line

Why stateless agents keep failing

A stateless agent is one that begins each run with only what you explicitly pass it: the system prompt, the user message, and whatever data you pull fresh at invocation time. It has no awareness of previous runs, previous users, or previous decisions.

For a one-shot classification task — read a comment, return a category — stateless is correct. It’s fast, cheap, and predictable.

The failure surface appears the moment you need continuity:

A customer-facing agent that doesn’t recognize the customer’s history
A content agent that recommends an article it already recommended last week
A moderation agent that keeps re-escalating a resolved case
A daily brief that surfaces the same stale alert indefinitely

All of these are symptoms of the same problem: the agent has no way to carry context across runs.

Three types of memory

The framing I find useful in production:

Working memory: what the agent knows right now, during a single run. Held in KV or in memory for the life of the invocation.
Episodic memory: what happened and when. A structured log that the agent reads at the start of each run to orient itself.
Semantic memory: what it knows about the world, customers, or a knowledge base. Retrieved via structured queries or vector search when relevant.

You don’t always need all three. Most agents I run need working + episodic. Semantic memory is the hardest to build and only earns its place when the knowledge base is large enough that you can’t fit it in the context window.
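One way to picture the three layers is as three separate inputs that get assembled into the prompt at the start of a run. The sketch below is illustrative only: `AgentMemory`, `EpisodeEntry`, and `assembleContext` are hypothetical names, not part of any library, and the section labels in the prompt are arbitrary.

```typescript
// Hypothetical shapes for illustration; none of these names come from a library.
interface EpisodeEntry {
  at: string; // ISO timestamp
  summary: string;
}

interface AgentMemory {
  working: string[]; // notes gathered during the current run
  episodic: EpisodeEntry[]; // log of what earlier runs did
  semantic: string[]; // facts retrieved for this request
}

// Assemble a system prompt, skipping any layer that is empty.
function assembleContext(base: string, memory: AgentMemory): string {
  const sections: string[] = [base];
  if (memory.episodic.length > 0) {
    const lines = memory.episodic.map((e) => `- ${e.at}: ${e.summary}`);
    sections.push(`Previously:\n${lines.join("\n")}`);
  }
  if (memory.semantic.length > 0) {
    const lines = memory.semantic.map((f) => `- ${f}`);
    sections.push(`Known facts:\n${lines.join("\n")}`);
  }
  if (memory.working.length > 0) {
    const lines = memory.working.map((n) => `- ${n}`);
    sections.push(`Notes from this run:\n${lines.join("\n")}`);
  }
  return sections.join("\n\n");
}
```

An agent that only needs working and episodic memory simply passes an empty `semantic` array, and that section never reaches the prompt.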

Working memory: in-flight context

Working memory is state that lives for the duration of one agent run. The simplest form is just variables in the function scope. The more interesting form is a shared KV key that sub-tasks within the same run read and write.
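The shared-key form might look something like the sketch below. It is a minimal illustration under stated assumptions: `KVLike` is a hand-written stand-in modeled on the `get`/`put` calls of a Workers KV binding so the code runs outside Workers, and the `run:<id>:scratch` key scheme, `readScratch`, `writeScratch`, and `memoryKV` are hypothetical names.

```typescript
// Minimal stand-in for the parts of a KV binding used here (an assumption,
// so the sketch runs outside Workers).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(
    key: string,
    value: string,
    options?: { expirationTtl?: number }
  ): Promise<void>;
}

type Scratchpad = Record<string, unknown>;

// Hypothetical key scheme: one scratchpad per run.
const scratchKey = (runId: string): string => `run:${runId}:scratch`;

async function readScratch(kv: KVLike, runId: string): Promise<Scratchpad> {
  const raw = await kv.get(scratchKey(runId));
  return raw ? (JSON.parse(raw) as Scratchpad) : {};
}

// Merge a patch into the scratchpad. The short TTL lets abandoned runs clean
// themselves up. This read-modify-write is not atomic, so it assumes the
// sub-tasks of a run execute one after another.
async function writeScratch(
  kv: KVLike,
  runId: string,
  patch: Scratchpad,
  ttlSeconds = 3600
): Promise<Scratchpad> {
  const next = { ...(await readScratch(kv, runId)), ...patch };
  await kv.put(scratchKey(runId), JSON.stringify(next), {
    expirationTtl: ttlSeconds,
  });
  return next;
}

// In-memory stand-in for local testing; it ignores the TTL.
function memoryKV(): KVLike {
  const store = new Map<string, string>();
  return {
    async get(key) {
      return store.get(key) ?? null;
    },
    async put(key, value) {
      store.set(key, value);
    },
  };
}
```

Keying by run ID keeps concurrent runs from stepping on each other. If sub-tasks run in parallel or write often, a single shared key is the wrong tool and plain in-memory variables are safer.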

My social reply agent uses working memory to accumulate context as it processes a batch of comments in one queue message. It reads recent conversation history for each customer from KV at the start, adds new context as it processes, and writes back at the end.

typescript

// workers/social-reply.ts

async function processComment(
  comment: SocialCommentEvent,
  env: Env
): Promise<void> {
  // Load this customer's recent history from KV (working memory)
  const historyKey = `customer:${comment.userId}:history`;
  const rawHistory = await env.AGENT_KV.get(historyKey);
  const history: ConversationTurn[] = rawHistory
    ? JSON.parse(rawHistory)
    : [];

  // Build a context-aware system prompt from history
  const systemPrompt = buildSystemPrompt(history);

  const response = await anthropic.messages.create({
    model: "claude-opus-4-8",
    max_tokens: 512,
    system: systemPrompt,
    messages: [{ role: "user", content: comment.text }],
  });

  const reply =
    response.content[0].type === "text" ? response.content[0].text : "";

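  // Update history with both sides of the exchange; keep last 10 turns, TTL 30 days
  const updatedHistory: ConversationTurn[] = [
    ...history.slice(-8),
    { role: "user", content: comment.text, timestamp: comment.timestamp },
    { role: "assistant", content: reply, timestamp: comment.timestamp },
  ];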
  await env.AGENT_KV.put(historyKey, JSON.stringify(updatedHistory), {
    expirationTtl: 60 * 60 * 24 * 30,
  });

  await postReply(comment, reply, env);
}

Two things to notice. The history is capped at 10 turns — inject a sliding window, don’t grow it unbounded. And the TTL is 30 days: if a customer goes silent for a month, the history expires and the agent starts fresh. Both are intentional.
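The sliding window is easy to get subtly wrong, so it can help to pull it into a small pure function. This is a sketch, not the code the agent above runs: `appendTurns` is a hypothetical helper, and the `ConversationTurn` shape (including a string timestamp) is an assumption inferred from the fields used in the example.

```typescript
// Assumed shape, inferred from the fields used in the example above.
interface ConversationTurn {
  role: "user" | "assistant";
  content: string;
  timestamp: string;
}

// Append new turns and keep only the most recent `maxTurns`.
// Returns a new array; the input history is not mutated.
function appendTurns(
  history: ConversationTurn[],
  newTurns: ConversationTurn[],
  maxTurns = 10
): ConversationTurn[] {
  // Guard: slice(-0) would return the whole array rather than an empty one.
  if (maxTurns <= 0) return [];
  return [...history, ...newTurns].slice(-maxTurns);
}
```

Trimming after the append, rather than before, means the cap holds no matter how many turns are added in one go.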

Episodic memory: what happened and when

Episodic memory is the agent’s log. A structured record of past runs that the agent reads at the start of each new run to avoid repeating itself.

My daily brief agent was surfacing the same stale alerts every day because each run had no awareness of what had already been flagged. The fix: a structured log of past alerts that the agent reads before generating the brief.

typescript

// workers/daily-brief.ts

interface AlertLogEntry {
  id: string;
  surfacedAt: string; // ISO timestamp
  resolvedAt?: string;
  summary: string;
}

async function buildDailyBrief(env: Env): Promise<void> {
  const [emails, calendar, tasks] = await Promise.all([
    fetchOvernightEmails(env),
    fetchTodayCalendar(env),
    fetchTopTasks(env),
  ]);

  // Load episodic memory: what has already been flagged
  const rawLog = await env.AGENT_KV.get("brief:alert-log");
  const alertLog: AlertLogEntry[] = rawLog ? JSON.parse(rawLog) : [];

  // Filter to recent, unresolved alerts only
  const sevenDaysAgo = new Date(
    Date.now() - 7 * 24 * 60 * 60 * 1000
  ).toISOString();
  const recentAlerts = alertLog.filter(
    (e) => e.surfacedAt > sevenDaysAgo && !e.resolvedAt
  );

  const brief = await synthesizeBrief(
    { emails, calendar, tasks, recentAlerts },
    env
  );

  // Update the log with any new alerts flagged this run
  const newAlerts: AlertLogEntry[] = brief.newAlerts.map((a) => ({
    id: crypto.randomUUID(),
    surfacedAt: new Date().toISOString(),
    summary: a,
  }));

  const updatedLog = [...alertLog, ...newAlerts].slice(-100); // keep last 100
  await env.AGENT_KV.put("brief:alert-log", JSON.stringify(updatedLog));

  await writeToWorkspace(brief.content, env);
}

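The agent now knows what it has already said. The dedup itself happens inside synthesizeBrief (not shown), which receives recentAlerts and has to be told to skip anything already on that list; with that in place, duplicate alerts stay out of the brief until the underlying issue changes. When I mark an alert resolved, it drops off the active list. The seven-day window cuts both ways, though: an alert that is still unresolved after a week falls out of recentAlerts and can be flagged again.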
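Marking an alert resolved is not part of the code above. One way it could work, sketched under assumptions: `resolveAlert` and `activeAlerts` are hypothetical helpers that operate on the same `AlertLogEntry` shape, and the caller is responsible for loading the log from KV and writing it back.

```typescript
interface AlertLogEntry {
  id: string;
  surfacedAt: string; // ISO timestamp
  resolvedAt?: string;
  summary: string;
}

// Stamp one entry as resolved. Returns a new log; unknown or already
// resolved ids leave the log unchanged.
function resolveAlert(
  log: AlertLogEntry[],
  id: string,
  now: Date = new Date()
): AlertLogEntry[] {
  return log.map((e) =>
    e.id === id && !e.resolvedAt ? { ...e, resolvedAt: now.toISOString() } : e
  );
}

// The same filter the brief applies: surfaced within the window and unresolved.
// ISO timestamps produced by toISOString() compare correctly as strings.
function activeAlerts(
  log: AlertLogEntry[],
  now: Date = new Date(),
  windowDays = 7
): AlertLogEntry[] {
  const cutoff = new Date(
    now.getTime() - windowDays * 24 * 60 * 60 * 1000
  ).toISOString();
  return log.filter((e) => e.surfacedAt > cutoff && !e.resolvedAt);
}
```

Keeping both functions pure makes them easy to test without KV, and lets the write-back happen in exactly one place.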

This pattern generalizes: any agent that produces decisions, flags, or recommendations benefits from a log. The log is cheap (a few KB in KV), the payoff is high (no more redundant outputs).

Semantic memory: what you know

Semantic memory is the knowledge base. It answers “what do you know about X?” at query time, rather than cramming everything into the system prompt upfront.

The simplest form is a structured lookup in KV or a database. My Pickleland booking agent looks up customer profiles and court preferences before drafting confirmations:

typescript

// workers/booking-agent.ts

interface CustomerProfile {
  userId: string;
  preferredCourts: string[];
  experienceLevel: "beginner" | "intermediate" | "advanced";
  specialNotes: string;
}

async function draftConfirmation(
  booking: BookingEvent,
  env: Env
): Promise<string> {
  // Pull customer profile from KV (semantic memory — factual knowledge)
  const profileKey = `customer:${booking.userId}:profile`;
  const rawProfile = await env.AGENT_KV.get(profileKey);
  const profile: CustomerProfile | null = rawProfile
    ? JSON.parse(rawProfile)
    : null;

  const systemPrompt = profile
    ? `You draft personalized booking confirmations. This customer prefers ${profile.preferredCourts.join(", ")} and plays at the ${profile.experienceLevel} level. ${profile.specialNotes}`
    : "You draft booking confirmations for a pickleball facility.";

  const response = await anthropic.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 256,
    system: systemPrompt,
    messages: [
      {
        role: "user",
        content: `Draft a confirmation for: ${JSON.stringify(booking)}`,
      },
    ],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

For larger knowledge bases (product documentation, a support knowledge base, anything too big to fit in a context window) you need a vector store. The workflow is: embed the query, retrieve the top-k relevant chunks, inject them into the context. Cloudflare Vectorize handles the index and the top-k query natively if you’re already on Workers; the embeddings themselves come from a separate model, for example one served through Workers AI. For larger indexes I’ve used Upstash Vector. The choice depends on scale, not principle.
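The embed, retrieve, inject loop is easier to see in code. The sketch below is a minimal in-memory version under stated assumptions: it ranks chunks by cosine similarity over a plain array, which is the step a vector store would do for you, and `Chunk`, `retrieveTopK`, `buildContextPrompt`, `retrieveContext`, and the `embed` parameter are hypothetical names standing in for your own index and embedding model.

```typescript
interface Chunk {
  id: string;
  text: string;
  vector: number[]; // embedding computed when the chunk was indexed
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Step 2: rank stored chunks against the query vector and keep the best k.
function retrieveTopK(queryVector: number[], chunks: Chunk[], k = 3): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Step 3: inject the retrieved text into the prompt.
function buildContextPrompt(question: string, retrieved: Chunk[]): string {
  const context = retrieved.map((c) => `- ${c.text}`).join("\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}

// Steps 1 to 3 together. `embed` is a placeholder for the embedding model call.
async function retrieveContext(
  question: string,
  embed: (text: string) => Promise<number[]>,
  chunks: Chunk[],
  k = 3
): Promise<string> {
  const queryVector = await embed(question);
  return buildContextPrompt(question, retrieveTopK(queryVector, chunks, k));
}
```

A linear scan like this is fine for a few hundred chunks. Past that, the ranking step is what you hand off to the vector store, while the embed and inject steps stay the same.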

The honest note on semantic memory: it’s the hardest of the three to build and maintain. The index needs to stay current. Retrieval quality varies. Start with structured lookups — KV, a table in D1 — and only reach for vector search when the structured approach can’t cover the knowledge surface you need.

The memory decision framework

Before you add any memory to an agent, answer three questions:

Does the agent need to remember across runs? If every invocation is genuinely independent — a translation, a classification, a one-off generation — skip memory. Stateless is simpler and cheaper.
Is the agent repeating itself or acting blind to its own history? If yes, add episodic memory first. It’s the lowest-effort fix and covers most “the agent keeps doing X” complaints.
Is the agent treating every user or entity identically when it shouldn’t? If yes, add working memory (customer history, user profile) or semantic memory (a lookup or retrieval system).
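The three questions can be written down as a small decision function. This is a sketch of the framework as stated in this post, not a rule from any library: `AgentSymptoms`, `Recommendation`, and `recommendMemory` are hypothetical names, and the fallback to episodic memory follows the "start with episodic if you're not sure" advice from the bottom line.

```typescript
interface AgentSymptoms {
  needsCrossRunContext: boolean; // question 1
  repeatsItself: boolean; // question 2
  treatsEveryEntityTheSame: boolean; // question 3
}

type Recommendation =
  | "stay-stateless"
  | "add-episodic"
  | "add-working-or-semantic";

function recommendMemory(s: AgentSymptoms): Recommendation[] {
  // Question 1: independent invocations need no memory at all.
  if (!s.needsCrossRunContext) return ["stay-stateless"];

  const out: Recommendation[] = [];
  // Question 2: repetition or blindness to history points at a missing log.
  if (s.repeatsItself) out.push("add-episodic");
  // Question 3: identical treatment points at missing per-entity context.
  if (s.treatsEveryEntityTheSame) out.push("add-working-or-semantic");
  // Unsure which symptom applies: start with the cheapest layer.
  if (out.length === 0) out.push("add-episodic");
  return out;
}
```

Episodic memory comes first in the result on purpose, since it is the lowest-effort fix of the three.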

The mistake I see most: someone adds a massive knowledge base (semantic memory) to an agent that was actually failing because it had no episodic memory — no log of what it had already done. The complexity doesn’t match the problem.

What I actually use in production

Across 30+ agents:

All of them have at least working memory — some form of state within a run, even if it’s just the context window itself.
About half have episodic memory — a log of past runs, decisions, or flags. This is almost always worth adding.
Three or four have real semantic memory backed by a vector store. These are the agents that answer questions against a large, dynamic knowledge base.

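Cloudflare KV is my default store for working and episodic memory. It’s fast, cheap, and natively integrated into Workers, with no extra client and no separate credential. The limitation: KV is eventually consistent (per Cloudflare’s documentation, a write can take up to about a minute to become visible in other locations) and it rate-limits writes to the same key to roughly one per second, so it is a poor fit for high-frequency writes. For agents that write state many times per second, I reach for Durable Objects or a D1 database instead.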

For semantic memory backed by vectors, I use Cloudflare Vectorize for small-to-medium indexes (under ~100K vectors) and Upstash Vector for anything larger. Both have first-class JavaScript clients.

The operator’s bottom line

Add memory to an agent when and only when stateless behavior is causing real problems — repeated outputs, blind spots to customer history, ignorance of past decisions. Then pick the right layer: working memory for in-run context, episodic for what happened historically, semantic for what you know. Start with episodic if you’re not sure — it fixes the most common failure mode with the least complexity. Don’t reach for a vector database until you’ve exhausted structured lookups. The best memory system is the simplest one that makes the agent behave correctly.

Need help architecting agent memory for your use case? Get in touch — I design production agent systems for operator teams.
