
Why AI Agents Forget: 3 Memory Types

Aditya Kumar Jha · June 5, 2026 · 11 min read

Episodic, semantic, and procedural memory: the three kinds an AI agent needs to act well, plus how each is stored.

Your agent asks for the same preference it learned yesterday, then reruns a plan that already failed once. A common root cause is one mistake: treating memory as a single bucket. AI agent memory splits into three types. Episodic memory stores specific past events. Semantic memory stores general facts and preferences. Procedural memory stores how to do things. A fourth, working memory, is the short-term scratchpad inside the context window. Keep them apart and the repeat-question, forgotten-preference, broken-replan failures mostly disappear.

This taxonomy comes straight from cognitive science and now shapes how engineers design agent memory architecture. Psychologist Endel Tulving first split long-term memory into episodic and semantic systems in a 1972 book chapter, arguing that recalling a personal event is a different operation from knowing a fact. Procedural memory, the implicit knowledge of how to perform a skill, completes the standard picture. Modern language-agent frameworks map these same categories onto storage and retrieval inside an LLM system.

Insight

The blueprint for your agent's memory was drafted in 1972, decades before the transistor count made LLMs possible. Endel Tulving split human long-term memory into episodic and semantic systems that year. Modern frameworks like CoALA carried his categories, together with procedural and working memory, over to storage and retrieval.

Insight

This post covers the TYPE axis of agent memory: episodic vs semantic vs procedural. It is the companion to the MemX guide on the DURATION axis, short-term vs long-term memory. The two axes are orthogonal: any type can be held briefly in context or persisted for months.

Why do AI agents need different types of memory?

Different memory types answer different questions, so they need different storage and retrieval. Episodic memory answers "what happened, and when?" Semantic memory answers "what is true about this user or domain?" Procedural memory answers "how do I carry out this task?" Cramming all three into a single vector store blurs those questions and makes retrieval noisy: a query for a user's stated preference should not compete with a transcript of last Tuesday's debugging session.

The Cognitive Architectures for Language Agents framework, known as CoALA, formalizes this for LLM agents. It describes a language agent as modular memory components plus a structured action space for reading and writing them, drawing the working, episodic, semantic, and procedural distinctions from cognitive science and from production-system cognitive architectures such as Soar.

What is episodic memory in an AI agent?

Episodic memory stores specific past events the agent took part in, each anchored to a time and context. "Last Tuesday the user asked to export the report as CSV and it failed" is an episode. So is a full conversation turn, a tool call and its result, or an environment observation. The defining feature is particularity: a single occurrence, not a generalization.

Agent systems usually write episodic memory to an append-only log and index it in a vector store. At retrieval time the agent embeds the current situation and pulls the most similar past episodes. LangChain's memory documentation notes that episodic memory is often implemented through few-shot example prompting, where the agent learns from past successful sequences and selects relevant examples based on the current input.
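A minimal sketch of that write and read path follows. It assumes a toy bag-of-words vector in place of a real embedding model and an in-memory list in place of a vector store; `EpisodicLog`, `record`, and `recall` are hypothetical names, not any framework's API.

```python
import math
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Episode:
    text: str
    context: str
    timestamp: float
    vector: Counter = field(repr=False)


class EpisodicLog:
    """Append-only episode log with a naive in-memory similarity index."""

    def __init__(self) -> None:
        self._episodes: List[Episode] = []

    def record(self, text: str, context: str, timestamp: Optional[float] = None) -> None:
        # Episodes are only appended, never edited in place.
        ts = time.time() if timestamp is None else timestamp
        self._episodes.append(Episode(text, context, ts, embed(text)))

    def recall(self, situation: str, k: int = 3) -> List[Episode]:
        # Embed the current situation and return the k most similar past episodes.
        query = embed(situation)
        ranked = sorted(self._episodes, key=lambda e: cosine(query, e.vector), reverse=True)
        return ranked[:k]


log = EpisodicLog()
log.record("user asked to export the report as CSV and it failed", "session-41", timestamp=1.0)
log.record("user asked for a weather summary", "session-42", timestamp=2.0)
print(log.recall("export the report as CSV again", k=1)[0].context)  # session-41
```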

Surveys of autonomous LLM agents describe episodic memory as records of concrete experiences such as individual tool calls, conversation turns, and environment observations, retrieved when a new situation resembles an old one. The human analogy is recalling a particular event, like remembering details of a specific trip rather than knowing geography in the abstract.

Here is what most explainers leave out: production episodic recall is rarely pure similarity search. The Generative Agents paper by Joon Sung Park and colleagues at Stanford scores each stored memory by three signals at once: recency, an exponential decay over the time since the memory was last accessed; importance, an integer on a 1-to-10 scale that the language model itself assigns to each memory; and relevance, the cosine similarity between the memory's embedding and the current query. Each signal is min-max normalized to a zero-to-one range, and the final score is their weighted sum, with all weights set to one in the paper's implementation. Match on embeddings alone and an agent surfaces a vivid but stale episode over the recent fix that actually matters.
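The ranking can be sketched as below. This is an illustrative re-implementation under stated assumptions, not the paper's code: time is measured in hours, the `decay=0.995` per-hour default is a tunable assumption, importance scores are hard-coded where an LLM would assign them, and relevance is taken as a precomputed cosine similarity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    hours_since_access: float  # time since the memory was last retrieved
    importance: int            # 1-10 score an LLM would assign; hard-coded here
    relevance: float           # cosine similarity to the current query, precomputed


def min_max(values: list[float]) -> list[float]:
    # Rescale to [0, 1]; a constant signal maps to all zeros and cannot change the order.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank(memories: list[Memory], decay: float = 0.995,
         weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> list[Memory]:
    """Order memories by the weighted sum of normalized recency, importance, relevance."""
    recency = min_max([decay ** m.hours_since_access for m in memories])
    importance = min_max([float(m.importance) for m in memories])
    relevance = min_max([m.relevance for m in memories])
    w_rec, w_imp, w_rel = weights
    scores = [w_rec * r + w_imp * i + w_rel * s
              for r, i, s in zip(recency, importance, relevance)]
    order = sorted(range(len(memories)), key=lambda idx: scores[idx], reverse=True)
    return [memories[idx] for idx in order]


stale = Memory("vivid outage from last quarter", 2000, 9, 0.82)
fix = Memory("yesterday's fix for the CSV export", 20, 6, 0.78)
chatter = Memory("small talk about the weather", 5, 1, 0.10)
print([m.text for m in rank([stale, fix, chatter])])  # fix, then stale, then chatter
```

In this toy data the stale episode has the highest raw relevance, so pure similarity search would return it first; once recency and importance are added, the recent fix ranks above it.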

Two design choices shape how useful episodic memory becomes. The first is what counts as an episode: a single tool call is fine-grained and precise, while a whole session is coarse and easier to summarize but harder to match. Many systems store both, raw turns for exact recall plus a short distilled summary for cheap retrieval. The second is decay. An unbounded episodic log grows forever and slows search, so production agents prune, summarize, or downrank old episodes that no longer match incoming situations.
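One way to combine both choices is sketched below: keep turns raw while they are recent, and collapse older sessions into one short summary each. The 24-hour cutoff, the `Turn` record, and `summarize` are hypothetical; `summarize` stands in for what would normally be an LLM call and here just keeps the first few words.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Turn:
    session: str
    timestamp: float  # hours since an arbitrary origin
    text: str


def summarize(texts: list[str], max_words: int = 12) -> str:
    # Hypothetical stand-in for an LLM summarizer: keep the first few words.
    words = " ".join(texts).split()
    return " ".join(words[:max_words])


def compact(turns: list[Turn], now: float, max_age: float) -> tuple[list[Turn], dict[str, str]]:
    """Keep recent turns raw; collapse each older session into one short summary."""
    recent = [t for t in turns if now - t.timestamp <= max_age]
    old_by_session: dict[str, list[str]] = defaultdict(list)
    for t in turns:
        if now - t.timestamp > max_age:
            old_by_session[t.session].append(t.text)
    summaries = {session: summarize(texts) for session, texts in old_by_session.items()}
    return recent, summaries


turns = [
    Turn("s1", 0.0, "user asked for CSV export"),
    Turn("s1", 0.5, "export failed with a timeout"),
    Turn("s2", 95.0, "user asked for the weekly report"),
]
recent, summaries = compact(turns, now=100.0, max_age=24.0)
print([t.text for t in recent])  # ['user asked for the weekly report']
print(summaries)  # {'s1': 'user asked for CSV export export failed with a timeout'}
```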

Episodic recall is also where agents learn from their own mistakes. A failed plan stored as an episode, with the error and the eventual fix attached, lets the agent avoid repeating the failure when a similar request arrives. That feedback loop is the practical payoff of separating episodes from facts: an outcome is bound to the situation that produced it, not flattened into a context-free statement.
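A sketch of that feedback loop, assuming a hypothetical `PlanEpisode` record that carries the error and fix alongside the request. Similarity here is a toy word-overlap (Jaccard) score with an arbitrary 0.5 threshold; a real agent would compare embeddings.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanEpisode:
    request: str
    plan: list[str]
    outcome: str                 # "success" or "failure"
    error: Optional[str] = None  # what went wrong, if anything
    fix: Optional[str] = None    # what eventually worked


def _words(text: str) -> set[str]:
    return set(text.lower().split())


def lessons_for(request: str, episodes: list[PlanEpisode], min_overlap: float = 0.5) -> list[str]:
    """Collect fixes from past failures whose request resembles the new one."""
    new = _words(request)
    lessons = []
    for ep in episodes:
        if ep.outcome != "failure" or ep.fix is None:
            continue
        old = _words(ep.request)
        union = new | old
        # Jaccard word overlap as a toy similarity; a real agent would compare embeddings.
        if union and len(new & old) / len(union) >= min_overlap:
            lessons.append(f"Previously failed with '{ep.error}'; fix: {ep.fix}")
    return lessons


history = [
    PlanEpisode("prepare the weekly sales report", ["query_sales", "render_pdf"], "failure",
                error="timeout", fix="retry with a smaller date range"),
    PlanEpisode("translate this email", ["translate"], "success"),
]
print(lessons_for("prepare the weekly sales report again", history))
```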

What is semantic memory in an AI agent?

Semantic memory holds general facts stripped of the moment they were learned. "The user prefers metric units," "the production database is Postgres," and "invoices are due net 30" all qualify. The conversation that taught the fact is gone. The fact stays.

LangChain's documentation describes two common storage shapes for semantic memory: a continuously updated profile, typically a JSON document describing the user or domain, or a growing collection of documents extended over time. Retrieval happens through semantic search or simple content filtering, so the agent can fetch only the facts relevant to the current task.
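A minimal sketch of the profile shape with content filtering, assuming facts have already been extracted into key/value pairs (extraction would usually be an LLM call and is omitted). `update_profile` and `relevant_facts` are hypothetical helpers, and matching key words against the task text is a deliberately crude filter.

```python
import json


def update_profile(profile: dict, new_facts: dict) -> dict:
    """Merge freshly extracted facts; keying by name deduplicates, newer values win."""
    merged = dict(profile)
    merged.update(new_facts)
    return merged


def relevant_facts(profile: dict, task: str) -> dict:
    """Content filtering: keep facts whose key words appear in the task text."""
    task_words = set(task.lower().split())
    return {key: value for key, value in profile.items() if set(key.split("_")) & task_words}


profile = {"units": "metric", "report_channel": "finance", "currency": "EUR"}
profile = update_profile(profile, {"currency": "USD"})  # overwrite, no duplicate entry
stored = json.dumps(profile, sort_keys=True)           # persisted as one JSON document
print(relevant_facts(json.loads(stored), "prepare the weekly report in the user's currency"))
```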

This is the system Tulving described as the store of general knowledge: knowing that a fact holds without any sense of when or where it was acquired. He framed this as plain knowing, distinct from the self-aware recollection he called autonoetic consciousness that marks episodic memory.

Pro Tip

The one-line test: if a fact is still useful after you forget the conversation that taught it, it is semantic memory. If the timing or order of events matters, it is an episode.

What is procedural memory in an AI agent?

Procedural memory encodes how to do things: skills, routines, tool-use patterns, and the rules the agent follows. In a human it is implicit, the kind of know-how expressed in performance rather than conscious recall, like riding a bike. It is a non-declarative form of memory, sitting apart from the declarative episodic and semantic systems Tulving described.

For an LLM agent, procedural memory lives in three places, per LangChain's framing: the model weights, the agent code, and the agent's prompt. The weights and code are usually fixed at runtime, so the practical surface for learning is the prompt. Agents update procedural memory by editing their own instructions through reflection and meta-prompting, refining the system prompt and tool definitions as they discover what works.

Procedural memory shows up as the system prompt, the catalog of tool schemas the agent can call, and any learned routines or playbooks it has saved. Retrieval here is less about search and more about assembly: the relevant instructions and tool definitions are composed into the prompt before the model runs.
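The assembly step can be sketched as plain string composition. The section labels, the tool-schema fields, and the tag-based playbook selection are illustrative assumptions, not any provider's prompt or tool-calling format.

```python
from __future__ import annotations

import json


def assemble_prompt(rules: list[str], tools: list[dict], playbooks: dict[str, str],
                    task_tags: set[str]) -> str:
    """Compose procedural memory (rules, tool schemas, playbooks) into prompt text."""
    parts = ["RULES:"]
    parts += [f"- {rule}" for rule in rules]
    parts.append("TOOLS:")
    # Tool schemas are serialized verbatim so the model sees exact names and fields.
    parts += [json.dumps(tool, sort_keys=True) for tool in tools]
    # Only playbooks tagged for this task are included, keeping the prompt short.
    selected = [text for tag, text in sorted(playbooks.items()) if tag in task_tags]
    if selected:
        parts.append("PLAYBOOKS:")
        parts += selected
    return "\n".join(parts)


prompt = assemble_prompt(
    rules=["Answer concisely."],
    tools=[{"name": "search", "description": "Web search"}],
    playbooks={"report": "Call search, then summarize, then cite.", "email": "Draft, then confirm."},
    task_tags={"report"},
)
print(prompt)
```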

Procedural memory is the riskiest of the three to mutate automatically, because a bad edit to the system prompt changes the agent's behavior on every future request, not just one. That is a strong reason to gate procedural updates behind validation: an instruction earns a place in the prompt only after it demonstrably improves outcomes across several runs. Episodic and semantic writes are far cheaper to roll back, since a single wrong fact or stale episode affects only the queries that retrieve it.
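A sketch of such a gate follows. `evaluate` is a hypothetical harness that scores a prompt on one run (here a fake function so the example runs); the five runs and the 0.05 minimum gain are arbitrary assumptions you would tune.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable


def accept_instruction(evaluate: Callable[[str, int], float], base_prompt: str,
                       candidate: str, runs: int = 5, min_gain: float = 0.05) -> str:
    """Adopt a candidate instruction only if it raises the mean score across several runs."""
    new_prompt = f"{base_prompt}\n{candidate}"
    baseline = mean(evaluate(base_prompt, seed) for seed in range(runs))
    trial = mean(evaluate(new_prompt, seed) for seed in range(runs))
    # Reject by default: the current prompt stays unless the gain clears the bar.
    return new_prompt if trial - baseline >= min_gain else base_prompt


def fake_eval(prompt: str, seed: int) -> float:
    # Hypothetical evaluation harness: pretend citing sources helps and shouting hurts.
    score = 0.6
    if "cite sources" in prompt:
        score += 0.2
    if "ALL CAPS" in prompt:
        score -= 0.3
    return score


print(accept_instruction(fake_eval, "Be helpful.", "Always cite sources."))  # accepted
print(accept_instruction(fake_eval, "Be helpful.", "Reply in ALL CAPS."))    # rejected
```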

What is working memory in an AI agent?

Working memory is the agent's short-term scratchpad, the information held inside the current context window: the active conversation, intermediate reasoning, and recently retrieved facts. CoALA treats it as the central component that the other three long-term stores feed into during a single decision cycle.

Working memory is bounded by the context window and disappears when the session ends, which is exactly why the other three types exist. Episodic, semantic, and procedural memory are the durable stores that survive between sessions and get loaded back into working memory when needed. The duration side of that relationship is covered in depth in the MemX short-term vs long-term guide.

Because working memory is finite, retrieval from the long-term stores is a memory management and budgeting problem. Every token spent on a retrieved episode or fact is a token not available for reasoning, so an agent has to fetch the few most relevant items rather than everything it knows. Good retrieval is therefore as much about ranking and trimming as about recall, and it is one reason the type split helps: narrowing a query to the right store cuts the candidate set before ranking even begins.
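A greedy version of that trimming step is sketched below, assuming retrieval has already produced scored candidates. Counting one token per word is a rough assumption; a real system would use the model's own tokenizer.

```python
from __future__ import annotations


def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word; real tokenizers differ.
    return len(text.split())


def fit_to_budget(candidates: list[tuple[str, float]], budget_tokens: int) -> list[str]:
    """Greedily keep the highest-scoring retrieved items that fit in the token budget."""
    chosen, used = [], 0
    for text, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        cost = count_tokens(text)
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen


items = [
    ("user prefers metric units", 0.9),
    ("last run timed out on a large date range", 0.8),
    ("small talk about the weather", 0.1),
]
print(fit_to_budget(items, budget_tokens=13))
print(fit_to_budget(items, budget_tokens=8))
```

Greedy selection is simple but can leave part of the budget unused when a large item is skipped; a knapsack-style selection trades more compute for tighter packing.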

The three types side by side

| Memory type | Episodic | Semantic | Procedural |
| --- | --- | --- | --- |
| What it stores | Specific past events and interactions, tied to time | General facts, user preferences, domain knowledge | Skills, routines, tool patterns, the system prompt |
| Example in an agent | "Last Tuesday the CSV export failed for this user" | "This user prefers concise answers in metric units" | "Call the search tool, then summarize, then cite" |
| How it is stored | Append-only log indexed in a vector store | User profile JSON or a collection of fact documents | Prompt text, tool schemas, and saved playbooks |
| How it is retrieved | Similarity search on the current situation | Semantic search or content filtering by relevance | Assembled into the prompt before the model runs |
| Human analogy | Recalling a specific event you lived through | Knowing a fact without recalling when you learned it | Knowing how to ride a bike without thinking |
| Cost of a bad write | Low: a stale episode only affects queries that retrieve it | Low: a wrong fact is easy to deduplicate or overwrite | High: a bad prompt edit changes behavior on every request |

How do episodic, semantic, and procedural memory work together?

A well-built agent uses all three on one request. Suppose a user asks it to prepare the weekly sales report. Procedural memory supplies the routine: which tools to call and in what order. Semantic memory supplies the facts: this user wants totals in euros, and the report goes to the finance channel. Episodic memory supplies the history: the last run timed out, so retry with a smaller date range. Working memory holds it all while the model reasons. Get the split wrong and the failures are predictable. Mix episodes into the fact store and retrieval goes noisy. Auto-edit procedural memory and one bad line changes the agent's behavior on every future request.

The flow matters for design. Each store has its own write path and read path, so an agent should record an episode after an interaction, distill durable facts into semantic memory, and update procedural instructions only when a strategy proves out. Mixing these write paths, for example dumping raw transcripts into the same place as curated facts, is what degrades retrieval quality over time.

  • Episodic write: log the interaction, embed it, store the timestamp and context.
  • Semantic write: extract durable facts, deduplicate against the existing profile.
  • Procedural write: revise the prompt or tool set only after a routine is validated.
  • Read order on a request: assemble procedural instructions, fetch relevant semantic facts, retrieve similar episodes, then reason in working memory.
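The write paths and read order in the list above can be sketched as a single class. This is a toy under stated assumptions: `AgentMemory` and its methods are hypothetical, word overlap stands in for embedding search, and the `validated` flag stands in for whatever evaluation gate a real system would run.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    procedural: list[str] = field(default_factory=list)     # validated instructions only
    semantic: dict[str, str] = field(default_factory=dict)  # durable facts, keyed for dedup
    episodic: list[dict] = field(default_factory=list)      # append-only interaction log

    def write_episode(self, request: str, outcome: str, timestamp: float) -> None:
        self.episodic.append({"request": request, "outcome": outcome, "ts": timestamp})

    def write_fact(self, key: str, value: str) -> None:
        self.semantic[key] = value  # same key overwrites instead of duplicating

    def write_procedure(self, instruction: str, validated: bool) -> None:
        # Procedural writes are gated: unvalidated routines never reach the prompt.
        if validated and instruction not in self.procedural:
            self.procedural.append(instruction)

    def build_working_memory(self, request: str, max_episodes: int = 2) -> str:
        words = set(request.lower().split())
        lines = list(self.procedural)                                    # 1. procedural
        lines += [f"fact {key}={value}" for key, value in self.semantic.items()
                  if key in words]                                       # 2. semantic
        scored = [(len(words & set(e["request"].lower().split())), e) for e in self.episodic]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        lines += [f"episode: {e['request']} -> {e['outcome']}"
                  for overlap, e in scored[:max_episodes] if overlap > 0]  # 3. episodic
        lines.append(f"request: {request}")                              # 4. reason over this
        return "\n".join(lines)


mem = AgentMemory()
mem.write_procedure("Call the sales tool, then summarize.", validated=True)
mem.write_procedure("Skip citations.", validated=False)
mem.write_fact("currency", "EUR")
mem.write_episode("weekly sales report", "timed out; retry with a smaller date range", 1.0)
mem.write_episode("translate an email", "success", 2.0)
print(mem.build_working_memory("prepare the weekly sales report with totals in currency"))
```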

Where MemX fits

The same idea shows up in everyday life: you keep a record of what happened and the facts you care about, then find them again later. MemX is a consumer app that does this for you, a personal second brain rather than anything that sits inside an AI agent. You dump in photos, PDFs, scanned documents, voice notes, and WhatsApp messages. On-device tools read them: the Document Scanner pulls names, dates, and amounts from receipts and IDs, Photo Memory flags and indexes images, and Voice to Memory turns a short recording into a polished note. Then you ask in plain English and Ask MemX answers with a citation back to the original source. It runs on Android and on iOS through TestFlight, works over WhatsApp, and the web version is coming soon. Your private photos never leave your device, and nothing you store is used to train any model.

Frequently asked questions

01. What are the three types of AI agent memory?

Episodic, semantic, and procedural. Episodic stores specific past events the agent experienced, semantic stores general facts and preferences, and procedural stores how to do things, including skills, tool patterns, and the system prompt. Many systems add working memory, the in-context scratchpad for the current task.

02. What is the difference between episodic and semantic memory in AI?

Episodic memory records specific events tied to a time, such as a past conversation or a failed tool call. Semantic memory holds general facts with no sense of when they were learned, such as a user's preferences. The split comes from Endel Tulving's 1972 distinction in cognitive psychology.

03. What is procedural memory in AI?

It is the agent's how-to knowledge: skills, routines, tool-use patterns, and rules. In an LLM agent it lives in the model weights, the agent code, and the agent's prompt, per LangChain's framing. Weights and code are usually fixed at runtime, so agents update procedural memory by editing their own instructions and tool definitions through reflection and meta-prompting.

04. Is working memory the same as long-term agent memory?

No. Working memory is the short-term scratchpad inside the context window and vanishes when the session ends. Episodic, semantic, and procedural memory are long-term stores that persist between sessions. Type and duration are separate axes: any type can be short-lived or persisted.

05. How does an AI agent retrieve episodic memory?

Past interactions go into a log indexed in a vector store, then the agent pulls episodes that match the current situation. Production systems blend signals: the Generative Agents paper ranks by recency, model-assigned importance, and embedding relevance combined, not similarity alone.



Written by Aditya Kumar Jha

Core software engineer at MemX, where he builds the website, backend, and data systems. Also a published author of six books on Amazon KDP, writing on AI, memory, and behavior.
