Generative Agents are LLM-driven simulated characters from Stanford and Google research that store experiences in a natural-language memory stream, score memories by recency, importance, and relevance for retrieval, and synthesize higher-level reflections to drive believable behavior.
What are Generative Agents?
Generative Agents are LLM-powered simulated characters introduced by Park et al. in the 2023 paper "Generative Agents: Interactive Simulacra of Human Behavior". In a small sandbox town called Smallville, 25 agents went about daily routines, formed opinions, started conversations, and even coordinated a Valentine's Day party, all driven by language-model reasoning over remembered experience. The paper's central argument is that believable, consistent behavior over long horizons comes from pairing the base model with an external memory architecture, not from the model alone.
The core contribution is that architecture: a memory stream that records experiences in natural language, a retrieval mechanism that surfaces the right memories at the right time, and a reflection process that turns raw observations into higher-level insight. Together these let an agent act coherently across simulated days, drawing on far more experience than a single context window could hold.
- Introduced by Park et al. (2023) in the Smallville sandbox simulation.
- 25 agents showed emergent social behavior driven by remembered experience.
- The contribution is the memory architecture, not a new base model.
The Memory Stream
The memory stream is a long, time-stamped list of memory objects written in natural language. Each object records an observation, such as something the agent saw, did, or was told, along with its creation time and the time it was last accessed. Nothing is discarded: the stream is a complete record that grows continuously as the simulation runs.
Because the stream quickly becomes far too large to fit in a prompt, the agent never feeds all of it to the model. Instead, at each decision point it retrieves a small, relevant subset. The design problem is therefore retrieval: choosing which handful of memories out of thousands should inform the next action.
- Each memory object stores a natural-language description plus timestamps.
- The stream is append-only and grows without bound during the simulation.
- Only a small retrieved subset is placed into the prompt at any time.
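The memory stream described above can be sketched as a small data structure. This is a minimal illustration, not the paper's implementation: the names `MemoryObject`, `MemoryStream`, `add`, and `touch` are hypothetical, and the sample observation is invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MemoryObject:
    description: str         # natural-language text of the observation
    created_at: datetime     # when the memory was written
    last_accessed: datetime  # refreshed each time the memory is retrieved


@dataclass
class MemoryStream:
    # Hypothetical container: an append-only list of memory objects.
    memories: list[MemoryObject] = field(default_factory=list)

    def add(self, description: str, now: datetime) -> MemoryObject:
        """Append a new memory; existing entries are never removed."""
        memory = MemoryObject(description, created_at=now, last_accessed=now)
        self.memories.append(memory)
        return memory

    def touch(self, memory: MemoryObject, now: datetime) -> None:
        """Record that a memory was retrieved, which resets its recency clock."""
        memory.last_accessed = now


stream = MemoryStream()
t0 = datetime(2023, 2, 13, 9, 0)
m = stream.add("Saw a neighbor decorating the cafe for a Valentine's Day party", t0)
stream.touch(m, datetime(2023, 2, 13, 12, 0))
```

Keeping `created_at` and `last_accessed` separate matters because recency in the retrieval step is measured from the last access, not from creation.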
Retrieval Scoring: Recency, Importance, Relevance
Generative Agents rank candidate memories by combining three signals. Recency favors memories accessed recently and is modeled as exponential decay over the time since last access. Importance captures how significant a memory is, obtained by asking the language model to rate it on a numeric scale (1 to 10 in the paper), so a mundane observation scores low and a meaningful life event scores high. Relevance is the semantic similarity between the current query and the memory, computed as cosine similarity of their embeddings.
Each component is min-max normalized to the range 0 to 1, and the three are summed (with equal weights in the paper) to produce a final retrieval score. The top-scoring memories that fit in the context window are passed into the prompt. Many later agent memory systems use this scoring scheme as a reference design.
- Recency: exponential decay since the memory was last accessed.
- Importance: an LLM-assigned score (1 to 10 in the paper) of how significant the memory is.
- Relevance: cosine similarity between the query embedding and the memory embedding.
- The three normalized scores are summed, and the highest are retrieved.
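The scoring steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' code: memories are plain dicts with hypothetical keys, time is measured in hours, the decay factor of 0.995 per hour is an illustrative value, and the two-dimensional embeddings are toy vectors standing in for real model embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def min_max(values):
    """Rescale values to [0, 1]; if all values are equal, return zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def retrieve(memories, query_embedding, now_hours, top_k=2, decay=0.995):
    """Rank memories by normalized recency + importance + relevance (equal weights)."""
    if not memories:
        return []
    # Recency: exponential decay over hours since last access (decay value is an assumption).
    recency = [decay ** (now_hours - m["last_accessed_hours"]) for m in memories]
    # Importance: assumed to be an LLM-assigned score already stored on the memory.
    importance = [float(m["importance"]) for m in memories]
    # Relevance: cosine similarity between query and memory embeddings.
    relevance = [cosine_similarity(query_embedding, m["embedding"]) for m in memories]
    parts = [min_max(recency), min_max(importance), min_max(relevance)]
    scores = [sum(components) for components in zip(*parts)]
    ranked = sorted(zip(scores, memories), key=lambda pair: pair[0], reverse=True)
    return [m for _, m in ranked[:top_k]]


memories = [
    {"text": "ate breakfast", "last_accessed_hours": 47.0,
     "importance": 1, "embedding": [1.0, 0.0]},
    {"text": "was invited to a party", "last_accessed_hours": 40.0,
     "importance": 8, "embedding": [0.0, 1.0]},
    {"text": "watered the plants", "last_accessed_hours": 10.0,
     "importance": 2, "embedding": [0.7, 0.7]},
]
top = retrieve(memories, query_embedding=[0.0, 1.0], now_hours=48.0)
print([m["text"] for m in top])
```

In this toy data the party invitation ranks first because it is both important and relevant to the query, while the breakfast memory takes second place on recency alone. Normalizing each signal before summing keeps any one of them from dominating simply because of its raw numeric range.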
Reflection and Planning
Raw observations alone do not let an agent generalize, for example by concluding that another character is passionate about research. Reflection solves this: periodically, when the sum of recent importance scores crosses a threshold, the agent pauses, asks the model which salient questions its recent memories raise, retrieves evidence for each question, and synthesizes higher-level statements. These reflections are written back into the memory stream as new memory objects, so they can themselves be retrieved and reflected upon, forming a tree of increasingly abstract insight.
Reflections and retrieved memories feed planning, where the agent lays out a day and adjusts it as events unfold. The loop of observe, retrieve, reflect, and plan is what produces behavior that stays consistent with an agent's identity and past over long stretches of simulated time.
- Reflection triggers when accumulated importance crosses a threshold.
- The agent poses questions, gathers evidence, and writes synthesized insights back into the stream.
- Reflections plus retrieval drive day-level planning and consistent long-horizon behavior.
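The reflection loop above might be sketched as follows. Everything here is a hypothetical illustration: `ReflectionTracker`, `reflect`, the prompt wording, and the threshold values are assumptions, and `ask_llm` stands in for any function that sends a prompt to a language model and returns a list of strings. The stub below replaces a real model so the example runs on its own.

```python
class ReflectionTracker:
    """Accumulates importance scores and signals when reflection is due."""

    def __init__(self, threshold: float = 150.0):
        # The default threshold is an illustrative value, not a recommendation.
        self.threshold = threshold
        self.accumulated = 0.0

    def observe(self, importance: float) -> bool:
        """Add a new memory's importance; return True once the threshold is reached."""
        self.accumulated += importance
        return self.accumulated >= self.threshold

    def reset(self) -> None:
        """Start a fresh accumulation window after a reflection has run."""
        self.accumulated = 0.0


def reflect(recent_memories, ask_llm, retrieve_evidence):
    """Pose questions about recent memories, gather evidence, and synthesize insights."""
    question_prompt = (
        "Given only the statements below, what are the most salient "
        "high-level questions we can answer?\n" + "\n".join(recent_memories)
    )
    insights = []
    for question in ask_llm(question_prompt):
        evidence = retrieve_evidence(question)
        insight_prompt = (
            "What high-level insight answers: " + question
            + "\nEvidence:\n" + "\n".join(evidence)
        )
        insights.extend(ask_llm(insight_prompt))
    return insights  # the caller would append these to the memory stream


def stub_llm(prompt):
    # Stand-in for a real model call, used only to make the sketch runnable.
    if prompt.startswith("Given only"):
        return ["What is the neighbor passionate about?"]
    return ["The neighbor is passionate about research."]


tracker = ReflectionTracker(threshold=10)
due = [tracker.observe(score) for score in (3, 4, 5)]
insights = reflect(
    ["The neighbor reads papers late at night"],
    stub_llm,
    lambda question: ["The neighbor reads papers late at night"],
)
tracker.reset()
```

Writing the returned insights back into the stream as ordinary memory objects is what lets later reflections build on earlier ones.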
Key takeaways
- Generative Agents store experiences in an append-only natural-language memory stream.
- Retrieval ranks memories by summing normalized recency, importance, and relevance scores.
- Recency uses exponential decay, importance is LLM-assigned, and relevance is embedding cosine similarity.
- Reflection synthesizes raw observations into higher-level insights stored back in the stream.
- The architecture, not the base model, is what enables believable long-horizon behavior.