MemGPT is a system that treats an LLM like an operating system, using virtual context management to page information between a limited in-context main memory and unlimited external storage, giving the model effectively unbounded, self-managed memory.
What is MemGPT?
MemGPT is a system, introduced by Packer et al. in 2023, that gives large language models effectively unbounded memory by borrowing the memory hierarchy of a traditional operating system. An OS pairs small, fast physical RAM with a large, slow disk and pages data between them, so programs appear to have far more memory than physically exists. MemGPT applies the same idea to an LLM: the context window acts as fast main memory, and external storage acts as slow memory, with the model itself deciding what to swap in and out.
The central problem MemGPT addresses is the fixed context window. A model cannot attend to more tokens than its window allows, so long conversations and large documents eventually exceed it. MemGPT manages this boundary explicitly rather than letting old information silently fall off the edge.
- MemGPT treats the LLM context window as RAM and external storage as disk.
- It was introduced in the 2023 paper "MemGPT: Towards LLMs as Operating Systems".
- Its goal is to overcome the fixed context window without retraining the model.
The Two-Tier Memory Hierarchy
MemGPT splits memory into main context and external context. Main context is everything currently inside the LLM's context window: the system instructions, a working set of recent messages, and editable memory blocks the agent maintains about the user and itself. External context lives outside the window in storage such as a database or vector store, holding the full conversation history and archived facts that can be recalled on demand.
Because main context is finite, MemGPT must decide what deserves a place in it. Information that is needed now is paged in; information that is stale is evicted to external context where it remains searchable. This mirrors how an operating system keeps the active working set in RAM and the rest on disk.
- Main context: system prompt, recent messages, and editable memory blocks inside the window.
- External context: full history and archived facts in a searchable store outside the window.
- MemGPT moves data between the two tiers to keep the active working set in context.
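The split between the two tiers can be sketched in a few lines of Python. This is a minimal illustration, not MemGPT's actual implementation: the class name `TwoTierMemory`, its fields, and the message-count budget (a stand-in for a real token budget) are all assumptions made for the example, and the substring search stands in for whatever database or vector store a real system would use.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TwoTierMemory:
    """Hypothetical two-tier store: bounded main context, unbounded external context."""

    max_main_messages: int = 4  # assumed stand-in for a token budget
    system_prompt: str = "You are a helpful agent."
    memory_blocks: dict = field(default_factory=dict)    # editable facts kept in the window
    main_messages: deque = field(default_factory=deque)  # recent working set
    external: list = field(default_factory=list)         # full history outside the window

    def add_message(self, text: str) -> None:
        # Every message is also recorded externally, so eviction never loses data.
        self.external.append(text)
        self.main_messages.append(text)
        # Evict the oldest messages once the working set exceeds the budget.
        while len(self.main_messages) > self.max_main_messages:
            self.main_messages.popleft()

    def search_external(self, query: str) -> list:
        # Naive case-insensitive substring match over the full history.
        q = query.lower()
        return [m for m in self.external if q in m.lower()]

    def render_main_context(self) -> str:
        # Assemble what would actually be sent to the model.
        blocks = [f"{name}: {value}" for name, value in self.memory_blocks.items()]
        return "\n".join([self.system_prompt, *blocks, *self.main_messages])
```

With `max_main_messages=2`, adding three messages leaves only the last two in `main_messages`, while `search_external` can still find the first one in the external tier.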
Function Calls, Paging, and Self-Editing Memory
MemGPT drives memory management through function calls the LLM emits itself. Rather than a human writing paging logic, the model is prompted to call tools such as appending to its core memory, searching archival storage, or pulling a past conversation back into context. The model can therefore edit its own memory, updating facts about the user and reflecting on prior interactions over time.
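As a rough sketch of how such model-emitted calls might be routed, the snippet below parses a JSON function call and dispatches it to a memory function. The function names, the JSON shape (`name` plus `arguments`), and the in-memory stores are illustrative assumptions for this example, not a description of any specific library's API.

```python
import json

# Hypothetical in-memory stores for the example.
core_memory = {"human": "", "persona": ""}
archival_memory = []


def core_memory_append(section: str, content: str) -> str:
    # Append a fact to an editable in-context memory block.
    if section not in core_memory:
        return f"Error: unknown section {section!r}"
    core_memory[section] = (core_memory[section] + "\n" + content).strip()
    return "OK"


def archival_memory_insert(content: str) -> str:
    # Store a fact outside the context window.
    archival_memory.append(content)
    return "OK"


def archival_memory_search(query: str) -> str:
    # Naive substring search; returns matches as a JSON list.
    hits = [m for m in archival_memory if query.lower() in m.lower()]
    return json.dumps(hits)


TOOLS = {
    "core_memory_append": core_memory_append,
    "archival_memory_insert": archival_memory_insert,
    "archival_memory_search": archival_memory_search,
}


def dispatch(model_output: str) -> str:
    """Parse a function call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"Error: unknown function {call['name']!r}"
    return fn(**call["arguments"])
```

The string returned by `dispatch` would be fed back to the model as the function result, so errors are returned as text rather than raised, which gives the model a chance to correct its next call.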
Control flow is handled with an event loop and interrupts, again echoing an operating system. When the context window approaches its limit, a memory-pressure event prompts the agent to summarize or evict content. The model also yields control back to the user and resumes when new input or a system event arrives.
- The LLM issues function calls to read, write, search, and page its own memory.
- Self-editing memory lets the agent revise stored facts and reflections over time.
- Memory-pressure interrupts trigger summarization or eviction before the window overflows.
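The memory-pressure behaviour can be illustrated with a small message queue that raises a warning event before the window fills and evicts old messages once it overflows. Everything here is an assumption for the sake of the sketch: the `ContextQueue` class, the 70 percent warning threshold, the evict-to-half policy, whitespace token counting, and the placeholder summary string (a real agent would ask the model to write the summary).

```python
def count_tokens(text: str) -> int:
    # Crude whitespace tokenizer, used only to keep the example self-contained.
    return len(text.split())


class ContextQueue:
    """Hypothetical message queue with a warning threshold and overflow eviction."""

    def __init__(self, window_tokens: int, warn_percent: int = 70):
        self.window_tokens = window_tokens
        self.warn_tokens = window_tokens * warn_percent // 100
        self.messages = []        # messages currently in the window
        self.recall_storage = []  # evicted messages, still retrievable
        self.events = []          # events the agent loop would react to

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        self.messages.append(message)
        if self.used() > self.window_tokens:
            self._flush()
        elif self.used() > self.warn_tokens:
            # Emitted on every append above the threshold, giving the agent
            # repeated chances to save important facts before eviction.
            self.events.append("memory_pressure")

    def _flush(self) -> None:
        # Evict the oldest messages until usage drops to half the window,
        # always keeping at least the newest message.
        evicted = []
        while len(self.messages) > 1 and self.used() > self.window_tokens // 2:
            evicted.append(self.messages.pop(0))
        self.recall_storage.extend(evicted)
        # Placeholder summary; a real system would generate this with the LLM.
        self.messages.insert(0, f"[summary of {len(evicted)} earlier messages]")
        self.events.append("flushed")
```

Evicted messages move to `recall_storage` rather than being deleted, which matches the idea that stale content leaves the window but remains searchable.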
What MemGPT Enables and Its Legacy
The paper demonstrated MemGPT in two settings that the underlying context window could not handle directly: analyzing documents longer than the window, and multi-session chat in which an agent remembers, reflects on, and evolves across conversations that together exceed the window.
MemGPT helped popularize agent memory as a first-class concern. In September 2024 the project was renamed Letta, and it continues as the open-source Letta framework; today "MemGPT" denotes the design pattern and "Letta" the framework. The core pattern, an LLM that manages tiered memory through tool calls, now appears across many agent stacks, including memory layers that persist user facts between sessions.
- Demonstrated on long-document analysis and persistent multi-session chat.
- Renamed to the open-source Letta project for stateful agents in September 2024.
- Established tool-driven tiered memory as a common agent design pattern.
Key takeaways
- MemGPT treats the LLM like an OS, paging data between an in-context main memory and external storage.
- The model manages its own memory through function calls instead of hard-coded paging logic.
- Memory-pressure interrupts prompt the agent to summarize or evict content before the window overflows.
- It enables long-document analysis and persistent multi-session chat beyond a fixed context window.
- MemGPT became the open-source Letta framework and shaped modern agent memory design.