Memory & Cognition

MemGPT (LLM as Operating System)

By Arpit Tripathi, Founder

MemGPT is a system that treats an LLM like an operating system, using virtual context management to page information between a limited in-context main memory and unlimited external storage, giving the model effectively unbounded, self-managed memory.

What is MemGPT?

MemGPT is a system, introduced by Packer et al. in 2023, that gives large language models effectively unbounded memory by borrowing the memory hierarchy of a traditional operating system. An OS uses fast but small physical RAM alongside large but slow disk, and pages data between them so programs appear to have far more memory than physically exists. MemGPT applies the same idea to an LLM: the context window acts as fast main memory, and external storage acts as slow memory, with the model itself deciding what to swap in and out.

The central problem MemGPT addresses is the fixed context window. A model cannot attend to more tokens than its window allows, so long conversations and large documents eventually exceed it. MemGPT manages this boundary explicitly rather than letting old information silently fall off the edge.

  • MemGPT treats the LLM context window as RAM and external storage as disk.
  • It was introduced in the 2023 paper MemGPT: Towards LLMs as Operating Systems.
  • Its goal is to overcome the fixed context window without retraining the model.

The Two-Tier Memory Hierarchy

MemGPT splits memory into main context and external context. Main context is everything currently inside the LLM's context window: the system instructions, a working set of recent messages, and editable memory blocks the agent maintains about the user and itself. External context lives outside the window in storage such as a database or vector store, holding the full conversation history and archived facts that can be recalled on demand.

Because main context is finite, MemGPT must decide what deserves a place in it. Information that is needed now is paged in; information that is stale is evicted to external context where it remains searchable. This mirrors how an operating system keeps the active working set in RAM and the rest on disk.

  • Main context: system prompt, recent messages, and editable memory blocks inside the window.
  • External context: full history and archived facts in a searchable store outside the window.
  • MemGPT moves data between the two tiers to keep the active working set in context.
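
The two tiers described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not MemGPT's actual implementation: the `TieredMemory` class, the whitespace-based `count_tokens` helper, the tiny token budget, and the substring-based `recall` are all hypothetical stand-ins for a real tokenizer, a real context limit, and a real database or vector store.

```python
from collections import deque
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class TieredMemory:
    """Hypothetical two-tier memory: main context ("RAM") plus external context ("disk")."""

    system_prompt: str
    token_budget: int  # assumed size of the context window, in tokens
    memory_blocks: dict[str, str] = field(default_factory=dict)
    recent_messages: deque[str] = field(default_factory=deque)
    external: list[str] = field(default_factory=list)

    def main_context_tokens(self) -> int:
        """Count tokens for everything currently held in main context."""
        parts = [self.system_prompt, *self.memory_blocks.values(), *self.recent_messages]
        return sum(count_tokens(part) for part in parts)

    def add_message(self, message: str) -> None:
        """Append a message, then evict the oldest messages while over budget.

        The newest message is always kept, even if it alone exceeds the budget.
        """
        self.recent_messages.append(message)
        while self.main_context_tokens() > self.token_budget and len(self.recent_messages) > 1:
            self.external.append(self.recent_messages.popleft())

    def recall(self, query: str) -> list[str]:
        """Search external context with a naive case-insensitive substring match."""
        return [item for item in self.external if query.lower() in item.lower()]


memory = TieredMemory(system_prompt="You are a helpful assistant.", token_budget=16)
memory.memory_blocks["user"] = "Name: Ada"
for text in ["I live in Lisbon", "My cat is named Turing", "What should I cook tonight?"]:
    memory.add_message(text)

print(list(memory.recent_messages))  # messages still in main context
print(memory.recall("lisbon"))       # evicted messages remain searchable
```

Evicted messages are moved rather than deleted, which is the property that distinguishes this scheme from simply truncating the oldest turns.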

Function Calls, Paging, and Self-Editing Memory

MemGPT drives memory management through function calls that the LLM itself emits. Rather than a human writing paging logic, the model is prompted to call tools that append to its core memory, search archival storage, or pull a past conversation back into context. The model can therefore edit its own memory, updating facts about the user and reflecting on prior interactions over time.

Control flow is handled with an event loop and interrupts, again echoing an operating system. When the context window approaches its limit, a memory-pressure event prompts the agent to summarize or evict content. The model also yields control back to the user and resumes when new input or a system event arrives.

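```python
# Conceptual sketch: `main_context`, `vector_store`, and `llm` are assumed to be
# supplied by the agent runtime; they are not defined here.

def core_memory_append(section: str, content: str) -> None:
    """Add a durable fact to an editable in-context memory block."""
    # Use .get so appending to a section that does not exist yet cannot raise KeyError.
    existing = main_context.memory.get(section, "")
    main_context.memory[section] = f"{existing}\n{content}" if existing else content

def archival_memory_search(query: str, top_k: int = 5):
    """Retrieve relevant facts from external storage into context."""
    return vector_store.search(query, top_k=top_k)

def archival_memory_insert(content: str) -> None:
    """Write a fact or summary out to external archival storage."""
    vector_store.add(content)

def on_memory_pressure() -> None:
    """Interrupt fired when the context window is nearly full."""
    # Archive a summary first so the gist of the evicted messages stays searchable.
    summary = llm.summarize(main_context.recent_messages)
    archival_memory_insert(summary)
    main_context.evict_oldest()
```

Conceptual sketch of MemGPT-style self-editing memory functions the LLM can call.
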
  • The LLM issues function calls to read, write, search, and page its own memory.
  • Self-editing memory lets the agent revise stored facts and reflections over time.
  • Memory-pressure interrupts trigger summarization or eviction before the window overflows.
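
The event loop and memory-pressure interrupt described above can be sketched end to end with a scripted stand-in for the model. Everything here is an assumption made for illustration: `handle_event`, `scripted_model`, the dict-shaped function call, the whitespace token count, and the `CONTEXT_LIMIT` and `PRESSURE_THRESHOLD` values are hypothetical and are not taken from the MemGPT codebase.

```python
import json
from typing import Callable, Optional

CONTEXT_LIMIT = 40        # hypothetical window size, in whitespace-separated tokens
PRESSURE_THRESHOLD = 0.7  # hypothetical fraction of the window that triggers a warning

core_memory: dict[str, str] = {"human": "", "persona": "I am a helpful agent."}
archival: list[str] = []
message_queue: list[str] = []


def core_memory_append(section: str, content: str) -> str:
    """Append a fact to an in-context memory block."""
    existing = core_memory.get(section, "")
    core_memory[section] = f"{existing}\n{content}".strip()
    return "OK"


def archival_memory_insert(content: str) -> str:
    """Write content out to the external archive."""
    archival.append(content)
    return "OK"


def archival_memory_search(query: str) -> str:
    """Return archived items containing the query, encoded as JSON."""
    hits = [item for item in archival if query.lower() in item.lower()]
    return json.dumps(hits)


# Registry the loop uses to route a function call by name.
TOOLS = {fn.__name__: fn for fn in (core_memory_append, archival_memory_insert, archival_memory_search)}


def context_tokens() -> int:
    """Approximate tokens held in context: memory blocks plus queued messages."""
    return len(" ".join([*core_memory.values(), *message_queue]).split())


def handle_event(event: str, model: Callable[[str], Optional[dict]]) -> list[str]:
    """Process one event: queue it, run any function call, then check memory pressure."""
    message_queue.append(event)
    call = model(event)  # either None or {"name": ..., "arguments": {...}}
    if call is not None:
        result = TOOLS[call["name"]](**call["arguments"])
        message_queue.append(f"[function result] {result}")
    warnings = []
    if context_tokens() > CONTEXT_LIMIT * PRESSURE_THRESHOLD:
        warnings.append("memory pressure: summarize or evict older messages")
    return warnings


def scripted_model(event: str) -> Optional[dict]:
    """Stand-in for the LLM: emits a fixed function call when the user states a name."""
    if "my name is" in event.lower():
        name = event.rsplit(" ", 1)[-1].rstrip(".")
        return {"name": "core_memory_append",
                "arguments": {"section": "human", "content": f"Name: {name}"}}
    return None


first = handle_event("Hi, my name is Ada.", scripted_model)
second = handle_event("Here is a long note: " + " ".join(["detail"] * 15), scripted_model)
print(core_memory["human"], first, second)
```

In a real agent the warning would be fed back to the model as a system message so that the model itself decides what to summarize or archive; the sketch only shows where in the loop that interrupt would be raised.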

What MemGPT Enables and Its Legacy

The paper demonstrated MemGPT on two settings the underlying context window could not handle directly: analyzing documents longer than the window, and multi-session chat where an agent remembers, reflects on, and evolves across conversations that together exceed the window.

MemGPT helped popularize agent memory as a first-class concern. In September 2024 the project was renamed Letta, and it continues as an open-source framework for building stateful agents. Today MemGPT denotes the design pattern and Letta the framework. The core MemGPT pattern, an LLM that manages tiered memory through tool calls, now appears across many agent stacks, including memory layers that persist user facts between sessions.

  • Demonstrated on long-document analysis and persistent multi-session chat.
  • Renamed to the open-source Letta project for stateful agents in September 2024.
  • Established tool-driven tiered memory as a common agent design pattern.

Key takeaways

  • MemGPT treats the LLM like an OS, paging data between an in-context main memory and external storage.
  • The model manages its own memory through function calls instead of hard-coded paging logic.
  • Memory-pressure interrupts prompt the agent to summarize or evict content before the window overflows.
  • It enables long-document analysis and persistent multi-session chat beyond a fixed context window.
  • MemGPT became the open-source Letta framework and shaped modern agent memory design.

Frequently asked questions

What is MemGPT?
MemGPT is a system that gives an LLM memory beyond its context window by treating the window like RAM and external storage like disk. The model uses tool calls to move information between them, so it can remember far more than fits in context at once.

How does MemGPT work around the fixed context window?
It pages information in and out of the context window. Important facts stay in the in-context main memory, while older content is archived to external storage and searched back in when relevant. The window is finite, but total accessible memory is not.

Why is MemGPT compared to an operating system?
It mirrors OS memory management: a fast small main memory, a large slow external memory, paging between them, interrupts, and an event loop. The LLM plays the role of the program and partly the kernel, managing its own memory through function calls.

How is MemGPT different from RAG?
RAG retrieves external documents to answer a query. MemGPT is broader: the agent actively manages a tiered memory it can read and write, edits its own stored facts, and decides when to page content in or out, rather than just fetching documents once per query.

Is MemGPT open source?
Yes. In September 2024 the MemGPT research project was renamed Letta, an open-source framework for building stateful agents with persistent memory. The original paper and code are publicly available, and the design has influenced many later agent memory systems.