HyDE (Hypothetical Document Embeddings)

HyDE is a retrieval technique that asks a language model to write a hypothetical answer to a query, then embeds that answer and uses its vector to find real documents, improving zero-shot dense retrieval.

What is HyDE (Hypothetical Document Embeddings)?

HyDE, short for Hypothetical Document Embeddings, is a dense retrieval technique that improves search by first generating a hypothetical answer to the user's query and then embedding that answer instead of the query itself. The vector of the generated document is then used to find the real documents that lie nearest to it in the corpus embedding space. It was introduced in the 2022 paper "Precise Zero-Shot Dense Retrieval without Relevance Labels" by Gao, Ma, Lin, and Callan.

The motivation is a mismatch problem. A short question and a relevant passage often look different as text, so the question embedding may sit far from the answer it needs. A hypothetical answer, even if factually wrong, resembles a real answer in structure, vocabulary, and topic, so its embedding lands closer to genuinely relevant documents.

Embed a generated hypothetical answer, not the raw query.
Designed for zero-shot retrieval without labeled relevance data.
Addresses the vocabulary and structure gap between questions and answers.
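
The gap described above can be illustrated with a toy experiment. The sketch below is not a real dense encoder: toy_embed is a hypothetical bag-of-words stand-in, and the three strings are made-up examples. It only shows how an answer-shaped text can share more surface features with a relevant passage than a short question does.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for a dense encoder (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up example texts (assumptions, not taken from any dataset).
query = "Why is the sky blue?"
hypothetical = (
    "The sky looks blue because the atmosphere scatters blue sunlight "
    "more than red sunlight."
)
passage = (
    "Rayleigh scattering in the atmosphere scatters short blue wavelengths "
    "of sunlight more than long red wavelengths."
)

query_sim = cosine(toy_embed(query), toy_embed(passage))
hyde_sim = cosine(toy_embed(hypothetical), toy_embed(passage))

print(f"query vs passage:        {query_sim:.2f}")
print(f"hypothetical vs passage: {hyde_sim:.2f}")
```

Note that the hypothetical text never mentions Rayleigh scattering by name, yet it still overlaps with the passage far more than the question does. A learned encoder captures similarity very differently from word counts, so treat this only as intuition for the mismatch problem.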

How HyDE works

HyDE runs in two stages, followed by an ordinary nearest-neighbor search. First, an instruction-following language model is prompted to write a passage that would answer the query, producing a hypothetical document. This document captures the patterns of a relevant answer but is not grounded in the corpus and may contain invented facts. Second, an unsupervised, contrastively trained encoder, such as Contriever, encodes the hypothetical document into an embedding vector.

That vector identifies a neighborhood in the corpus embedding space, and the system retrieves the real documents whose embeddings are most similar to it. The encoder's dense bottleneck acts as a lossy filter: it tends to preserve the broad semantic content of the hypothetical answer while blurring out the specific false details, so the nearest neighbors are usually on-topic. And because only existing corpus documents are returned, the hallucinated specifics never reach the final results.

Step 1: an LLM writes a hypothetical answer to the query.
Step 2: an encoder embeds that answer into a single vector.
Step 3: nearest real documents to that vector are retrieved.
The encoder filters out invented details while keeping semantic intent.
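
The three steps above can be sketched with the generator and encoder passed in as plain callables. Everything named here is an assumption made for illustration: hyde_retrieve, stub_generate, toy_encode, and the tiny corpus are hypothetical, and the brute-force cosine search stands in for a real vector index.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_retrieve(
    query: str,
    generate: Callable[[str], str],   # hypothetical wrapper around an LLM
    encode: Callable[[str], Vector],  # hypothetical wrapper around an encoder
    corpus: Sequence[str],
    corpus_vectors: Sequence[Vector],
    top_k: int = 3,
) -> list[str]:
    # Step 1: write a hypothetical answer to the query.
    hypothetical_doc = generate(query)
    # Step 2: embed the hypothetical answer, not the raw query.
    search_vector = encode(hypothetical_doc)
    # Step 3: rank real corpus documents by similarity to that vector.
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: cosine(search_vector, corpus_vectors[i]),
        reverse=True,
    )
    return [corpus[i] for i in ranked[:top_k]]


# --- Toy stand-ins so the sketch runs without any external service ---
VOCAB = ["paris", "capital", "france", "berlin", "germany", "pasta"]


def toy_encode(text: str) -> list[float]:
    """Toy encoder (assumption): counts of a few vocabulary words."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return [float(words.count(w)) for w in VOCAB]


def stub_generate(query: str) -> str:
    """Stub generator (assumption) that returns a deliberately wrong fact."""
    return "Lyon is the capital of France."


corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Pasta is a staple of Italian cooking.",
]
corpus_vectors = [toy_encode(doc) for doc in corpus]

results = hyde_retrieve(
    "What city is France governed from?",
    stub_generate,
    toy_encode,
    corpus,
    corpus_vectors,
    top_k=1,
)
print(results)
```

The stub's invented detail ("Lyon") has no effect on the ranking in this toy setup, and the function can only ever return strings that already exist in corpus. In practice, generate and encode would wrap a real language model and a real embedding model.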

Why it improves retrieval

Because the hypothetical document mirrors the form of an actual answer, its embedding is a better query than the original question for similarity search. In the original work, HyDE outperformed the strong unsupervised dense retriever Contriever and reached performance comparable to fine-tuned retrievers across several tasks and languages, all without any relevance labels.

The approach is most useful in zero-shot settings, where no labeled query-document pairs exist to train a retriever, and for queries that are short, ambiguous, or phrased very differently from the source text. It is a query transformation step that sits in front of an otherwise standard dense retrieval or RAG pipeline.

A hypothetical answer is a stronger search vector than a bare question.
Strong zero-shot results without labeled relevance data.
Helps most with short, ambiguous, or vocabulary-mismatched queries.

Code example

The sketch below shows the HyDE flow: generate a hypothetical document with an LLM, embed it, and search a vector store with the resulting vector.

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable


def hyde_search(query, vector_store, top_k=5):
    """Retrieve the top_k real documents nearest to a hypothetical answer."""
    # Step 1: generate a hypothetical answer document
    gen = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a short factual passage that answers: {query}"
        }]
    )
    hypothetical_doc = gen.choices[0].message.content

    # Step 2: embed the hypothetical document (not the raw query)
    emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=hypothetical_doc
    ).data[0].embedding

    # Step 3: retrieve real documents nearest to that vector
    # (vector_store is assumed to expose a LangChain-style method)
    return vector_store.similarity_search_by_vector(emb, k=top_k)
```

A minimal HyDE retrieval flow: generate, embed, then search.
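
The original paper also describes sampling several hypothetical documents and averaging their embeddings, optionally together with the embedding of the query itself, so the search vector depends less on any single generation. The sketch below shows that averaging step on its own. The function name hyde_query_vector, its parameters, and the stub generate and encode callables are assumptions for illustration.

```python
from typing import Callable, Sequence


def hyde_query_vector(
    query: str,
    generate: Callable[[str], str],            # hypothetical LLM wrapper
    encode: Callable[[str], Sequence[float]],  # hypothetical encoder wrapper
    n_samples: int = 4,
    include_query: bool = True,
) -> list[float]:
    """Average the embeddings of several hypothetical documents."""
    # Sample n hypothetical answers; a real generator would use a
    # non-zero temperature so the samples differ from one another.
    texts = [generate(query) for _ in range(n_samples)]
    if include_query:
        # Treat the raw query as one more hypothesis.
        texts.append(query)
    vectors = [encode(text) for text in texts]
    dim = len(vectors[0])
    # Element-wise mean over all vectors.
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Stub components (assumptions) so the sketch runs offline.
def stub_generate(query: str) -> str:
    return "draft"


def stub_encode(text: str) -> list[float]:
    return [1.0, 0.0] if text == "draft" else [0.0, 1.0]


vec = hyde_query_vector("some question", stub_generate, stub_encode, n_samples=3)
print(vec)
```

The resulting vector can be passed to the same nearest-neighbor search as before. Each extra sample adds another generation call, so the number of samples trades retrieval robustness against latency and cost.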

Trade-offs

HyDE adds an LLM generation call before every retrieval, which increases latency and cost compared with embedding the query directly. The quality of retrieval depends on the generator producing a plausible answer, so for queries the model knows little about, the hypothetical document can drift off-topic and hurt results.

When a corpus is well matched to user queries or when a retriever has been fine-tuned on in-domain data, the benefit of HyDE shrinks. It is most valuable as a zero-shot or cold-start technique, and it can be combined with reranking or hybrid search to compensate for cases where the generated document is weak.

Each query incurs an extra LLM call, raising latency and cost.
Off-topic hypothetical answers can degrade retrieval.
Benefit shrinks with fine-tuned retrievers or well-matched corpora.
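
One way to combine HyDE with hybrid search is to merge its ranked results with those of a second retriever, such as keyword search over the raw query. The sketch below uses reciprocal rank fusion, a general rank-merging technique that is not specific to HyDE. The document IDs and the two input rankings are made up, and k=60 is a commonly used default rather than a tuned value.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> list:
    """Merge ranked lists: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up rankings (assumptions): one from HyDE, one from keyword search.
hyde_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([hyde_hits, keyword_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, so a weak hypothetical document is partly offset by the second retriever. A reranker could then be applied to the fused list.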

Key takeaways

HyDE embeds a generated hypothetical answer instead of the raw query to retrieve real documents.
It runs in two stages: an LLM writes a hypothetical answer, then an encoder embeds it for similarity search.
The encoder's bottleneck filters out hallucinated details while keeping the answer's semantic intent.
In the original paper it beat unsupervised Contriever and rivaled fine-tuned retrievers without relevance labels.
It adds an LLM call per query and helps most in zero-shot or cold-start retrieval.

Frequently asked questions

What is HyDE?

HyDE is a query transformation step where a language model writes a hypothetical answer to the query, that answer is embedded, and the resulting vector is used to retrieve real documents. It improves dense retrieval, especially in zero-shot settings without labeled data.

Why does embedding a hypothetical answer work better than embedding the query?

A short question often looks different from a relevant passage, so their embeddings can be far apart. A hypothetical answer resembles a real answer in structure and vocabulary, so its embedding lands closer to genuinely relevant documents in the corpus.

Do hallucinated facts in the hypothetical document end up in the results?

No. The hypothetical document is only used to compute a search vector. The encoder's dense bottleneck tends to wash out invented details, and only real corpus documents nearest to that vector are retrieved and returned.

When should you use HyDE?

Use HyDE for zero-shot or cold-start retrieval, or for short and ambiguous queries that are worded very differently from source documents. Its benefit shrinks when you have a fine-tuned in-domain retriever or a corpus already well matched to queries.

What are the drawbacks of HyDE?

HyDE adds an LLM generation call before every search, increasing latency and cost. If the model produces an off-topic hypothetical answer, retrieval quality can drop, so it is often paired with reranking or hybrid search.
