Retrieval & Context

HNSW (Hierarchical Navigable Small World)

By Arpit Tripathi, Founder

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search over high-dimensional vectors. It builds a multi-layer proximity graph and uses greedy routing from a sparse top layer down to a dense base layer, giving sublinear search time with high recall, which makes it a default index in most vector databases.

What is HNSW?

HNSW, short for Hierarchical Navigable Small World, is an algorithm for approximate nearest neighbor (ANN) search over high-dimensional vectors such as text or image embeddings. It was introduced by Yury Malkov and Dmitry Yashunin in their 2016 paper on efficient approximate nearest neighbor search using hierarchical navigable small world graphs. Rather than comparing a query against every stored vector, HNSW navigates a graph to reach close neighbors in a small number of hops.

It is called approximate because it trades a small, tunable amount of recall for a large speedup. Exact nearest neighbor search scales linearly with the number of vectors, which is too slow for millions or billions of items. HNSW achieves roughly logarithmic search complexity while returning the true nearest neighbors with high probability.

Because it offers strong recall, fast queries, and incremental insertion, HNSW is the default or primary index type in many vector databases and libraries, including FAISS, pgvector, Qdrant, Weaviate, and Milvus. It is the algorithm behind most production semantic search and RAG retrieval.

  • A graph-based algorithm for approximate nearest neighbor search over high-dimensional vectors.
  • Introduced by Malkov and Yashunin in 2016.
  • Trades a small, tunable amount of recall for roughly logarithmic search time.
  • Supports incremental insertion of new vectors without a full rebuild.
  • The default index in many vector databases: FAISS, pgvector, Qdrant, Weaviate, Milvus.

How does the HNSW graph work?

HNSW combines two older ideas. The first is the probability skip list, where multiple layers of linked lists let a search take long jumps at the top and short, precise steps at the bottom. The second is the navigable small world graph, a proximity graph in which each vertex keeps a list of nearby neighbors, and search proceeds by greedy routing: repeatedly move to the neighbor closest to the query until no neighbor is closer.

HNSW arranges vectors into a hierarchy of layers. The top layer is sparse and contains only a few vertices connected by long-range links. Each lower layer contains more vertices and shorter-range links, and the base layer (layer 0) contains every vector. When a node is inserted, its maximum layer is chosen randomly with an exponentially decaying probability, so most nodes live only on the bottom layer.

A search starts at the single entry point on the top layer and greedily walks toward the query until it reaches a local minimum, then drops to the next layer down and repeats. The sparse upper layers act as an express network that carries the search most of the way across the space in a few hops, while the dense lower layers refine the result. This layered descent is what avoids the early-stopping problem of a single flat small-world graph.

  • Built from two ideas: probability skip lists and navigable small world graphs.
  • Search uses greedy routing: move to the closest neighbor until none is closer.
  • Layers form a hierarchy: sparse long-range links on top, dense short-range at the base.
  • A node's top layer is chosen randomly with exponentially decaying probability.
  • Search descends from a top entry point through layers, refining at each level.

The three parameters: M, efConstruction, efSearch

M sets how many bidirectional links each node keeps per layer (the base layer typically allows up to 2M). A higher M produces a denser graph that reaches higher recall and handles harder, higher-dimensional data, but it increases memory use and build time. Memory grows substantially with M, so typical values sit in the range of about 12 to 48, with 16 a common default.

efConstruction controls how many candidate neighbors the builder considers when inserting each node. A larger value produces a higher-quality graph and better recall at query time, at the cost of slower index construction. It is a build-time setting and does not affect query latency directly. Common values run from around 100 to several hundred.

efSearch (sometimes just ef) sets the size of the dynamic candidate list explored during a query. It is the main runtime dial for the recall-versus-speed tradeoff: raising it explores more of the graph and lifts recall toward 100 percent, but increases query latency. It can be tuned per query, so you can run fast, lower-recall searches and slow, high-recall searches against the same index. efSearch must be at least as large as the number of results requested (k).

  • M: links per node per layer; higher means better recall but more memory and build time (common default ~16).
  • efConstruction: build-time candidate breadth; higher means a better graph and recall but slower indexing.
  • efSearch: query-time candidate breadth; the main recall-versus-latency dial, tunable per query.
  • Base layer typically allows up to 2M links per node.
  • efSearch must be greater than or equal to k, the number of neighbors requested.

Code: building and querying an HNSW index

The example below uses the widely used hnswlib library to build an index, set the parameters described above, and query it. Note that efConstruction and M are set at build time, while ef (efSearch) is set before querying and can be changed at any time.

python
import hnswlib
import numpy as np

dim = 768
num_elements = 100_000
data = np.float32(np.random.random((num_elements, dim)))
ids = np.arange(num_elements)

# Cosine similarity index over 768-dim vectors
index = hnswlib.Index(space="cosine", dim=dim)

# Build-time parameters
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, ids)

# Query-time parameter (recall vs. speed); must be >= k
index.set_ef(64)

query = np.float32(np.random.random((1, dim)))
labels, distances = index.knn_query(query, k=10)
print(labels, distances)

# Raise ef for higher recall on a harder query
index.set_ef(256)
labels, distances = index.knn_query(query, k=10)
Build and query an HNSW index with hnswlib, tuning M, efConstruction, and ef.

Strengths, costs, and when to use it

HNSW's strengths are high recall at low query latency, support for incremental inserts, and good behavior on high-dimensional data, which is why it backs so much production semantic search. The main cost is memory: the graph stores many links per vector and is typically held in RAM, so a billion-scale index can be expensive. Build time also grows with M and efConstruction.

For very large or memory-constrained corpora, HNSW is often combined with quantization (for example product quantization or scalar quantization) to shrink vector storage, or with disk-based variants that keep part of the graph off RAM. A common pattern is HNSW for the in-memory candidate stage followed by a reranking step for final precision.

Choose HNSW when you need fast, high-recall vector search and can afford the memory, which describes most RAG and semantic search workloads. If memory is the binding constraint and you can tolerate longer queries or lower recall, alternatives such as IVF-based indexes or disk-first ANN may fit better.

  • Strengths: high recall, low latency, incremental inserts, good on high dimensions.
  • Main cost is RAM, since the link-heavy graph is usually held in memory.
  • Pair with quantization or disk-based variants for very large or memory-tight corpora.
  • Often used as the candidate stage before a reranking step for final precision.
  • Use it when you need fast, high-recall vector search and can afford the memory.

Key takeaways

  • HNSW is a graph-based approximate nearest neighbor algorithm that gives roughly logarithmic search time with high recall, making it the default index in most vector databases.
  • It combines skip lists and navigable small world graphs into a layered hierarchy, searching greedily from a sparse top layer down to a dense base layer.
  • M controls graph density (recall versus memory and build time), efConstruction controls build-time graph quality, and efSearch is the runtime recall-versus-speed dial.
  • efSearch is tunable per query and must be at least k, so the same index can serve fast low-recall and slow high-recall searches.
  • Its main cost is RAM, so very large corpora often pair HNSW with quantization or disk-based variants and a reranking stage.

Frequently asked questions

HNSW is used for approximate nearest neighbor search over high-dimensional embeddings, the core operation behind semantic search and RAG retrieval. It finds vectors close to a query in roughly logarithmic time instead of scanning every vector, and it is the default index in databases like Qdrant, Weaviate, Milvus, and pgvector.
M sets links per node, trading recall against memory and build time. efConstruction sets build-time candidate breadth, trading graph quality against indexing speed. efSearch sets query-time candidate breadth, the main runtime dial for recall versus latency, and it can be changed per query.
It delivers high recall at low query latency, supports incremental inserts without rebuilding, and performs well on high-dimensional data. These properties match production semantic search and RAG needs, so most vector databases ship HNSW as a default or primary index type.
Memory. The graph stores many links per vector and is usually held in RAM, so billion-scale indexes are costly. Build time also rises with M and efConstruction. Large or memory-constrained deployments often pair HNSW with quantization or disk-based variants.
Raise efSearch at query time to explore more of the graph, increase M for a denser graph, and raise efConstruction for a higher-quality build. efSearch is the cheapest dial since it is per-query, but it increases latency. The others increase memory or build time.