
BM25 (Okapi BM25)

By Arpit Tripathi, Founder

BM25 is a sparse keyword ranking function that scores documents for a query using term frequency, inverse document frequency, and document length normalization. It remains a strong baseline for full-text search.

What is BM25 (Okapi BM25)?

BM25, also called Okapi BM25, is a ranking function used in information retrieval to estimate how relevant a document is to a search query based on the query's keywords. It is a sparse, lexical method: it matches exact terms rather than semantic meaning, and it scores documents using term frequency, inverse document frequency, and a correction for document length. "BM" stands for "best matching", and "Okapi" refers to the Okapi retrieval system at City University, London, where the function was developed.

BM25 is the default ranking function in many full-text search engines and is widely used as the keyword half of hybrid search systems. Despite being decades old and using no neural network, it remains a competitive baseline that newer dense retrievers are routinely measured against.

  • A sparse keyword ranking function, not a semantic model.
  • Built from term frequency, inverse document frequency, and length normalization.
  • Default scorer in many full-text engines and the lexical side of hybrid search.

The BM25 formula

For a query q containing terms t and a document D, BM25 sums a per-term score over the query terms. Each term's contribution multiplies an inverse document frequency weight by a saturating function of the term frequency in the document, adjusted for the document's length relative to the average.

score(D, q) = Σ_{t ∈ q} IDF(t) · [ tf(t, D) · (k₁ + 1) ] / [ tf(t, D) + k₁ · (1 − b + b · |D| / avgdl) ]

BM25 score: a sum over query terms of IDF times a length-normalized, saturating term-frequency factor. tf is term frequency in the document, |D| is document length, avgdl is the average document length, and k₁ and b are tuning parameters.

IDF(t) = ln( (N − n(t) + 0.5) / (n(t) + 0.5) + 1 )

Inverse document frequency for term t, where N is the total number of documents and n(t) is the number of documents containing t. Rare terms get a higher weight than common ones.
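
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular search engine implements BM25: the function names bm25_scores and tokenize, the whitespace tokenizer, and the three-sentence corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, with no stemming or stopword removal.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs

    # n(t): the number of documents that contain term t at least once.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization term: 1 - b + b * |D| / avgdl
        length_norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue  # a term absent from the document contributes nothing
            idf = math.log((n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * length_norm)
        scores.append(score)
    return scores


corpus = [
    "the quick brown fox",
    "the lazy dog sleeps",
    "the fox chased the fox",
]
docs = [tokenize(text) for text in corpus]
scores = bm25_scores(tokenize("fox"), docs)
for text, score in zip(corpus, scores):
    print(f"{score:.3f}  {text}")
```

In this toy corpus the second document scores zero because it never mentions "fox", and the third outranks the first because it contains the term twice, even though it is slightly longer than average.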

What the parameters do

The parameter k₁ controls term-frequency saturation. Because tf appears in both the numerator and denominator, the term-frequency factor approaches but never exceeds k₁ + 1, so the contribution of repeated occurrences levels off rather than growing without bound: the tenth occurrence adds far less than the second. A larger k₁ delays saturation; common values range from about 1.2 to 2.0, with 1.2 a widely used default.

The parameter b controls document length normalization and ranges from 0 to 1. At b = 1, term frequency is fully normalized by the ratio of document length to average length, so documents longer than average are penalized and shorter ones are boosted; at b = 0, length is ignored entirely. A common default is 0.75. The IDF term gives rare query words more weight than frequent ones, which prevents stopword-like terms from dominating the score.

  • k₁ controls how quickly term frequency saturates; a typical default is 1.2.
  • b controls length normalization from 0 (off) to 1 (full); a typical default is 0.75.
  • IDF down-weights common terms and boosts rare, discriminating ones.
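
To make the effect of k₁ and b concrete, the sketch below evaluates only the term-frequency factor of the BM25 formula, leaving IDF out. The helper name tf_factor and the document lengths (100 and 400 tokens against an average of 100) are illustrative assumptions, not values from any real collection.

```python
def tf_factor(tf, doc_len, avgdl, k1=1.2, b=0.75):
    """Saturating, length-normalized term-frequency factor from the BM25 formula."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avgdl))


# Saturation: the document has average length, so the length term equals 1
# and the factor climbs toward its ceiling of k1 + 1.
for tf in (1, 2, 10, 100):
    print(f"tf={tf:<3} factor={tf_factor(tf, doc_len=100, avgdl=100):.3f}")

# Length normalization: the same tf in a document four times the average length.
for b in (0.0, 0.75, 1.0):
    print(f"b={b:<4} factor={tf_factor(3, doc_len=400, avgdl=100, b=b):.3f}")
```

With the defaults, the first loop shows the factor rising quickly at low counts and then flattening below k₁ + 1 = 2.2. The second loop shows the same three occurrences counting for less as b increases, because the document is longer than average.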

BM25 is fast, interpretable, requires no training, and works well for queries with specific keywords, exact identifiers, names, codes, and rare terms that dense embeddings can blur. Its main limitation is vocabulary mismatch: it cannot match a query to a document that expresses the same idea with different words, because it relies on lexical overlap.
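
Vocabulary mismatch is easy to see with a toy check of lexical overlap. The helper shared_terms and the example strings are hypothetical, and the whitespace tokenizer is a simplifying assumption; real engines usually apply stemming or other analysis that can recover some near matches.

```python
def shared_terms(query, document):
    """Return the lowercase whitespace tokens that appear in both texts."""
    return set(query.lower().split()) & set(document.lower().split())


query = "car repair cost"
paraphrase = "typical price of automobile maintenance"
exact = "car repair cost estimates by model"

# No query term occurs in the paraphrase, so every tf is 0 and its BM25 score is 0.
print(shared_terms(query, paraphrase))
# All three query terms occur here, so each one contributes to the BM25 score.
print(shared_terms(query, exact))
```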

Modern retrieval systems often combine BM25 with a dense semantic retriever in a hybrid setup, then merge the two ranked lists, commonly with reciprocal rank fusion. The keyword side catches exact matches and rare terms while the semantic side catches paraphrases, and together they typically outperform either method alone.

  • Fast, training-free, interpretable, and strong on exact and rare terms.
  • Weak on vocabulary mismatch and paraphrased queries.
  • Frequently paired with dense retrieval in hybrid search via rank fusion.
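
Reciprocal rank fusion can be sketched as follows: each document receives 1 / (k + rank) from every list in which it appears, and the contributions are summed. The constant k = 60 is a commonly used value and is an assumption here, as are the function name and the document IDs; in practice the two input lists would come from a BM25 index and a dense retriever.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs, each ordered best first."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]   # hypothetical keyword results
dense_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical semantic results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because the fusion uses only ranks, it needs no calibration between BM25 scores and embedding similarities, which live on different scales. Documents that rank well in both lists, such as doc_b and doc_a here, rise above documents found by only one retriever.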

Key takeaways

  • BM25 is a sparse keyword ranking function combining term frequency, inverse document frequency, and length normalization.
  • Term frequency saturates, so repeated occurrences of a word give diminishing returns, controlled by k₁.
  • The b parameter sets how strongly document length is normalized, typically 0.75.
  • IDF gives rare, discriminating terms more weight than common ones.
  • BM25 remains a strong baseline and serves as the lexical half of many hybrid search systems.

Frequently asked questions

What is BM25 used for?
BM25 ranks documents by relevance to a keyword query in full-text search. It scores each document from term frequency, inverse document frequency, and document length, and is the default ranking function in many search engines and the lexical part of hybrid search.

What do the k₁ and b parameters do?
k₁ controls how quickly the benefit of repeated terms saturates, with a common default near 1.2. b controls document length normalization from 0 to 1, with a common default of 0.75, where higher values penalize longer documents more.

How is BM25 different from TF-IDF?
BM25 builds on TF-IDF but adds term-frequency saturation, so repeated terms have diminishing impact, and explicit document length normalization through the b parameter. These corrections usually make BM25 a stronger ranker than plain TF-IDF.

Is BM25 better than vector search?
Neither is universally better. BM25 excels at exact keywords, names, and rare terms but misses paraphrases. Vector search captures meaning but can blur exact matches. Hybrid systems combine both, often fusing their results with reciprocal rank fusion.

Why does BM25 normalize for document length?
Longer documents naturally contain more term occurrences, which would unfairly inflate their scores. Length normalization, tuned by the b parameter, discounts the term-frequency contribution of documents that are longer than the collection average and boosts that of shorter ones.
