Vector Similarity Metrics (Cosine vs Dot Product vs Euclidean)
By Arpit Tripathi, Founder
Vector similarity metrics measure how close two embeddings are. Cosine similarity compares direction (angle), dot product combines direction and magnitude, and Euclidean distance measures straight-line distance. The right choice depends mainly on whether the embeddings are normalized and on which metric the embedding model was trained with.
What are Vector Similarity Metrics?
Vector similarity metrics are mathematical functions that quantify how similar two vectors, typically embeddings, are to each other. In semantic search and retrieval-augmented generation, text, images, and other data are converted into high-dimensional embeddings, and a similarity metric decides which stored vectors are closest to a query vector. The choice of metric determines what closeness means and directly affects retrieval quality.
The three most common metrics are cosine similarity, dot product (inner product), and Euclidean distance. They capture different geometric notions: angle, angle combined with magnitude, and straight-line distance. For normalized vectors they rank results identically, but on raw vectors they can behave very differently.
- Similarity metrics score how close two embeddings are in vector space.
- They power semantic search, RAG retrieval, and recommendation ranking.
- Cosine, dot product, and Euclidean are the three standard choices.
Cosine Similarity
Cosine similarity measures the cosine of the angle between two vectors, ignoring their magnitudes. It ranges from -1 (opposite direction) through 0 (orthogonal) to 1 (same direction). Because it focuses only on direction, it is the most common default for text embeddings, where what matters is semantic orientation rather than vector length, which can vary with document length or token count.
Cosine similarity is computed as the dot product of the two vectors divided by the product of their magnitudes. This division is exactly what removes magnitude from the comparison. One caveat worth knowing: research by Steck et al. shows that on some learned embeddings, cosine similarity can be implicitly shaped by regularization and yield arbitrary results, so it is a sensible default rather than a guaranteed-best choice.
- Compares direction (angle), not magnitude.
- Ranges from -1 to 1; higher means more similar.
- A common default for text embeddings, though not universally optimal.
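The formula above (dot product divided by the product of the magnitudes) can be sketched in a few lines of NumPy. The vectors and the function name `cosine_similarity` are illustrative assumptions, not taken from any particular library or embedding model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the two magnitudes.

    Undefined for a zero vector (division by zero); callers would need
    to handle that case separately.
    """
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical example vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 0.5])

base = cosine_similarity(a, b)
scaled = cosine_similarity(10.0 * a, b)  # same direction, 10x the length
opposite = cosine_similarity(a, -a)      # exactly opposite direction

print(f"base:     {base:.4f}")
print(f"scaled:   {scaled:.4f}")    # matches base: magnitude is divided out
print(f"opposite: {opposite:.4f}")  # -1.0000
```

Scaling `a` by 10 leaves the score unchanged, which is the sense in which cosine similarity ignores magnitude.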
Dot Product and Euclidean Distance
Dot product (inner product) multiplies vectors element-wise and sums the result. It reflects both the angle between vectors and their magnitudes, so longer vectors can score higher. Many embedding models are trained so that the dot product is the intended similarity, and it is cheaper to compute than cosine because it skips the normalization step. On vectors that are already unit-normalized, dot product and cosine similarity are mathematically equivalent.
Euclidean distance (the L2 distance) measures the straight-line distance between the endpoints of two vectors. Unlike the other two, it is a distance rather than a similarity, so smaller means more similar, with 0 meaning identical. It is sensitive to magnitude and is common when absolute position in the space carries meaning, such as some image embeddings or clustering tasks.
- Dot product captures direction and magnitude; larger means more similar.
- Euclidean distance is a straight-line distance; smaller means more similar.
- On unit-normalized vectors, dot product equals cosine similarity and ranks the same as Euclidean.
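The equivalence on unit-normalized vectors follows from expanding the squared Euclidean distance. A short derivation, where a and b are the two vectors and θ is the angle between them (symbols introduced here for illustration):

```latex
\begin{align*}
\|a - b\|^2 &= \|a\|^2 + \|b\|^2 - 2\, a \cdot b \\
            &= 2 - 2\, a \cdot b   && \text{since } \|a\| = \|b\| = 1 \\
            &= 2 - 2 \cos\theta    && \text{since } a \cdot b = \|a\|\,\|b\| \cos\theta = \cos\theta
\end{align*}
```

Because the distance is a strictly decreasing function of the cosine, sorting by smallest Euclidean distance gives the same order as sorting by largest cosine similarity or dot product.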
Choosing and Computing a Metric
The practical rule: if your embeddings are normalized to unit length, cosine, dot product, and Euclidean produce the same ranking, so pick the cheapest your index supports (often dot product). If vectors are not normalized and magnitude is noise, use cosine. If magnitude carries real meaning, dot product or Euclidean may be appropriate. Always match the metric to the one the embedding model was trained with, which the model card usually states.
Most vector databases let you select the metric at index creation, and common approximate-nearest-neighbor indexes generally support all three. The snippet below computes the three metrics on the same pair of vectors with NumPy.
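A minimal NumPy sketch of the three metrics on one pair of vectors. The vectors `a` and `b` are made-up illustrative values, not output from any embedding model.

```python
import numpy as np

# Hypothetical example vectors; both have length 5.
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Dot product: element-wise multiply, then sum. 3*4 + 4*3 = 24
dot = float(np.dot(a, b))

# Cosine similarity: dot product divided by the product of magnitudes. 24 / (5 * 5) = 0.96
cosine = dot / float(np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean (L2) distance: length of the difference vector. sqrt(1 + 1) ≈ 1.414
euclidean = float(np.linalg.norm(a - b))

print(f"dot product:        {dot:.3f}")
print(f"cosine similarity:  {cosine:.3f}")
print(f"euclidean distance: {euclidean:.3f}")
```

Note the direction of each score: larger is better for dot product and cosine, while smaller is better for Euclidean distance.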
- Normalized vectors: all three rank identically, so choose the cheapest.
- Use cosine when magnitude is noise; use dot product or Euclidean when magnitude matters.
- Match the metric to what the embedding model was trained on.
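To see the practical rule in action, the sketch below ranks three toy documents against a query under each metric, first on raw vectors and then after unit normalization. The helper names `rankings` and `normalize` and all vector values are assumptions made for illustration.

```python
import numpy as np

def rankings(query: np.ndarray, docs: np.ndarray) -> dict:
    """Return document indices ordered best-first under each metric."""
    dot = docs @ query
    cos = dot / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    l2 = np.linalg.norm(docs - query, axis=1)
    return {
        "dot": np.argsort(-dot).tolist(),     # larger is better
        "cosine": np.argsort(-cos).tolist(),  # larger is better
        "euclidean": np.argsort(l2).tolist(), # smaller is better
    }

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale each vector (last axis) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical data: doc 0 is short and well aligned with the query,
# doc 1 is long but 45 degrees off, doc 2 is short and poorly aligned.
query = np.array([1.0, 0.0])
docs = np.array([[0.9, 0.1], [3.0, 3.0], [0.1, 0.5]])

raw = rankings(query, docs)
unit = rankings(normalize(query), normalize(docs))

print("raw: ", raw)   # the three metrics disagree
print("unit:", unit)  # all three agree: [0, 1, 2]
```

On the raw vectors, dot product favors the long document, Euclidean distance penalizes it, and cosine ignores length entirely. After normalization the three orderings coincide, so the choice comes down to cost and index support.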
Key takeaways
- Cosine similarity compares direction (angle) and ignores magnitude; it is a common default for text embeddings.
- Dot product combines direction and magnitude and is cheaper to compute than cosine.
- Euclidean distance is a straight-line distance where smaller values mean more similar.
- On unit-normalized vectors, all three metrics produce the same ranking.
- Always match the metric to the one the embedding model was trained with.