20 Jun, 2025

Vector Search Is Not Enough. You Also Need BM25

Vector search has come a long way, and with tools like ChatGPT and modern semantic engines built on it, it’s easy to think we’ve nailed search. The dense embeddings behind these systems don’t just match words; they grasp meaning, helping us find information that's contextually relevant.

But when precision matters, like dealing with numbers, dates, or ranked lists, vector search starts to show its cracks. Ever searched for “apple pie” and got “cherry pie” instead? Or looked up the “top 5 AI tools” only to get a jumbled mess of results? That’s where vector search falls short.

Here’s why that happens and why BM25 still plays a key role in getting search right.

LLMs Don’t Do Numbers Like We Do

When you see the numbers 50, 100, and 150, your brain instantly recognizes a 50-unit jump between each. Logical and obvious!

But for large language models (LLMs) and vector embeddings, numbers don’t live on a neat number line. Instead, they live in context. These models treat numbers like words instead of values. The number "100" might live closer to "century" or "complete" in vector space, while "50" might cluster with "half" or "midway".

Why? Because language models are trained to predict words based on usage, not arithmetic logic. That’s why something like a "100-day challenge" might be understood as longer or more intense than a "30-day challenge", but not because the model calculates "100 > 30". It picks up on the semantics of commitment, not the math.
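
You can probe this yourself with an off-the-shelf embedding model. Here’s a minimal sketch using the sentence-transformers library (the model choice and word list are illustrative assumptions; exact similarities will vary by model):

```python
from sentence_transformers import SentenceTransformer, util

# Compare where "100" lands relative to nearby numbers and related words.
model = SentenceTransformer("all-MiniLM-L6-v2")
words = ["100", "50", "150", "century", "half", "complete"]
embeddings = model.encode(words, convert_to_tensor=True)

# Cosine similarity of "100" against every other string.
similarities = util.cos_sim(embeddings[0], embeddings)[0]
for word, score in zip(words, similarities.tolist()):
    print(f"100 vs {word}: {score:.3f}")
```

Whatever the exact scores, you won’t find the tidy 50-unit spacing your brain sees. The distances reflect how these strings are used in text, not their values.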

So when you're filtering or sorting data, like finding the "highest rated", "most recent", or "cheapest", vector search can quickly lose its edge.

Vector Search Excels at Semantics, Not Specifics

Vector embeddings are dense representations of meaning. They help machines understand relationships between words based on how they’re used in context.

For example, if the model knows that a "king" is male royalty and a "queen" is the female equivalent, it can figure out the connection between gender and royalty. Similarly, it understands that "Paris" is to "France" what "Rome" is to "Italy", recognizing capital cities and their countries based on learned patterns.
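
This is the classic word-vector analogy trick, and you can try it yourself. A quick sketch using gensim’s pretrained GloVe vectors (assuming the gensim downloader is available; the small 50-dimensional model keeps the download light, and results vary by model):

```python
import gensim.downloader as api

# Download small pretrained GloVe word vectors on first run.
model = api.load("glove-wiki-gigaword-50")

# king - man + woman ≈ queen
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# paris - france + italy ≈ rome (the capital-city pattern)
print(model.most_similar(positive=["paris", "italy"], negative=["france"], topn=3))
```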

This kind of semantic understanding is incredibly powerful. It allows search engines to return results that are conceptually relevant, even if the exact keywords aren’t present.

But this flexibility also creates problems. A search for "apple pie" might give you recipes for "peach pie" or even "shepherd’s pie". A query for "recent breakthroughs in AI" might pull up a years-old article simply because it contains related concepts.

That’s the downside of vector space: it’s intentionally fuzzy. It’s built to group ideas by meaning, not by strict keyword matches or exact timelines.

The Tokenization Problem

To make things even worse, let’s try throwing some unfamiliar numbers at an LLM.

LLMs don’t see "123456" as one unit. They tokenize it, sometimes into "123", "##4", "##56", or other arbitrary splits depending on the model’s training. This inconsistency makes it even harder to process numbers reliably.
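
You can see this directly with a tokenizer library. A quick sketch with OpenAI’s tiktoken (the exact splits depend on which encoding you inspect):

```python
import tiktoken

# Inspect how a GPT-style tokenizer splits numbers of different lengths.
enc = tiktoken.get_encoding("cl100k_base")
for number in ["7", "123456", "3.14159"]:
    tokens = enc.encode(number)
    print(number, "->", [enc.decode([t]) for t in tokens])
```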

BM25

When precision matters, BM25 steps in.

BM25 (short for Best Match 25) is a sparse vector model, meaning it works with explicit term matching. It doesn’t care about semantic meaning; it cares about what you actually typed.

Here’s what makes it powerful (each of these mechanics is sketched in code after the list):

  • Exact Keyword Matching
    If you’re looking for “banana bread recipe”, BM25 finds documents that contain those exact terms and ranks them accordingly. No guessing, no clustering, no approximations.
  • Frequency Matters, But Not Too Much
    It’s smart about repetition. If a document says "banana bread" 50 times, that doesn’t automatically make it better than one that says it 10 times. BM25 applies a saturation point, so keyword stuffing doesn’t game the system.
  • Important Terms Get More Weight
    Rare terms like "hypersearch" or "reranking techniques" get boosted. Common words like "system" or "the" get downplayed. This is thanks to IDF (Inverse Document Frequency).
  • Longer ≠ Better
    BM25 normalizes for document length. A long article isn't given an unfair advantage over a concise, informative one.
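
All four of those behaviors fall out of one compact formula. Here’s a minimal, self-contained sketch of BM25 scoring in Python (k1 controls term-frequency saturation, b controls length normalization; 1.5 and 0.75 are common defaults):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()  # document frequency: how many docs contain each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # exact matching: absent terms contribute nothing
            # IDF: rare terms get boosted, common terms get downplayed.
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF plus length normalization: repetition helps
            # less and less, and long documents get no unfair advantage.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "banana bread recipe with walnuts".split(),
    "banana banana banana banana bread".split(),  # keyword stuffing
    "sourdough bread starter guide".split(),
]
print(bm25_scores("banana bread recipe".split(), docs))
```

In production you’d reach for a library like rank_bm25 or a search engine such as Elasticsearch, which uses BM25 as its default scoring function, but the mechanics are exactly these.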

Hybrid Approach (BM25 + Vector Search)

So should we ditch vector search? Absolutely not.

  • Vector Search is great for exploring semantic relationships, finding conceptually similar results, and handling vague or broad queries.

  • BM25 is great for exact matching, keyword filtering, sorting by frequency, and relevance scoring.

If you want the real magic, use both.

Here’s how:

  • BM25 filters documents that mention exactly what the user wants

  • Vector Search ranks those filtered results by meaning or context

Or vice versa: use vector search to cast a wide net, then use BM25 to re-rank for relevance and keyword accuracy. You can also run both searches in parallel and apply a separate re-ranking mechanism to the combined results, as sketched below.
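
For the parallel route, a popular fusion mechanism is Reciprocal Rank Fusion (RRF), which needs only the rank positions from each system. A minimal sketch (the document ids here are hypothetical):

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of doc ids (best first) into one.
    Each appearance contributes 1 / (k + rank); k=60 is a common default."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["doc3", "doc1", "doc7"]    # hypothetical keyword ranking
vector_results = ["doc1", "doc5", "doc3"]  # hypothetical semantic ranking
print(reciprocal_rank_fusion([bm25_results, vector_results]))
# doc1 and doc3 rise to the top because both retrievers agree on them
```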

Final Thoughts

In AI search systems, precision and recall are often at odds. Vector search gives you recall: it pulls in results that are like what you asked for. BM25 gives you precision: it narrows them down to exactly what you asked for.

Together, they make your search engine smarter, more helpful, and more human.

Want to go deeper into building intelligent search systems that combine the best of semantic and keyword retrieval? Let’s connect.