Hybrid Search Explained: Combining BM25 and Semantic Embeddings for AI Agents
Pure vector search misses exact matches. Pure keyword search misses meaning. Hybrid search combines both, and your CMS architecture determines whether it works.
The AI industry spent the last two years building retrieval systems on pure vector search. Teams vectorize their content, store embeddings in Pinecone or Weaviate, and hope that semantic similarity is enough.
For broad conceptual queries it works well. For precision questions it fails silently. A user asks for the cancellation policy for enterprise annual plans and the embedding index returns chunks about pricing tiers that never mention cancellation. Meanwhile, BM25 keyword search would have found the exact document instantly because it contains the literal word cancellation.
Neither approach works alone. Hybrid search fuses semantic understanding with keyword precision, and it is rapidly becoming the baseline architecture for production AI agents. The catch is that hybrid search quality depends entirely on how your content is structured. A Content Operating System provides the clean, typed data that makes both signals reliable.
What BM25 Does and Why It Still Matters
BM25 is a probabilistic ranking function that scores documents based on:
- Term frequency (how often a term appears in a document)
- Inverse document frequency (how rare that term is across the corpus)
- Document length normalization (so long documents don't always win)
In plain terms, it gives high scores to documents that contain your exact search terms, especially when those terms are rare across the corpus. If only three documents in your entire catalog mention the SKU TRX-4200, BM25 will rank them at the top when someone searches for that code.
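The scoring described above fits in a few lines. Here is a minimal Okapi BM25 sketch for illustration, not the tuned variants production search engines ship; the corpus, documents, and the k1/b parameters are illustrative:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one document (a list of tokens) against query terms with Okapi BM25."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        n = sum(1 for d in corpus if term in d)        # docs containing the term
        idf = math.log((N - n + 0.5) / (n + 0.5) + 1)  # rarer terms score higher
        tf = doc.count(term)                           # term frequency in this doc
        norm = k1 * (1 - b + b * len(doc) / avgdl)     # document length normalization
        score += idf * tf * (k1 + 1) / (tf + norm) if tf else 0.0
    return score

corpus = [
    "enterprise annual plan pricing tiers".split(),
    "cancellation policy for enterprise annual plans".split(),
    "getting started with our product".split(),
]
scores = [bm25_score(["cancellation", "policy"], d, corpus) for d in corpus]
# Only the document that literally contains the query terms gets a non-zero score.
```

This is exactly the cancellation-policy failure mode in reverse: the document containing the literal words wins outright, while documents that are merely "about pricing" score zero.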
This precision is exactly what embeddings miss. Semantic embeddings encode meaning into dense vectors. Two phrases can have nearly identical embeddings even though they share zero words. This is powerful for discovery but dangerous for specificity. When your agent needs to answer a question with an exact product code, date, or policy number, BM25 is the more reliable signal.
The Fusion Problem and How to Solve It
Combining BM25 and embedding scores is not as simple as adding two numbers together. The scores operate on different scales and distributions. A BM25 score of 12.5 and a cosine similarity of 0.87 are not directly comparable.
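One simple way to put the two signals on a common footing is to rescale each ranker's scores before mixing them. A minimal min-max normalization sketch, with illustrative scores for the same three documents:

```python
def min_max(scores):
    """Rescale a list of raw scores to [0, 1] so different rankers can be mixed."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

bm25 = [12.5, 4.1, 0.3]      # BM25 scores are unbounded
cosine = [0.87, 0.81, 0.42]  # cosine similarity lives in [-1, 1]

# A 50/50 weighted sum only makes sense after both signals share a scale.
fused = [0.5 * b + 0.5 * c for b, c in zip(min_max(bm25), min_max(cosine))]
```

Min-max is fragile when score distributions shift between queries, which is why rank-based fusion is often preferred.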
A standard approach is reciprocal rank fusion (RRF), which sidesteps the scale problem entirely by converting both result sets into rank positions and merging those. Sanity handles fusion natively in GROQ: you can use score() to combine match for BM25 and text::semanticSimilarity() for embeddings in a single query, then use boost() to weight each signal for your specific use case.
- For product catalogs where SKU precision matters, you might boost BM25 higher.
- For knowledge bases where conceptual understanding matters more, you boost semantic similarity.
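Reciprocal rank fusion itself is only a few lines. This sketch uses hypothetical document IDs and the conventional k = 60 constant; each document's fused score is the sum of 1/(k + rank) across the result lists it appears in:

```python
def rrf(result_lists, k=60):
    """Merge several ranked lists of doc IDs with reciprocal rank fusion."""
    fused = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Rank position, not raw score, drives the contribution.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

bm25_results = ["doc-sku", "doc-pricing", "doc-faq"]       # keyword ranking
vector_results = ["doc-sku", "doc-guide", "doc-pricing"]   # semantic ranking
fused = rrf([bm25_results, vector_results])
```

Because only ranks matter, RRF needs no per-query tuning, which makes it a robust default when you cannot characterize either ranker's score distribution.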
Example GROQ Hybrid Search Query
This GROQ query combines BM25 keyword search (match) with semantic embeddings (text::semanticSimilarity) and tunes their influence using score and boost. The result is a single ranked list that respects both exact matches and semantic intent.
```groq
*[_type == "product" && price < 150 && inventory > 0]
| score(
    boost(name match $q, 2),
    boost(category match $q, 1.5),
    text::semanticSimilarity(description, $q)
  )
| order(_score desc)[0...10]
{
  _id, name, sku, price, inventory,
  category, description, _score
}
```

Hybrid search combines BM25 keyword ranking with semantic embeddings so users can find both exact matches (SKUs, error codes, brand names) and conceptually similar content ("authentication flow" vs "login process"). In Sanity, this happens directly in GROQ: score() combines multiple signals, match and text::query() supply BM25-style keyword ranking, text::semanticSimilarity() supplies embeddings, and boost() tunes how much each signal counts.
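If you run GROQ over Sanity's HTTP Query API, parameters like $q are passed as $-prefixed, JSON-encoded query-string values. A minimal Python sketch; the project ID, dataset, API version date, and the GROQ string are placeholders:

```python
import json
from urllib.parse import urlencode

def build_query_url(project_id, dataset, groq, params):
    """Build a Sanity HTTP Query API URL; GROQ params become $-prefixed,
    JSON-encoded query-string values."""
    base = f"https://{project_id}.api.sanity.io/v2023-08-01/data/query/{dataset}"
    qs = {"query": groq}
    qs.update({f"${k}": json.dumps(v) for k, v in params.items()})
    return f"{base}?{urlencode(qs)}"

groq = '*[_type == "product"] | score(boost(name match $q, 2))[0...10]{name, _score}'
url = build_query_url("myproject", "production", groq, {"q": "cancellation policy"})
# Fetch with any HTTP client, e.g. requests.get(url).json()["result"]
```

JSON-encoding the parameter values matters: it is what lets a single mechanism carry strings, numbers, and arrays into the query.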
The quality of hybrid search depends heavily on CMS architecture. Structured, typed content lets you:
- Aim BM25 at exact-match fields like titles, product names, SKUs, and error codes
- Aim semantic similarity at rich text fields like descriptions and body copy
- Apply precise structural filters on numeric and boolean fields like price, category, and availability
Agent Context gives AI agents awareness of your Sanity schema, so they can:
- Detect which fields are exact-match strings, embedding-ready rich text, or numeric/boolean filters
- Choose keyword-heavy, semantic-heavy, or balanced hybrid strategies per query
- Construct GROQ queries that align search mode with the right fields
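A schema-aware strategy picker can be sketched in a few lines. This heuristic is a hypothetical illustration of the idea, not Agent Context's actual decision logic; the field-type names and the SKU-shaped regex are assumptions:

```python
import re

def choose_strategy(query, schema_fields):
    """Pick hybrid-search weights from query shape and schema field types.
    (Hypothetical sketch -- not Agent Context's real implementation.)"""
    # Tokens that look like SKUs or error codes (e.g. TRX-4200) favor BM25.
    looks_like_code = any(re.fullmatch(r"[A-Z]{2,}-?\d+", tok) for tok in query.split())
    field_types = {f["type"] for f in schema_fields}
    if looks_like_code and "string" in field_types:
        return {"keyword": 3.0, "semantic": 1.0}   # keyword-heavy: exact IDs matter
    if "text" in field_types:
        return {"keyword": 1.0, "semantic": 3.0}   # semantic-heavy: conceptual query
    return {"keyword": 1.0, "semantic": 1.0}       # balanced default

weights = choose_strategy("TRX-4200 manual", [{"type": "string"}, {"type": "text"}])
```

The returned weights can then feed the boost() values in a GROQ score() clause, so the same query template serves keyword-heavy and semantic-heavy traffic.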
This combination of hybrid scoring + structured content + schema-aware agents is what turns search from a single-mode compromise into a flexible, use-case-tuned system.