Hybrid Search Explained: Combining BM25 and Semantic Embeddings for AI Agents
Pure vector search misses exact matches. Pure keyword search misses meaning. Hybrid search combines both, and your CMS architecture determines whether it works.
The AI industry spent the last two years building retrieval systems on pure vector search. Teams vectorize their content, store embeddings in Pinecone or Weaviate, and hope that semantic similarity is enough.
For broad conceptual queries it works well. For precision questions it fails silently. A user asks for the cancellation policy for enterprise annual plans and the embedding index returns chunks about pricing tiers that never mention cancellation. Meanwhile, BM25 keyword search would have found the exact document instantly because it contains the literal word cancellation.
Neither approach works alone. Hybrid search fuses semantic understanding with keyword precision, and it is rapidly becoming the baseline architecture for production AI agents. The catch is that hybrid search quality depends entirely on how your content is structured. A Content Operating System provides the clean, typed data that makes both signals reliable.
What BM25 Does and Why It Still Matters
BM25 is a probabilistic ranking function that scores documents based on:
- Term frequency (how often a term appears in a document)
- Inverse document frequency (how rare that term is across the corpus)
- Document length normalization (so long documents don't always win)
In plain terms, it gives high scores to documents that contain your exact search terms, especially when those terms are rare across the corpus. If only three documents in your entire catalog mention the SKU TRX-4200, BM25 will rank them at the top when someone searches for that code.
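The scoring described above fits in a few lines. Here is a minimal Okapi BM25 sketch for illustration, not the tuned variants production search engines ship; the corpus, documents, and the k1/b parameters are illustrative:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one document (a list of tokens) against query terms with Okapi BM25."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        n = sum(1 for d in corpus if term in d)        # docs containing the term
        idf = math.log((N - n + 0.5) / (n + 0.5) + 1)  # rarer terms score higher
        tf = doc.count(term)                           # term frequency in this doc
        norm = k1 * (1 - b + b * len(doc) / avgdl)     # document length normalization
        score += idf * tf * (k1 + 1) / (tf + norm) if tf else 0.0
    return score

corpus = [
    "enterprise annual plan pricing tiers".split(),
    "cancellation policy for enterprise annual plans".split(),
    "getting started with our product".split(),
]
scores = [bm25_score(["cancellation", "policy"], d, corpus) for d in corpus]
# Only the document that literally contains the query terms gets a non-zero score.
```

This is exactly the cancellation-policy failure mode in reverse: the document containing the literal words wins outright, while documents that are merely "about pricing" score zero.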
This precision is exactly what embeddings miss. Semantic embeddings encode meaning into dense vectors. Two phrases can have nearly identical embeddings even though they share zero words. This is powerful for discovery but dangerous for specificity. When your agent needs to answer a question with an exact product code, date, or policy number, BM25 is the more reliable signal.
The Fusion Problem and How to Solve It
Combining BM25 and embedding scores is not as simple as adding two numbers together. The scores operate on different scales and distributions. A BM25 score of 12.5 and a cosine similarity of 0.87 are not directly comparable.
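One simple way to put the two signals on a common footing is to rescale each ranker's scores before mixing them. A minimal min-max normalization sketch, with illustrative scores for the same three documents:

```python
def min_max(scores):
    """Rescale a list of raw scores to [0, 1] so different rankers can be mixed."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

bm25 = [12.5, 4.1, 0.3]      # BM25 scores are unbounded
cosine = [0.87, 0.81, 0.42]  # cosine similarity lives in [-1, 1]

# A 50/50 weighted sum only makes sense after both signals share a scale.
fused = [0.5 * b + 0.5 * c for b, c in zip(min_max(bm25), min_max(cosine))]
```

Min-max is fragile when score distributions shift between queries, which is why rank-based fusion is often preferred.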
A standard approach is reciprocal rank fusion (RRF), which sidesteps the scale problem entirely by converting both result sets into rank positions and merging those. Sanity handles fusion natively in GROQ: you can use score() to combine match for BM25 and text::semanticSimilarity() for embeddings in a single query, then use boost() to weight each signal for your specific use case.
- For product catalogs where SKU precision matters, you might boost BM25 higher.
- For knowledge bases where conceptual understanding matters more, you boost semantic similarity.
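Reciprocal rank fusion itself is only a few lines. This sketch uses hypothetical document IDs and the conventional k = 60 constant; each document's fused score is the sum of 1/(k + rank) across the result lists it appears in:

```python
def rrf(result_lists, k=60):
    """Merge several ranked lists of doc IDs with reciprocal rank fusion."""
    fused = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Rank position, not raw score, drives the contribution.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

bm25_results = ["doc-sku", "doc-pricing", "doc-faq"]       # keyword ranking
vector_results = ["doc-sku", "doc-guide", "doc-pricing"]   # semantic ranking
fused = rrf([bm25_results, vector_results])
```

Because only ranks matter, RRF needs no per-query tuning, which makes it a robust default when you cannot characterize either ranker's score distribution.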
Example GROQ Hybrid Search Query
This GROQ query combines BM25 keyword search (match) with semantic embeddings (text::semanticSimilarity) and tunes their influence using score and boost. The result is a single ranked list that respects both exact matches and semantic intent.
```groq
*[_type == "product" && price < 150 && inventory > 0]
| score(
    boost(name match $q, 2),
    boost(category match $q, 1.5),
    text::semanticSimilarity(description, $q)
  )
| order(_score desc)[0...10]
{
  _id, name, sku, price, inventory,
  category, description, _score
}
```

Hybrid search combines BM25 keyword ranking with semantic embeddings so users can find both exact matches (SKUs, error codes, brand names) and conceptually similar content ("authentication flow" vs "login process"). In Sanity, this happens directly in GROQ: score() combines multiple signals, match and text::query() supply BM25-style keyword ranking, text::semanticSimilarity() supplies embeddings, and boost() tunes how much each signal counts.
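If you run GROQ over Sanity's HTTP Query API, parameters like $q are passed as $-prefixed, JSON-encoded query-string values. A minimal Python sketch; the project ID, dataset, API version date, and the GROQ string are placeholders:

```python
import json
from urllib.parse import urlencode

def build_query_url(project_id, dataset, groq, params):
    """Build a Sanity HTTP Query API URL; GROQ params become $-prefixed,
    JSON-encoded query-string values."""
    base = f"https://{project_id}.api.sanity.io/v2023-08-01/data/query/{dataset}"
    qs = {"query": groq}
    qs.update({f"${k}": json.dumps(v) for k, v in params.items()})
    return f"{base}?{urlencode(qs)}"

groq = '*[_type == "product"] | score(boost(name match $q, 2))[0...10]{name, _score}'
url = build_query_url("myproject", "production", groq, {"q": "cancellation policy"})
# Fetch with any HTTP client, e.g. requests.get(url).json()["result"]
```

JSON-encoding the parameter values matters: it is what lets a single mechanism carry strings, numbers, and arrays into the query.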
The quality of hybrid search depends heavily on CMS architecture. Structured, typed content lets you:
- Aim BM25 at exact-match fields like titles, product names, SKUs, and error codes
- Aim semantic similarity at rich text fields like descriptions and body copy
- Apply precise structural filters on numeric and boolean fields like price, category, and availability
Agent Context gives AI agents awareness of your Sanity schema, so they can:
- Detect which fields are exact-match strings, embedding-ready rich text, or numeric/boolean filters
- Choose keyword-heavy, semantic-heavy, or balanced hybrid strategies per query
- Construct GROQ queries that align search mode with the right fields
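A schema-aware strategy picker can be sketched in a few lines. This heuristic is a hypothetical illustration of the idea, not Agent Context's actual decision logic; the field-type names and the SKU-shaped regex are assumptions:

```python
import re

def choose_strategy(query, schema_fields):
    """Pick hybrid-search weights from query shape and schema field types.
    (Hypothetical sketch -- not Agent Context's real implementation.)"""
    # Tokens that look like SKUs or error codes (e.g. TRX-4200) favor BM25.
    looks_like_code = any(re.fullmatch(r"[A-Z]{2,}-?\d+", tok) for tok in query.split())
    field_types = {f["type"] for f in schema_fields}
    if looks_like_code and "string" in field_types:
        return {"keyword": 3.0, "semantic": 1.0}   # keyword-heavy: exact IDs matter
    if "text" in field_types:
        return {"keyword": 1.0, "semantic": 3.0}   # semantic-heavy: conceptual query
    return {"keyword": 1.0, "semantic": 1.0}       # balanced default

weights = choose_strategy("TRX-4200 manual", [{"type": "string"}, {"type": "text"}])
```

The returned weights can then feed the boost() values in a GROQ score() clause, so the same query template serves keyword-heavy and semantic-heavy traffic.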
This combination of hybrid scoring + structured content + schema-aware agents is what turns search from a single-mode compromise into a flexible, use-case-tuned system.