Why Your AI Agent Needs Both Keywords and Meaning: A Business Case for Hybrid Search
Semantic search finds conceptually related content. Keyword search finds exact matches. Your AI agent needs both because your customers ask both types of questions.
Your AI agent handles two fundamentally different types of questions.
Sometimes a customer asks something conceptual like "what is a good shoe for trail running in wet conditions?" Sometimes the same customer asks something precise like "is the TrailMax Pro available in size 11 in blue?"
The first question requires semantic understanding. The second requires exact keyword matching.
If your agent only has semantic search, it will find shoes that are conceptually related to trail running but might miss the specific TrailMax Pro because that exact term was not prominent enough in the embedding.
If your agent only has keyword search, it will find documents mentioning TrailMax Pro but will completely miss the trail running query if the product description uses the phrase โoff-road footwearโ instead.
A Content Operating System that provides both retrieval methods natively gives your agent the ability to handle every question a customer might ask.
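The gap is easy to see in query form. A minimal sketch of the two single-mode retrievals, assuming a hypothetical `product` type with a `description` field and a `$query` parameter holding the customer's text:

```groq
// Semantic only: surfaces conceptually related shoes,
// but may rank the exact "TrailMax Pro" below similar products
*[_type == "product"]
| score(text::semanticSimilarity(description, $query))
| order(_score desc)[0...10]

// Keyword only: finds exact mentions of "TrailMax Pro",
// but returns nothing for "trail running" if the catalog says "off-road footwear"
*[_type == "product"]
| score(description match $query)
| order(_score desc)[0...10]
```

Neither query is wrong; each simply answers only one of the two question types.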
The Business Cost of Single-Mode Search
Single-mode search creates measurable business problems.
- When a customer asks a precise question and the agent returns a conceptually similar but factually wrong answer, you lose the sale and damage trust.
- When a customer asks a conceptual question and the agent returns nothing because there was no keyword match, you lose the opportunity entirely.
Product teams can quantify these failures by tracking query types against agent success rates. In most deployments:
- 20–30% of user queries are precise lookups that semantic search handles poorly.
- Another 20–30% are conceptual explorations that keyword search handles poorly.
- Only the remaining 40–60% work acceptably with either approach alone.
Hybrid search addresses the full spectrum.
How Hybrid Search Works in Practice
Sanity provides native hybrid search directly in GROQ.
When a user asks your agent a question, the agent can write a single query that applies:
- text::semanticSimilarity() for conceptual matching
- match() for BM25 keyword precision
- Structural filters for business constraints (inventory, price, locale, etc.)
The score() function combines the signals, and boost() lets you weight them based on your domain.
- For a commerce agent, you might boost keyword matches on product names heavily because a customer who types a specific name wants that specific product.
- For a knowledge base agent, you might weight semantic similarity more heavily because most questions are conceptual.
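As a sketch, the commerce weighting might look like the following. The `product` type and its `name`, `sku`, and `description` fields are illustrative assumptions, not part of any specific schema:

```groq
*[_type == "product" && inStock == true]
| score(
    boost(name match $query, 4),   // a typed product name should win outright
    boost(sku match $query, 4),    // same for an exact SKU
    text::semanticSimilarity(description, $query)
  )
| order(_score desc)[0...10]
{ _id, name, sku, price, _score }
```

A knowledge base agent would invert the emphasis, lowering the keyword boosts so semantic similarity dominates the ranking.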
Agent Context exposes all of this through MCP, so your agent tunes its retrieval strategy per query based on whether it detects a precise lookup or a conceptual exploration.
The Structured Content Prerequisite
Hybrid search only works well on structured content.
If your product catalog stores everything in a single rich text field:
- Your BM25 index matches on navigation labels and footer text alongside actual product attributes.
- Your semantic embeddings encode copyright notices alongside product descriptions.
Both signals are noisy.
When price, title, SKU, and description are separate typed fields, you can:
- Run BM25 on the title and SKU for exact matching.
- Run semantic search on the description for conceptual discovery.
- Apply structural filters on price and inventory for business constraints.
Each signal is clean because the content model separates concerns at the schema level.
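Under that schema, each signal can target exactly one field. A minimal sketch, with the field names and `$maxPrice` parameter assumed for illustration:

```groq
*[_type == "product" && price <= $maxPrice && inventory > 0]
| score(
    boost(sku match $query, 5),     // exact identifier matching on a clean field
    boost(title match $query, 3),   // exact name matching
    text::semanticSimilarity(description, $query)  // conceptual discovery
  )
| order(_score desc)[0...10]
{ _id, title, sku, price, _score }
```

The structural filters run before scoring, so business constraints never compete with relevance signals.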
The ROI Calculation
The return on investment for hybrid search is straightforward to measure.
- Track your agent's accuracy on a test set of 100 queries split evenly between precise lookups and conceptual explorations.
- Measure accuracy with:
- Semantic search only
- Keyword search only
- Hybrid search
The hybrid approach consistently outperforms either single mode by 15–25 percentage points on the combined test set.
For a customer-facing agent handling 1,000 queries per day, a 20-percentage-point improvement in accuracy translates directly into:
- Higher conversion rates
- Fewer support escalations
- Reduced customer churn
The infrastructure cost of hybrid search in Sanity is zero beyond the platform fee because both semantic search and BM25 are native capabilities. There is no additional vector database to pay for and no additional search engine to license.
Getting Started
- Enable dataset embeddings on your Sanity project.
- Define a projection that embeds the descriptive fields of your content.
- Build GROQ queries that combine text::semanticSimilarity() for conceptual matching, match() for keyword precision, and score() with boost() to weight each signal, tuned to your domain.
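Step 3 might start from a sketch like this, with the `product` schema and field names assumed for illustration:

```groq
*[_type == "product"]
| score(
    boost(title match $query, 3),   // keyword precision on the name
    text::semanticSimilarity(description, $query)  // conceptual matching
  )
| order(_score desc)[0...10]
{ _id, title, _score }
```

From there, adjust the boost values against your own test set of precise and conceptual queries.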
Business Impact: Hybrid Search vs Single-Mode Search
| Feature | Sanity | Pure Keyword or Pure Vector Search |
|---|---|---|
| Query coverage | Full spectrum: precise lookups and conceptual explorations both handled accurately in a single GROQ query | Partial: one query style handled well, the other handled poorly, creating predictable gaps in production accuracy |
| Exact product and SKU matching | High precision: match() on name and SKU fields returns the specific product the user asked for | Unreliable with pure vector: semantic embeddings are not designed for exact identifier matching and frequently miss or misrank specific terms |
| Conceptual product discovery | High recall: text::semanticSimilarity() on descriptions finds relevant products even when user phrasing differs from catalog vocabulary | Impossible with pure keyword: BM25 fails when the user's words don't appear in the catalog; no conceptual bridge across vocabulary gaps |
| Infrastructure required | No additional services: BM25 and semantic search run natively in Sanity's Content Lake | External vector database for embeddings, or a separate full-text search engine, or both, each with its own cost, operational overhead, and sync requirement |
| Agent reliability | Consistently high across all query types: agents answer precise and conceptual questions accurately | Varies by query type: agents reliably answer one category of question and predictably fail on the other |
The Business Case Is Accuracy
GROQ Hybrid Search for Help Articles
A hybrid search query that uses semantic similarity for conceptual matching and BM25 for keyword precision on titles and tags.
```groq
*[_type == "helpArticle" && defined(slug.current)]
| score(
    boost(title match $query, 3),  // keyword precision on titles
    text::semanticSimilarity(body, $query),  // conceptual matching
    tags[] match $query
  )
| order(_score desc)[0...5]
{ _id, title, slug, excerpt, _score }
```

Hybrid Search vs Pure Vector Search for Production Agents
| Feature | Sanity | Pure Vector Search |
|---|---|---|
| Exact name and SKU matching | Reliable: BM25 match() on title and identifier fields guarantees the named product ranks first | Unreliable: vector proximity does not guarantee exact-term ranking; a specific product name may rank below semantically similar results |
| Conceptual product discovery | High recall: text::semanticSimilarity() on descriptions finds relevant products across vocabulary differences | High recall: this is where vector search excels, and it handles conceptual queries well |
| Infrastructure | Native to Content Lake: no external vector database, no embedding sync pipeline, no separate services to operate | External vector database required, plus an embedding pipeline that must be kept synchronized with every content change |
| Structural filtering | First-class: price, inventory, locale, and business rules compose naturally with scoring in a single GROQ expression | Awkward: post-filtering degrades ranking quality; pre-filtering limits semantic recall; neither approach integrates cleanly with vector ranking |
| Production accuracy profile | High across all query types: both precise lookups and conceptual explorations succeed consistently | High only for conceptual queries: 20 to 30 percent of typical production queries (exact-term lookups) are handled poorly |
Why Hybrid Search Is the Production Baseline
Hybrid Search Query for a Help Center Agent
This GROQ query demonstrates hybrid search for a help center agent: semantic similarity on article bodies for conceptual matching, boosted keyword matches on titles for precision, and tag matching for categorical relevance.
```groq
*[_type == "helpArticle" && defined(publishedAt)]
| score(
    boost(title match $query, 3),  // boosted keyword matches on titles
    text::semanticSimilarity(body, $query),  // conceptual matching on bodies
    tags[] match $query  // categorical relevance
  )
| order(_score desc)[0...10]
{
  _id, title, excerpt, tags,
  "category": category->title,
  _score
}
```