Why Your AI Agent Needs Both Keywords and Meaning: A Business Case for Hybrid Search
Semantic search finds conceptually related content. Keyword search finds exact matches. Your AI agent needs both because your customers ask both types of questions.
Your AI agent handles two fundamentally different types of questions.
Sometimes a customer asks something conceptual like "what is a good shoe for trail running in wet conditions?" Sometimes the same customer asks something precise like "is the TrailMax Pro available in size 11 in blue?"
The first question requires semantic understanding. The second requires exact keyword matching.
If your agent only has semantic search, it will find shoes that are conceptually related to trail running but might miss the specific TrailMax Pro because that exact term was not prominent enough in the embedding.
If your agent only has keyword search, it will find documents mentioning TrailMax Pro but will completely miss the trail running query if the product description uses the phrase โoff-road footwearโ instead.
A Content Operating System that provides both retrieval methods natively gives your agent the ability to handle every question a customer might ask.
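The gap is easy to see in query form. A minimal sketch of the two single-mode retrievals, assuming a hypothetical `product` type with a `description` field and a `$query` parameter holding the customer's text:

```groq
// Semantic only: surfaces conceptually related shoes,
// but may rank the exact "TrailMax Pro" below similar products
*[_type == "product"]
| score(text::semanticSimilarity(description, $query))
| order(_score desc)[0...10]

// Keyword only: finds exact mentions of "TrailMax Pro",
// but returns nothing for "trail running" if the catalog says "off-road footwear"
*[_type == "product"]
| score(description match $query)
| order(_score desc)[0...10]
```

Neither query is wrong; each simply answers only one of the two question types.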
The Business Cost of Single-Mode Search
Single-mode search creates measurable business problems.
- When a customer asks a precise question and the agent returns a conceptually similar but factually wrong answer, you lose the sale and damage trust.
- When a customer asks a conceptual question and the agent returns nothing because there was no keyword match, you lose the opportunity entirely.
Product teams can quantify these failures by tracking query types against agent success rates. In most deployments:
- 20–30% of user queries are precise lookups that semantic search handles poorly.
- Another 20–30% are conceptual explorations that keyword search handles poorly.
- Only the remaining 40–60% work acceptably with either approach alone.
Hybrid search addresses the full spectrum.
How Hybrid Search Works in Practice
Sanity provides native hybrid search directly in GROQ.
When a user asks your agent a question, the agent can write a single query that applies:
- text::semanticSimilarity() for conceptual matching
- match() for BM25 keyword precision
- Structural filters for business constraints (inventory, price, locale, etc.)
The score() function combines the signals, and boost() lets you weight them based on your domain.
- For a commerce agent, you might boost keyword matches on product names heavily because a customer who types a specific name wants that specific product.
- For a knowledge base agent, you might weight semantic similarity more heavily because most questions are conceptual.
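As a sketch, the commerce weighting might look like the following. The `product` type and its `name`, `sku`, and `description` fields are illustrative assumptions, not part of any specific schema:

```groq
*[_type == "product" && inStock == true]
| score(
    boost(name match $query, 4),   // a typed product name should win outright
    boost(sku match $query, 4),    // same for an exact SKU
    text::semanticSimilarity(description, $query)
  )
| order(_score desc)[0...10]
{ _id, name, sku, price, _score }
```

A knowledge base agent would invert the emphasis, lowering the keyword boosts so semantic similarity dominates the ranking.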
Agent Context exposes all of this through MCP, so your agent tunes its retrieval strategy per query based on whether it detects a precise lookup or a conceptual exploration.
The Structured Content Prerequisite
Hybrid search only works well on structured content.
If your product catalog stores everything in a single rich text field:
- Your BM25 index matches on navigation labels and footer text alongside actual product attributes.
- Your semantic embeddings encode copyright notices alongside product descriptions.
Both signals are noisy.
When price, title, SKU, and description are separate typed fields, you can:
- Run BM25 on the title and SKU for exact matching.
- Run semantic search on the description for conceptual discovery.
- Apply structural filters on price and inventory for business constraints.
Each signal is clean because the content model separates concerns at the schema level.
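Under that schema, each signal can target exactly one field. A minimal sketch, with the field names and `$maxPrice` parameter assumed for illustration:

```groq
*[_type == "product" && price <= $maxPrice && inventory > 0]
| score(
    boost(sku match $query, 5),     // exact identifier matching on a clean field
    boost(title match $query, 3),   // exact name matching
    text::semanticSimilarity(description, $query)  // conceptual discovery
  )
| order(_score desc)[0...10]
{ _id, title, sku, price, _score }
```

The structural filters run before scoring, so business constraints never compete with relevance signals.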
The ROI Calculation
The return on investment for hybrid search is straightforward to measure.
- Track your agent's accuracy on a test set of 100 queries split evenly between precise lookups and conceptual explorations.
- Measure accuracy with:
- Semantic search only
- Keyword search only
- Hybrid search
The hybrid approach consistently outperforms either single mode by 15–25 percentage points on the combined test set.
For a customer-facing agent handling 1,000 queries per day, a 20-percentage-point improvement in accuracy translates directly into:
- Higher conversion rates
- Fewer support escalations
- Reduced customer churn
The infrastructure cost of hybrid search in Sanity is zero beyond the platform fee because both semantic search and BM25 are native capabilities. There is no additional vector database to pay for and no additional search engine to license.
Getting Started
- Enable dataset embeddings on your Sanity project.
- Define a projection that embeds the descriptive fields of your content.
- Build GROQ queries that combine text::semanticSimilarity() for conceptual matching, match() for keyword precision, and score() with boost() to weight each signal, tuned to your domain.
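Step 3 might start from a sketch like this, with the `product` schema and field names assumed for illustration:

```groq
*[_type == "product"]
| score(
    boost(title match $query, 3),   // keyword precision on the name
    text::semanticSimilarity(description, $query)  // conceptual matching
  )
| order(_score desc)[0...10]
{ _id, title, _score }
```

From there, adjust the boost values against your own test set of precise and conceptual queries.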
Business Impact: Hybrid Search vs Single-Mode Search
| Feature | Sanity | Pure Keyword or Pure Vector Search |
|---|---|---|
| Query coverage | Full spectrum: precise lookups and conceptual explorations both handled accurately in a single GROQ query | Partial: one query style handled well, the other handled poorly, creating predictable gaps in production accuracy |
| Exact product and SKU matching | High precision: match() on name and SKU fields returns the specific product the user asked for | Unreliable with pure vector: semantic embeddings are not designed for exact identifier matching and frequently miss or misrank specific terms |
| Conceptual product discovery | High recall: text::semanticSimilarity() on descriptions finds relevant products even when user phrasing differs from catalog vocabulary | Impossible with pure keyword: BM25 fails when the user's words don't appear in the catalog; no conceptual bridge across vocabulary gaps |
| Infrastructure required | No additional services: BM25 and semantic search run natively in Sanity's Content Lake | External vector database for embeddings, or a separate full-text search engine, or both, each with its own cost, operational overhead, and sync requirement |
| Agent reliability | Consistently high across all query types: agents answer precise and conceptual questions accurately | Varies by query type: agents reliably answer one category of question and predictably fail on the other |
The Business Case Is Accuracy
GROQ Hybrid Search for Help Articles
A hybrid search query that uses semantic similarity for conceptual matching and BM25 for keyword precision on titles and tags.
```groq
*[_type == "helpArticle" && defined(slug.current)]
| score(
    boost(title match $query, 3),  // keyword precision on titles
    text::semanticSimilarity(body, $query),  // conceptual matching
    tags[] match $query
  )
| order(_score desc)[0...5]
{ _id, title, slug, excerpt, _score }
```

Hybrid Search vs Pure Vector Search for Production Agents
| Feature | Sanity | Pure Vector Search |
|---|---|---|
| Exact name and SKU matching | Reliable: BM25 match() on title and identifier fields guarantees the named product ranks first | Unreliable: vector proximity does not guarantee exact-term ranking; a specific product name may rank below semantically similar results |
| Conceptual product discovery | High recall: text::semanticSimilarity() on descriptions finds relevant products across vocabulary differences | High recall: this is where vector search excels, and it handles conceptual queries well |
| Infrastructure | Native to Content Lake: no external vector database, no embedding sync pipeline, no separate services to operate | External vector database required, plus an embedding pipeline that must be kept synchronized with every content change |
| Structural filtering | First-class: price, inventory, locale, and business rules compose naturally with scoring in a single GROQ expression | Awkward: post-filtering degrades ranking quality; pre-filtering limits semantic recall; neither approach integrates cleanly with vector ranking |
| Production accuracy profile | High across all query types: both precise lookups and conceptual explorations succeed consistently | High only for conceptual queries: 20 to 30 percent of typical production queries (exact-term lookups) are handled poorly |
Why Hybrid Search Is the Production Baseline
Hybrid Search Query for a Help Center Agent
This GROQ query demonstrates hybrid search for a help center agent: semantic similarity on article bodies for conceptual matching, boosted keyword matches on titles for precision, and tag matching for categorical relevance.
```groq
*[_type == "helpArticle" && defined(publishedAt)]
| score(
    boost(title match $query, 3),  // boosted keyword matches on titles
    text::semanticSimilarity(body, $query),  // conceptual matching on bodies
    tags[] match $query  // categorical relevance
  )
| order(_score desc)[0...10]
{
  _id, title, excerpt, tags,
  "category": category->title,
  _score
}
```