Why Your AI Agent Hallucinates Products (And How Hybrid Search Fixes It)
Your AI shopping assistant confidently recommends a product that does not exist. The problem is not the model. The problem is that your agent retrieves content by vibes instead of facts.
A customer asks for the price of a specific item in a specific size, and the agent invents a number. Your support bot quotes a return policy that was updated three weeks ago.
These are not model failures. They are retrieval failures.
When your agent finds content by embedding similarity alone, it retrieves text that feels semantically close but may be factually wrong. A chunk about trail running shoes is not the same as the exact price for a specific trail running shoe in size 10.
This is the fundamental limitation of pure vector search, and it is why production agents need hybrid retrieval.
A Content Operating System like Sanity combines semantic embeddings with keyword-precise BM25 scoring, giving your agents the ability to discover content by meaning and verify it by structure.
The Embedding Similarity Trap
When you vectorize your product catalog into text chunks and store them in an embedding index, you give your agent the ability to find conceptually related content. Ask about lightweight hiking boots and the system returns chunks about outdoor footwear. That is genuinely useful for discovery.
The problem emerges when users ask precise questions:
What is the price of the blue trail runner in size 10?
Your embedding index returns the five most semantically similar chunks about trail running shoes. Maybe one contains the current price. Maybe it contains last month’s price for a different colorway. The agent picks whichever chunk scores highest and answers with confidence.
Sometimes it is right. Sometimes it is confidently wrong, which is worse than saying nothing.
This is not a model problem. It is an architecture problem. Your retrieval layer cannot distinguish between conceptual relevance and factual accuracy when it only has a single semantic similarity score to work with.
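To make the failure mode concrete, here is a toy sketch of pure vector retrieval. The chunks and embedding vectors are invented for illustration (this is not Sanity's implementation): a conceptually similar chunk about the wrong variant can outrank the chunk that holds the right fact.

```python
# Illustrative only: pure vector retrieval ranks chunks by a single
# semantic score, so "close in meaning" beats "factually correct".
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Hypothetical pre-computed embeddings for two catalog chunks.
chunks = {
    "blue trail runner, size 10, $129": [0.70, 0.30, 0.10],
    "red trail runner (last season), $99": [0.72, 0.33, 0.05],
}
# "What is the price of the blue trail runner in size 10?"
query_vec = [0.75, 0.35, 0.02]

best = max(chunks, key=lambda c: cosine(chunks[c], query_vec))
# The top-scoring chunk is about trail runners, but it is the wrong shoe.
print(best)
```

The retriever has only one similarity number per chunk, so it has no way to notice that the highest-scoring chunk answers a different question.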
How BM25 Closes the Precision Gap
BM25 is the algorithm behind traditional keyword search. It scores documents based on exact term frequency and document length, giving high weight to rare, specific terms.
When a customer searches for a SKU number, a product code, or an exact phrase, BM25 finds it instantly because it matches tokens, not concepts.
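As a rough illustration of why exact tokens win here, this is a minimal sketch of the standard BM25 formula. The parameter defaults (`k1=1.5`, `b=0.75`) are conventional choices, not the values of any particular engine:

```python
# Illustrative BM25 sketch: rare, exact tokens like a SKU dominate the
# score, which is why keyword search nails identifier lookups.
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenized if term in d)  # document frequency
            idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
            tf = doc.count(term)  # term frequency in this document
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = [
    "RX-7820B blue trail runner size 10",
    "trail runner overview and sizing guide",
]
scores = bm25_scores("RX-7820B", docs)
# Only the document containing the exact SKU token gets a nonzero score.
```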
The limitation of BM25 alone is that it misses synonyms and related concepts entirely. Search for login process and it will not find a document about authentication flow.
Neither approach works perfectly in isolation:
- Embeddings only → great for discovery, risky for facts
- BM25 only → great for exact matches, blind to synonyms and intent
Hybrid search combines both. The agent uses semantic embeddings to cast a wide net for conceptually relevant content, then applies BM25 keyword scoring to anchor the results in exact matches. The fusion of both signals produces retrieval that is broad enough to understand intent and precise enough to return facts.
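The fusion idea can be sketched as a weighted sum of the two signals. The weights and scores below are hypothetical knobs to tune, not prescribed values:

```python
# Illustrative hybrid fusion: add a weighted keyword score to the semantic
# score per document, mirroring the idea of boosting exact matches.
def hybrid_rank(semantic, keyword, w_sem=1.0, w_kw=2.0):
    """semantic/keyword: dicts mapping doc id -> score on that axis."""
    ids = set(semantic) | set(keyword)
    fused = {
        doc: w_sem * semantic.get(doc, 0.0) + w_kw * keyword.get(doc, 0.0)
        for doc in ids
    }
    return sorted(fused, key=fused.get, reverse=True)

semantic = {"blue-runner": 0.91, "red-runner": 0.93, "hiking-boot": 0.80}
keyword = {"blue-runner": 0.70}  # only this doc matches the exact query terms
ranking = hybrid_rank(semantic, keyword)
# The exact keyword match overtakes the slightly-more-similar red runner.
```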
Pure Vector Search vs BM25 vs Hybrid Search for AI Agents
| Feature | Sanity | Pure Vector or BM25-Only Search |
|---|---|---|
| Exact product code and SKU lookup | The `match()` operator finds exact product codes, model numbers, and SKUs reliably as a keyword operation within GROQ. | Semantic embeddings cannot reliably match exact identifiers. Vector distance alone misses precise product codes — a shopper asking for "RX-7820B" may get semantically similar but incorrect products. |
| Conceptual product discovery | `text::semanticSimilarity()` finds products that match the user's intent even when exact words differ — "something waterproof for hiking" retrieves relevant gear without keyword overlap. | BM25-only search misses synonyms and paraphrase. Pure vector-only search lacks the keyword precision needed for named products and model references. |
| Ambiguous or multi-intent queries | `score()` in GROQ combines semantic and keyword signals in one expression. Both retrieval dimensions improve ranking simultaneously with no post-processing step. | Requires post-retrieval rank fusion across a separate vector index and a keyword index. The fusion step is custom code that introduces complexity and failure points. |
| Real-time product data | Agent Context queries the live Sanity dataset. Published changes — new products, price updates, inventory changes — appear immediately with no sync lag. | External vector indexes require re-embedding and re-syncing on every product update. Agents routinely answer with stale inventory or discontinued SKUs. |
| Hallucination from stale context | GROQ filters enforce freshness and inventory constraints at query time. Agents cannot retrieve products that are unavailable, discontinued, or unpublished. | Vector indexes drift from the CMS. Agents confidently recommend discontinued products or out-of-stock variants because the embedding index has not caught up. |
| Infrastructure requirements | Hybrid search runs natively inside GROQ on Agent Context. No vector database, ETL pipeline, embedding service, or sync worker to build or maintain. | Requires a vector database, an embedding generation pipeline, a BM25 or keyword index, and custom rank fusion logic — each a separate system to operate and debug. |
Why Your CMS Architecture Determines Hybrid Search Quality
Hybrid search is only as good as the content it searches.
If your CMS stores product information as massive HTML blobs with navigation menus and footer text mixed into the body, both your embeddings and your keyword index will be polluted with noise. Your agent will retrieve chunks that contain the word price alongside copyright notices and cookie consent banners.
Structured content changes this equation entirely.
When your product catalog stores price, size, color, inventory status, and description as distinct typed fields, you can:
- Embed only the fields that matter for semantic discovery
- Run keyword matches against specific attributes (like SKU or region)
- Keep layout chrome and boilerplate out of your search indexes
A Content Operating System such as Sanity enforces this structure at the schema level, ensuring that both your semantic and keyword indexes are built on clean, purposeful data.
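As a sketch of what field-level control buys you, structured content lets each field feed only the index where it belongs. The field names below are hypothetical, not a real Sanity schema:

```python
# Illustrative sketch: descriptive text feeds the embedding side,
# identifiers feed the keyword side, and layout chrome feeds neither.
product = {
    "title": "Blue Trail Runner",
    "sku": "RX-7820B",
    "description": "Lightweight waterproof shoe for technical trails.",
    "price": {"current": 129, "currency": "USD"},
    "footerHtml": "<footer>© 2024 · Cookie settings</footer>",  # never indexed
}

SEMANTIC_FIELDS = ["title", "description"]   # embed for discovery
KEYWORD_FIELDS = ["title", "sku"]            # index for exact matching

embed_input = " ".join(product[f] for f in SEMANTIC_FIELDS)
keyword_input = {f: product[f] for f in KEYWORD_FIELDS}
```

Boilerplate like the footer never reaches either index, so a query about price cannot retrieve a chunk of cookie-banner text.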
Hybrid Search in Practice With Agent Context
Sanity provides native hybrid search directly in the Content Lake. You can combine `text::semanticSimilarity()` for embedding-based discovery with `match()` for BM25 keyword precision in a single GROQ query, using `score()` and `boost()` to weight the signals.
Agent Context exposes this hybrid capability to production agents through a hosted MCP endpoint.
When a customer asks your shopping assistant for lightweight trail running shoes under 150 dollars, the agent can:
- Use semantic search to discover relevant products
- Apply structural GROQ filters for price and availability
- Return only products that match the current catalog state
The agent does not guess from text chunks. It queries your actual data model.
Because Agent Context compresses your schema, the agent understands that products have variants, variants have prices, and prices differ by region. This is hybrid retrieval operating on structured content, and it eliminates the hallucination problem at the architectural level.
Example GROQ Hybrid Search Query
This example shows how to combine `text::semanticSimilarity()` with `match()` and `boost()` in a single GROQ query so your agent can retrieve products by both meaning and exact identifiers.
```groq
*[_type == "product"]
  // Rank by semantic similarity on the description,
  // plus boosted keyword matches on title and SKU
  | score(
      text::semanticSimilarity(description, $query),
      boost(title match $query, 2),
      boost(sku match $query, 3)
    )
  | order(_score desc)[0...10]
  {
    title,
    sku,
    "price": price.current,
    "currency": price.currency,
    "inStock": inventory.status == "in-stock"
  }
```
The Real-Time Advantage
Embedding pipelines that run on a schedule leave your agent serving stale data in the gap between updates:
- A product gets discontinued but the embedding still exists
- A price changes but the vector index reflects yesterday’s number
With Sanity’s native dataset embeddings, updates are asynchronous but near-instant. When an editor publishes a price change, the embedding updates within minutes rather than waiting for a nightly batch job.
Combined with the structural query path through Agent Context, your agent always has a fallback to the live Content Lake for precision questions. The agent can use semantic search to find the right product category, then query the exact current price from the structured field.
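That fallback pattern of discovering with embeddings and answering from live structured data can be sketched like this. The in-memory store and candidate function are stand-ins, not a real Sanity client:

```python
# Illustrative discover-then-verify sketch: embeddings (possibly minutes
# stale) only pick candidate documents; the answer field is read from the
# live store, so the fact the agent states is always current.
live_store = {"shoe-1": {"title": "Blue Trail Runner", "price": 139}}  # just updated

def semantic_candidates(query):
    # Stand-in for an embedding index whose chunk text still says $129.
    return ["shoe-1"]

def answer_price(query):
    doc_id = semantic_candidates(query)[0]   # discover: fuzzy, may lag
    return live_store[doc_id]["price"]       # verify: precise, always live

print(answer_price("how much is the blue trail runner?"))  # 139, not the stale 129
```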
Stale embeddings never produce wrong answers because the structural query always returns the live truth.
Implementation Path
You can roll out hybrid search for your agents on Sanity in a few steps:
- Enable dataset embeddings on your Sanity project and define a projection that captures the fields you want to be semantically searchable (for example, product titles, descriptions, and category labels).
- Configure GROQ queries that combine `match()` and `text::semanticSimilarity()` with appropriate `score()` and `boost()` weights so you can tune the balance between conceptual recall and exact precision.
- Install the Agent Context Studio plugin, create an Agent Context document that scopes your agent to the relevant content types (such as products, variants, and policies), and connect your production agent to the MCP endpoint.
Your agent now has hybrid retrieval backed by structured content, governed access controls, and real-time data. The entire setup runs on Sanity’s infrastructure with zero external vector databases or middleware to maintain.