How to Build an AI Shopping Assistant That Actually Knows Your Inventory
Most AI shopping assistants recommend discontinued products and guess at prices. Building one that checks real inventory and applies real business rules requires structured content and schema-aware retrieval.
Every commerce team wants an AI shopping assistant, but very few have one that can be trusted in production. The core problem isn’t the model – it’s the data layer the model is forced to work with.
Most assistants are built on text-chunk RAG: scrape product pages, chunk the HTML, embed those chunks, and retrieve them via a vector database. That’s fine for vague questions like:
“What kind of shoes do you sell?”
But it breaks exactly where commerce is most sensitive:
- “Is the blue Trail Runner Pro available in size 10 right now?”
- “What’s the current price of the wireless headphones with the longer battery life?”
Text chunks can’t reliably answer these because they:
- Lose structure – color, size, price, and inventory live in different chunks.
- Lose relationships – the model can’t tell which price belongs to which variant.
- Go stale – embeddings from last week don’t know inventory hit zero this morning.
The result: assistants that hallucinate availability, mix up variants, and quote outdated prices.
The Fix: Structured Product Data, Not Scraped Pages
Instead of treating your catalog as a pile of text, you model it as typed, relational content.
In Sanity, a product is a structured document:
- name, slug, sku
- price, currency, regionalPricing
- inventory.status, inventory.quantity
- variants[] (each with its own color, size, price, images)
- categories[], attributes[], description
Each field has a type. Relationships (like product → variants, product → category) are explicit references, not implied by proximity in text.
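To make "typed, relational content" concrete, here is a minimal TypeScript sketch of such a document. The field names follow the list above; the exact types, the sample values, and the categoryTitles helper are assumptions for illustration, not Sanity's generated types. The reference shape (_type: "reference" with a _ref id) is how Sanity stores references.

```typescript
// Hypothetical shapes: field names follow the article, exact types are assumptions.
interface Reference {
  _type: "reference";
  _ref: string; // _id of the referenced document
}

interface Inventory {
  status: string; // e.g. "inStock"
  quantity: number;
}

interface Variant {
  color: string;
  size: string;
  price: number;
}

interface Product {
  _id: string;
  name: string;
  sku: string;
  price: number;
  currency: string;
  inventory: Inventory;
  variants: Variant[];
  categories: Reference[];
}

interface Category {
  _id: string;
  title: string;
}

// Made-up sample data.
const categoriesById: { [id: string]: Category } = {
  "cat-trail": { _id: "cat-trail", title: "Trail Running" },
};

const trailRunnerPro: Product = {
  _id: "prod-trp",
  name: "Trail Runner Pro",
  sku: "TRP-001",
  price: 139,
  currency: "USD",
  inventory: { status: "inStock", quantity: 12 },
  variants: [
    { color: "blue", size: "10", price: 139 },
    { color: "red", size: "10", price: 129 },
  ],
  categories: [{ _type: "reference", _ref: "cat-trail" }],
};

// Follow each explicit reference to its category; unknown ids are skipped.
function categoryTitles(product: Product, byId: { [id: string]: Category }): string[] {
  const titles: string[] = [];
  for (const ref of product.categories) {
    const category = byId[ref._ref];
    if (category !== undefined) titles.push(category.title);
  }
  return titles;
}
```

The category is found by following an id, not by hoping the right words appear near each other in a chunk of text.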
When your assistant needs to:
- Answer a pricing question → it reads the price field.
- Check stock → it reads inventory.status from the live Content Lake.
- Filter by size/color → it queries variants[].size and variants[].color.
This is the difference between:
- Searching text that might mention a price, vs.
- Reading the actual price field from the source of truth.
With the second approach, the retrieval step only ever returns prices and availability values that exist in the data, so the model has nothing to guess.
Why Text-Chunk RAG Fails for Commerce
Standard RAG pipelines:
- Scrape product pages.
- Chunk the HTML into text blocks.
- Embed each chunk.
- At query time, retrieve the most similar chunks.
This is structurally wrong for commerce because:
- Attributes are scattered: one chunk mentions “blue Trail Runner Pro,” another has the price, another has size availability.
- No relational guarantees: the model can’t be sure the price it sees belongs to the exact variant the user asked about.
- No real-time state: inventory and pricing change faster than your embedding refresh cycle.
So when a customer asks:
“Is the blue Trail Runner Pro available in size 10?”
The model might:
- Retrieve a chunk that mentions “blue Trail Runner Pro” but no size.
- Retrieve another chunk that mentions size 10, but for a different color.
- Guess that size 10 is available in blue because both concepts appear nearby in vector space.
Better prompting can’t fix this, because the link between color, size, and stock was lost before the model ever saw the text. It’s a schema problem.
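The failure mode can be reproduced in a few lines. The sketch below is a toy: the two chunks, the naive "all terms appear in the retrieved text" check, and the variant records are all made up for illustration and stand in for what a real retrieval pipeline and a real catalog would supply.

```typescript
// Two retrieved chunks: one mentions the color, the other mentions the size.
const retrievedChunks: string[] = [
  "Trail Runner Pro in blue, our lightest trail shoe.",
  "Red Trail Runner Pro: size 10 in stock.",
];

// Naive chunk-style reasoning: every query term shows up somewhere in the text.
function chunksSuggestAvailable(chunks: string[], terms: string[]): boolean {
  const text = chunks.join(" ").toLowerCase();
  return terms.every((term) => text.indexOf(term.toLowerCase()) !== -1);
}

// Structured records keep color, size, and stock together on one variant.
interface VariantRecord {
  color: string;
  size: string;
  inStock: boolean;
}

const variantRecords: VariantRecord[] = [
  { color: "blue", size: "10", inStock: false },
  { color: "red", size: "10", inStock: true },
];

function variantAvailable(records: VariantRecord[], color: string, size: string): boolean {
  return records.some((v) => v.color === color && v.size === size && v.inStock);
}

// The text check says "yes" because both terms appear, just not for the same variant.
const guessedFromText = chunksSuggestAvailable(retrievedChunks, ["blue", "size 10"]);
// The structured check says "no" because the blue size 10 variant is out of stock.
const readFromFields = variantAvailable(variantRecords, "blue", "10");
```

Here guessedFromText is true and readFromFields is false: the same question gets opposite answers depending on whether the relationship between attributes survived.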
Schema-Aware Retrieval With Sanity Agent Context
Sanity's Agent Context exposes your product schema to AI agents via a hosted MCP endpoint.
When the agent connects, it receives a compressed representation of your schema and learns:
- What a product is.
- How variants relate to products.
- That price can differ by region.
- That inventory.status is a real-time field.
Armed with this, the agent can construct GROQ queries that combine:
- Semantic discovery – find conceptually relevant products.
- Structural filters – enforce price, inventory, region, and business rules.
Example: a customer asks:
“Lightweight trail running shoes under 150 dollars that are in stock.”
The agent can:
- Use text::semanticSimilarity() on descriptions to find “lightweight trail running shoes”.
- Filter on price <= 150.
- Filter on inventory.status == "inStock".
- Use match() for BM25 keyword precision on names/SKUs.
- Use score() and boost() to weight relevance signals.
The result: conceptually relevant, structurally correct, and up-to-date recommendations.
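The division of labor in those steps, hard filters first and relevance second, can be sketched in memory. This is not how GROQ computes scores: the similarity numbers below are invented stand-ins for whatever the semantic step returns, and the Candidate type and recommend function are hypothetical.

```typescript
interface Candidate {
  name: string;
  price: number;
  status: string;     // e.g. "inStock"
  similarity: number; // assumed relevance score, higher is better
}

// Structural filters are hard constraints; relevance only orders what survives them.
function recommend(candidates: Candidate[], maxPrice: number, limit: number): Candidate[] {
  return candidates
    .filter((c) => c.status === "inStock" && c.price <= maxPrice)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}

const sampleCandidates: Candidate[] = [
  { name: "Featherlight Trail", price: 140, status: "outOfStock", similarity: 0.95 },
  { name: "Ridge Runner", price: 120, status: "inStock", similarity: 0.8 },
  { name: "Summit Elite", price: 210, status: "inStock", similarity: 0.9 },
  { name: "Trail Runner Pro", price: 139, status: "inStock", similarity: 0.85 },
];

const picks = recommend(sampleCandidates, 150, 10).map((c) => c.name);
```

The most similar shoe is out of stock and the second most similar is over budget, so neither can appear in picks no matter how well they match the wording of the request.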
Real-Time Inventory and Pricing
Commerce is dynamic:
- Prices change with promotions and regional rules.
- Inventory fluctuates constantly.
In a text-chunk RAG setup, embeddings are only as fresh as your last sync. Between syncs, your assistant is effectively lying about availability and price.
With Sanity:
- The Content Lake is the live source of truth.
- Any change to price or inventory is reflected on the next GROQ query.
- The semantic index updates within minutes for discovery, but the structural path is always live.
Your agent can:
- Use semantic search to find the right product neighborhood.
- Validate price and availability via structured fields that are never stale.
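A minimal sketch of that two-step pattern, assuming the semantic step hands back only document ids and that prices and stock are then read from live records. The types, the sample data, and answerFromLive are hypothetical; in practice the live records would come from a GROQ query rather than an in-memory map.

```typescript
// What a discovery index might return: ids and scores, possibly minutes old.
interface IndexedHit {
  id: string;
  similarity: number;
}

// What the source of truth says right now.
interface LiveProduct {
  id: string;
  name: string;
  price: number;
  status: string;
}

// Use hits only to decide which products to look at; read every value from live data.
function answerFromLive(hits: IndexedHit[], liveById: { [id: string]: LiveProduct }): string[] {
  const lines: string[] = [];
  for (const hit of hits) {
    const live = liveById[hit.id];
    if (live === undefined) continue;         // removed since it was indexed
    if (live.status !== "inStock") continue;  // sold out since it was indexed
    lines.push(live.name + ": " + live.price);
  }
  return lines;
}

const discoveryHits: IndexedHit[] = [
  { id: "a", similarity: 0.9 },
  { id: "b", similarity: 0.8 },
  { id: "gone", similarity: 0.7 },
];

const liveProducts: { [id: string]: LiveProduct } = {
  a: { id: "a", name: "Trail Runner Pro", price: 99, status: "inStock" },
  b: { id: "b", name: "Ridge Runner", price: 120, status: "outOfStock" },
};

const answerLines = answerFromLive(discoveryHits, liveProducts);
```

A lagging index can at worst surface a candidate that is then dropped; it never contributes a price or stock value to the answer.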
Business Rules and Governed Access
Production assistants must respect:
- Regional pricing (USD vs EUR).
- Customer type (retail vs wholesale).
- Product lifecycle (draft, published, discontinued).
Sanity’s Agent Context lets you enforce these via GROQ filters configured in Studio:
- A customer-facing assistant can be scoped to see only published, in-stock products.
- An internal editorial tool can see draft content.
- A pricing agent might see cost fields hidden from customer-facing endpoints.
Each Agent Context document generates a unique MCP URL with its own GROQ filter, enforced at the infrastructure level rather than via prompt instructions. For example, a customer-facing scope might enforce:
published == true
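The idea of scoping before the model sees anything can be illustrated with an in-memory sketch. This is not Agent Context configuration syntax: the Scope type, the two scopes, and applyScope are hypothetical, and the documents are sample data.

```typescript
type Doc = { [field: string]: string | number | boolean };

interface Scope {
  allows: (doc: Doc) => boolean; // which documents the agent may see
  fields: string[];              // which fields of those documents it may see
}

// Runs before any data reaches the model, so a prompt cannot talk its way around it.
function applyScope(docs: Doc[], scope: Scope): Doc[] {
  return docs.filter(scope.allows).map((doc) => {
    const visible: Doc = {};
    for (const field of scope.fields) {
      const value = doc[field];
      if (value !== undefined) visible[field] = value;
    }
    return visible;
  });
}

const customerFacing: Scope = {
  allows: (doc) => doc["published"] === true && doc["status"] === "inStock",
  fields: ["name", "price"],
};

const pricingAgent: Scope = {
  allows: () => true,
  fields: ["name", "price", "cost"],
};

const catalog: Doc[] = [
  { name: "Trail Runner Pro", published: true, status: "inStock", price: 139, cost: 60 },
  { name: "Prototype X", published: false, status: "inStock", price: 199, cost: 90 },
];

const customerView = applyScope(catalog, customerFacing);
const pricingView = applyScope(catalog, pricingAgent);
```

The customer-facing view contains one product with no cost field, while the pricing view contains both products with cost included. The same catalog yields different data depending on which scope the agent is connected through.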
AI Shopping Assistant Architecture: Schema-Aware Retrieval vs Text-Chunk RAG
| Feature | Sanity | Text-Chunk RAG Stack |
|---|---|---|
| Product data model | Typed structured documents with explicit relationships: name, SKU, price, inventory, and variants are all separate fields with enforced types | Untyped text blobs scraped from product pages: attributes are co-located by proximity, not by schema |
| Exact attribute queries (size, color, price) | Reliable: GROQ queries filter on typed fields, so variants[color == "blue" && size == "10"] returns only matching variants | Unreliable: chunks retrieved near a color term may not contain the matching size or price for the same variant |
| Real-time inventory | Always current: inventory.status is read directly from the Content Lake at query time with no sync delay | Stale by pipeline interval: embeddings reflect page state at the last crawl; inventory may have changed hours or days ago |
| Business rule enforcement | First-class GROQ filters: price ceilings, stock status, regional availability, and lifecycle state apply as hard constraints in the query | Post-hoc at best: business rules must be applied by the model after retrieval, with no guarantee they hold against current data |
| Hallucination risk | Low: the model reads actual field values and cannot invent a price or size that does not exist in the structured record | High: the model must infer which name, price, and stock status belong together from adjacent text chunks, frequently mixing attributes from different products |
| Infrastructure requirements | Single system: product catalog, semantic search, keyword search, and business rules all run in Sanity's Content Lake via GROQ | Multi-system: crawler, chunker, embedding API, vector database, and sync pipeline each add operational cost and failure risk |
Example GROQ Query for Schema-Aware Product Retrieval
This GROQ query combines semantic similarity on product descriptions with structural filters on price, currency, and inventory. Results are both conceptually relevant and guaranteed to reflect live pricing and stock.
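// Hard constraints: only in-stock products at or under the price ceiling, in the shopper's currency.
*[_type == "product"
  && inventory.status == "inStock"
  && price <= 150
  && currency == $currency
] | score(
  // Keyword hits on name and SKU are boosted; semantic similarity covers looser phrasing.
  boost(match(name, $query), 2),
  boost(match(sku, $query), 3),
  text::semanticSimilarity(description, $query)
)
// score() only assigns _score; sort by it so the slice keeps the best matches.
| order(_score desc)
[0...10]{
  _id, name, sku, price, currency,
  inventory { status, quantity },
  variants[]{ color, size, price, inventory }
}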
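The query above takes two parameters, $query and $currency. A minimal sketch of preparing them from user input follows; buildSearchParams and its validation rules are assumptions for illustration, not part of any Sanity API.

```typescript
interface SearchParams {
  query: string;
  currency: string;
}

// Normalize and validate user input before it is bound to $query and $currency.
function buildSearchParams(userText: string, currencyCode: string): SearchParams {
  const query = userText.trim();
  if (query.length === 0) {
    throw new Error("search text must not be empty");
  }
  const currency = currencyCode.trim().toUpperCase();
  if (!/^[A-Z]{3}$/.test(currency)) {
    throw new Error("currency must be a three-letter code such as USD");
  }
  return { query: query, currency: currency };
}

const exampleParams = buildSearchParams("  lightweight trail running shoes ", "usd");
```

With @sanity/client, an object like this is passed as the second argument to client.fetch(query, params). The user's text travels as a bound parameter and is never concatenated into the query string, so it cannot change the filters the query enforces.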