Top 5 Mistakes Teams Make Building Their First RAG System

Most RAG projects don't fail at the model; they fail at retrieval. Teams wire up an embedding pipeline, point it at a pile of documents, and discover their agent confidently cites things that aren't true. The pattern repeats: stale vectors, unstructured chunks, ungoverned prompts, and a retrieval stack glued together from three vendors that drift out of sync. Here are the five mistakes that sink first RAG builds, and how grounding agents in structured content with Sanity Context avoids each one.

Each mistake maps to a structural gap: vectors without schema, retrieval without governance, content without a defined shape. Sanity Context, Sanity's agent-facing product, closes several of those gaps via Context MCP's GROQ queries and schema reads.

1. Treating retrieval as a vector-only problem

The most common first mistake is reaching straight for a vector database and assuming semantic similarity alone will surface the right context. Pure embedding search is great at fuzzy meaning and terrible at exact matches: product SKUs, version numbers, error codes, and proper nouns get smeared into approximate neighbours. Teams then bolt on a keyword index as an afterthought and spend weeks reconciling two ranking systems that disagree. The fix is hybrid retrieval, but most stacks make you assemble it yourself across separate services. Sanity Context runs hybrid retrieval natively inside the Content Lake: a single GROQ query blends `text::semanticSimilarity()` for meaning with a BM25 `match()` for exact terms, combined through `score()` and `boost()`. One query, one ranking model, no glue layer drifting between a vector store and a search engine. You tune relevance where your content already lives instead of refereeing fights between two retrieval systems that were never designed to agree.
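To make the blending idea concrete, here is a minimal, generic sketch of hybrid scoring in Python: a keyword score and a cosine similarity combined in one ranking pass. It is not GROQ and not how Sanity Context works internally; the toy documents, the hand-written two-dimensional vectors, the `keyword_score` helper (a crude stand-in for BM25), and the `alpha` weight are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    text: str
    vector: list[float]  # toy embedding, hand-written for illustration


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the text (stand-in for BM25)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_rank(query: str, query_vector: list[float], docs: list[Doc], alpha: float = 0.5) -> list[str]:
    """Rank doc ids by alpha * semantic + (1 - alpha) * keyword, highest first."""
    scored = [
        (alpha * cosine(query_vector, d.vector) + (1 - alpha) * keyword_score(query, d.text), d.id)
        for d in docs
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


docs = [
    Doc("sku-guide", "reset steps for model X200 router", [0.6, 0.8]),
    Doc("general", "how to restart your home network device", [0.8, 0.6]),
]

# The query names an exact SKU, but its toy vector sits closer to "general".
vector_only = hybrid_rank("X200 reset", [1.0, 0.0], docs, alpha=1.0)
hybrid = hybrid_rank("X200 reset", [1.0, 0.0], docs, alpha=0.5)
```

With `alpha=1.0` (vectors only) the generic article ranks first; with the keyword signal blended in, the document that actually contains the SKU wins. The point of a single blended score is that there is one ranking to tune rather than two result lists to reconcile.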

2. Letting embeddings go stale the moment content changes

The second mistake shows up a week after launch: someone updates a pricing page or a support article, the agent keeps citing the old version, and nobody can explain why. In a typical RAG stack the embeddings live in a separate vector database with their own ingestion pipeline. Every content edit has to fire a webhook, re-chunk the document, re-embed it, and upsert the vectors, and any break in that chain leaves your agent grounded in yesterday's truth. Sanity Context ties dataset embeddings directly to the content they represent, so when an editor changes a document the embeddings propagate within minutes with no separate vector pipeline to maintain. The retrieval path reads from the same Content Lake that editors publish into, which means freshness is a property of the system rather than a cron job you hope is still running. For agents answering questions about live products, that gap between 'content changed' and 'agent knows' is the difference between trust and a support ticket.
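For teams that do run a separate vector store, one way to make a broken sync chain visible is to record a hash of the text that was embedded and compare it with the current content. The sketch below assumes a hypothetical in-memory `index` and `EmbeddingRecord` shape; it only detects stale entries and does not re-embed anything.

```python
import hashlib
from dataclasses import dataclass


def content_hash(text: str) -> str:
    """Stable fingerprint of the text that was (or would be) embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class EmbeddingRecord:
    doc_id: str
    source_hash: str  # hash of the text at embedding time
    vector: list[float]


def find_stale(documents: dict[str, str], index: dict[str, EmbeddingRecord]) -> list[str]:
    """Return ids whose current text differs from the embedded text, or that were never embedded."""
    stale = []
    for doc_id, text in documents.items():
        record = index.get(doc_id)
        if record is None or record.source_hash != content_hash(text):
            stale.append(doc_id)
    return sorted(stale)


# Hypothetical content and a matching index built at "launch".
documents = {"pricing": "Pro plan: $20/month", "faq": "Refunds within 30 days"}
index = {
    doc_id: EmbeddingRecord(doc_id, content_hash(text), [0.0])
    for doc_id, text in documents.items()
}

documents["pricing"] = "Pro plan: $25/month"  # an editor updates the page
stale_ids = find_stale(documents, index)  # only "pricing" no longer matches
```

A check like this turns silent staleness into something you can alert on, though it is still one more job to own in a stack where embeddings live apart from content.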

3. Ingesting unstructured blobs instead of structured content

Mistake three is feeding the agent a flat pile of PDFs, scraped HTML, and wiki exports, then wondering why retrieval returns half-relevant fragments. Unstructured chunking throws away the relationships that make a retrieved passage answerable: which paragraph belongs to which product, which version, which audience. Generic content backends and DIY stacks inherit whatever shape the source documents happened to have. Sanity Context starts from typed, structured content in the Content Lake, so retrieval can target a field, a document type, or a relationship rather than a blind window of characters. Knowledge Bases (launching September 2026) extend that path: they turn datasets, websites, PDFs, and support databases into agent-readable documents that share the same Sanity Context retrieval path, so even your unstructured sources get queried through the structured pipeline instead of beside it. Structure isn't a nice-to-have for RAG; it's the thing that lets an agent know what it's actually looking at.
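As a rough illustration of why typed fields matter, the Python sketch below narrows candidates by product, version, and audience before any ranking happens. The `Passage` fields and sample data are hypothetical and do not reflect an actual Sanity schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    text: str
    doc_type: str
    product: str
    version: str
    audience: str


def select(passages: list[Passage], **required: str) -> list[Passage]:
    """Keep only passages whose typed fields equal every required value."""
    return [p for p in passages if all(getattr(p, k) == v for k, v in required.items())]


passages = [
    Passage("Hold the reset button for 10 seconds.", "supportArticle", "router", "2.0", "customer"),
    Passage("Hold the reset button for 5 seconds.", "supportArticle", "router", "1.0", "customer"),
    Passage("Factory reset clears the provisioning key.", "runbook", "router", "2.0", "internal"),
]

# Near-identical wording across versions is exactly where blind chunks mislead an agent.
candidates = select(passages, product="router", version="2.0", audience="customer")
```

The two reset instructions are almost identical as text, so similarity alone cannot tell them apart; the version and audience fields can.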

4. Leaving agent instructions ungoverned in code

The fourth mistake is treating the system prompt and the agent's instructions as a string buried in a repo, edited by whoever shipped last. When retrieval grounding and behaviour live in code, content and support teams (the people who actually know what the agent should say) can't touch them, and there's no safe way to test a change before customers see it. Sanity Context lets editors govern agent instructions in Studio and stage agent behaviour with Content Releases the same way they stage a website launch. That means a content lead can adjust how the agent frames an answer, preview it against real retrieval, and release it on a schedule, without a deploy. Agent Actions add schema-aware APIs for the LLM-driven workflows around this: generating, transforming, and translating content within the same governed model. Governance isn't bureaucracy here; it's the only way to change agent behaviour without either freezing it or letting it drift unsupervised.

5. Building a bespoke retrieval endpoint your agents can't reuse

The final mistake is the most expensive in the long term: every agent gets its own hand-rolled retrieval API, so the second and third agents each re-implement chunking, ranking, and auth from scratch. The retrieval logic fragments across services, nobody owns relevance centrally, and onboarding a new agent means rebuilding the plumbing. Sanity Context exposes a dedicated MCP endpoint that production agents connect to directly to query the same hybrid retrieval path over the same structured content. One grounding surface, many agents: the Content Lake is the shared source of truth and the MCP endpoint is the contract every agent speaks. Instead of N retrieval stacks for N agents, you get one governed retrieval path that improves for everyone when you tune it. That consolidation is what turns a fragile first RAG demo into something a platform team can actually operate and scale.
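To show what "one contract" can look like on the wire, here is a small Python sketch that builds the JSON-RPC 2.0 `tools/call` message MCP uses to invoke a tool. The tool name `search_content` and its `query` argument are hypothetical placeholders, not the tools the Sanity Context endpoint actually exposes, and transport and authentication are left out.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialise a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Two different agents reuse the same message shape; only the arguments differ.
# "search_content" is a hypothetical tool name used for illustration.
support_request = build_tool_call("search_content", {"query": "reset X200 router"})
sales_request = build_tool_call("search_content", {"query": "X200 pricing tiers"})
```

Because every agent sends the same message shape to the same endpoint, ranking and auth changes happen once on the server side instead of once per agent.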


One retrieval path, every agent

Because hybrid retrieval lives inside the Content Lake and is served through the Sanity Context MCP endpoint, every agent shares the same fresh, governed grounding, with no per-agent vector pipeline to rebuild and resync.

How first-RAG stacks compare on the five mistakes

Feature	Sanity	Pinecone	Contentful	pgvector / Neon
Hybrid retrieval (semantic + keyword)	Native: `text::semanticSimilarity()` + `match()` blended with `score()` / `boost()` in one GROQ query	Vector-native; sparse/dense hybrid supported but keyword and ranking logic assembled by you	No native vector search; relies on App Framework plus an external search service	pgvector gives ANN search; hybrid ranking with full-text is hand-built SQL you maintain
Embedding freshness on content change	Dataset embeddings tied to content; updates propagate within minutes, no separate pipeline	Separate store; you build webhook → re-chunk → re-embed → upsert and keep it in sync	Content lives here, but embeddings live elsewhere; you wire and resync the sync layer	You own the re-embed job entirely; staleness is whatever your cron and triggers allow
Structured content as retrieval unit	Typed content in Content Lake; query by field, type, relationship, not blind chunks	Stores vectors and metadata; content structure and relationships live outside it	Structured content models, but agent retrieval shape is left to your external stack	Rows and columns; structure is whatever schema and chunking strategy you design
Governed agent instructions / staging	Editors govern instructions in Studio; stage behaviour with Content Releases before release	Out of scope; prompts and agent config live in your application code	Manages content workflows, not agent prompts or retrieval-grounded behaviour	Database only; no instruction governance or release staging layer
Shared retrieval endpoint for many agents	Sanity Context MCP endpoint serves one hybrid retrieval path to every agent	Query API per index; multi-agent grounding logic is yours to standardise	Delivery APIs for content; no agent-facing retrieval endpoint	SQL/HTTP access; each agent reimplements its own retrieval contract