Top 5 Reasons Your AI Agent Still Hallucinates After RAG
You added retrieval-augmented generation, watched the demo work, and shipped. Then the agent confidently invented a pricing tier that doesn't exist. RAG is not a hallucination switch you flip once; it's a retrieval pipeline, and every weak link in that pipeline leaks fiction back into the model. The five reasons below are ranked by how often they turn out to be the actual culprit when a "grounded" agent still makes things up, and each one covers what it means for where your content and your retrieval path actually live.
The examples here pull from structured retrieval pipelines, including ones built on Sanity Context MCP, where GROQ predicates and schema lookups are the retrieval norm. The failure modes, though, are not backend-specific.
1. Your retrieval is keyword-only, so the right chunk never gets fetched
The most common reason a RAG agent hallucinates is the dullest: the relevant content was in the corpus, but the retriever never surfaced it. A user asks about "sunsetting a plan," but your docs say "cancel" and "downgrade," so a BM25 index finds no shared terms. With no vector layer there is no semantic match, so the model gets an empty or off-topic context window and fills the gap itself. The fix isn't picking lexical OR semantic; it's blending both and ranking the result. Inside Sanity's Content Lake you do this in a single GROQ query: `text::semanticSimilarity()` for meaning, a BM25 `match()` for exact terms, and `score()` / `boost()` to weight and rank the blend. One query, one ranked result set, no second service to keep in sync. When retrieval reliably returns the passage that actually answers the question, the model has something true to ground on. That is the entire premise of RAG, and it is exactly what keyword-only stacks quietly break.
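To make the blend concrete outside any particular backend, here is a minimal Python sketch, not Sanity's implementation: it scores a toy corpus with Okapi BM25, scores the same documents by cosine similarity against hand-written vectors, min-max normalizes both, and ranks by a weighted sum. The corpus, the three-dimensional "embeddings," and the `semantic_weight` value are all hypothetical stand-ins; a real system would get its vectors from an embedding model.

```python
import math
from collections import Counter

# Hypothetical toy corpus; ids and text are illustrative only.
DOCS = {
    "billing-cancel": "How to cancel or downgrade your subscription",
    "pricing-tiers": "Pricing tiers: Starter, Team, and Enterprise",
    "sso-setup": "Configure single sign-on for your workspace",
}

# Hypothetical 3-d "embeddings" standing in for a real embedding model.
# Axes loosely mean: (ending a subscription, pricing, authentication).
EMBEDDINGS = {
    "billing-cancel": [0.9, 0.3, 0.0],
    "pricing-tiers": [0.2, 0.9, 0.0],
    "sso-setup": [0.0, 0.1, 0.9],
}


def tokenize(text):
    """Lowercase and strip basic punctuation; real analyzers also stem."""
    return [t.strip(".,:;").lower() for t in text.split()]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Classic Okapi BM25: exact-term relevance only, no notion of meaning."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a query term the document never uses contributes nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(scores):
    """Rescale to [0, 1] so lexical and semantic scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(query, query_vec, semantic_weight=0.6):
    """Weighted blend of normalized BM25 and cosine scores, best first."""
    lexical = min_max(bm25_scores(query, DOCS))
    semantic = min_max({d: cosine(query_vec, v) for d, v in EMBEDDINGS.items()})
    blended = {
        d: (1 - semantic_weight) * lexical[d] + semantic_weight * semantic[d]
        for d in DOCS
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    query = "sunsetting a plan"
    query_vec = [0.8, 0.4, 0.0]  # hypothetical embedding of the query
    print(bm25_scores(query, DOCS))  # every score is 0.0: no shared terms
    print(hybrid_rank(query, query_vec)[0])  # the cancel/downgrade doc ranks first
```

The "sunsetting a plan" query shares no terms with the cancellation doc, so BM25 alone returns nothing useful, while the blend still ranks that doc first. Min-max normalization plus a weighted sum is only one way to combine the two signals; reciprocal rank fusion is a common alternative that sidesteps mismatched score scales.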
Empty context is worse than no RAG
2. Your embeddings are stale, so the agent retrieves last quarter's truth
RAG that worked at launch rots silently. The classic architecture puts content in one system and a copy of its embeddings in a separate vector database, joined by a sync job. Every time an editor changes a price, deprecates a feature, or publishes a new policy, that change has to propagate through a re-embedding pipeline before the agent can see it. Until it does, retrieval returns last quarter's truth with full confidence: a hallucination that passes every grounding check because it was grounded on stale data. Sanity ties dataset embeddings to the content itself, so when content updates, the embeddings propagate within minutes. There's no separate vector pipeline to maintain, fall behind, or silently fail. The agent retrieves what's true now, not what was true the last time someone remembered to run the sync. Freshness stops being an operational chore and becomes a property of where the content lives.
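In the assembled architecture, the sync job has to answer two questions on every run: which vectors were built from content that has since changed, and which vectors point at content that no longer exists. A minimal Python sketch of that bookkeeping, assuming each stored vector carries a SHA-256 hash of the text it was embedded from (the record layout, document ids, and sample text are hypothetical):

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class IndexedVector:
    doc_id: str
    source_hash: str  # hash of the content at the moment it was embedded
    vector: List[float]


def find_stale(live_docs: Dict[str, str], index: Dict[str, IndexedVector]) -> List[str]:
    """Docs whose vector was built from older content, or never built at all."""
    return sorted(
        doc_id
        for doc_id, text in live_docs.items()
        if doc_id not in index or index[doc_id].source_hash != content_hash(text)
    )


def find_orphans(live_docs: Dict[str, str], index: Dict[str, IndexedVector]) -> List[str]:
    """Vectors for content that no longer exists but is still retrievable."""
    return sorted(set(index) - set(live_docs))


if __name__ == "__main__":
    # Hypothetical state: vectors were built last quarter.
    last_quarter = {
        "pricing": "Team plan costs 39 USD per seat",
        "legacy-plan": "The Basic plan includes 5 seats",
    }
    index = {
        doc_id: IndexedVector(doc_id, content_hash(text), [0.0, 0.0])
        for doc_id, text in last_quarter.items()
    }
    # An editor has since changed the price and retired the Basic plan.
    live = {"pricing": "Team plan costs 49 USD per seat"}
    print(find_stale(live, index))    # ['pricing']
    print(find_orphans(live, index))  # ['legacy-plan']
```

Every id these functions return is a window in which the agent can retrieve last quarter's truth, and this is exactly the chore that disappears when embeddings are tied to the content itself.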
Freshness without a sync job
3. Your content is a wall of prose, chunked badly, retrieved badly
Vector search over unstructured Markdown or PDF dumps inherits whatever the chunker decided a "passage" was. Split mid-table and the agent retrieves half a spec. Split mid-sentence and it retrieves an instruction with no object. The retriever did its job; the source shape sabotaged it. Structured content sidesteps the guesswork: when a price, a compatibility note, and a support policy are distinct fields rather than paragraphs buried in a doc, retrieval targets the field that answers the question instead of gambling on chunk boundaries. Sanity's upcoming Knowledge Bases (launching September 2026) turn datasets, websites, PDFs, and support databases into agent-readable documents that share the same Sanity Context retrieval path, so even your messy unstructured sources get pulled into a queryable, field-aware shape instead of being shoveled into a vector index as opaque blobs. Better source shape, better chunks, fewer confident half-answers.
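The chunk-boundary problem is easy to reproduce. This Python sketch, using a hypothetical product document, cuts a prose version into fixed 32-character chunks and turns a structured version into one chunk per field. It illustrates the idea of field-aware retrieval units only; it is not how Knowledge Bases process sources.

```python
from typing import Dict, List

# Hypothetical product doc, once as a prose blob and once as structured fields.
PROSE = "Team plan. Price: 49 USD per seat per month, billed annually. Works with SSO and SCIM."
STRUCTURED = {
    "_id": "team-plan",
    "title": "Team plan",
    "price": "49 USD per seat per month, billed annually",
    "compatibility": "Works with SSO and SCIM",
}


def fixed_size_chunks(text: str, size: int) -> List[str]:
    """Naive chunker: cut every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def field_chunks(doc: Dict[str, str]) -> List[Dict[str, str]]:
    """One retrievable unit per content field, each carrying the doc title as context."""
    return [
        {"doc": doc["_id"], "field": name, "text": f"{doc['title']}, {name}: {value}"}
        for name, value in doc.items()
        if not name.startswith("_") and name != "title"
    ]


if __name__ == "__main__":
    for chunk in fixed_size_chunks(PROSE, 32):
        print(repr(chunk))  # the price sentence is split across two chunks
    for chunk in field_chunks(STRUCTURED):
        print(chunk["field"], "->", chunk["text"])
```

The fixed-size chunker splits the price sentence across two chunks, and the second one no longer mentions which plan it belongs to. The field chunks each carry the complete value plus the document title, so a retriever matching on price gets the whole answer.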
The chunk boundary is a hallucination source
4. Nobody governs the agent's instructions, so behaviour drifts off-script
Even with perfect retrieval, an agent hallucinates when its instructions are ungoverned. The system prompt that tells it "only answer from retrieved context, cite the source, refuse if unsure" usually lives in a code repo or a settings panel, edited by whoever last touched it, with no review and no staging. One careless edit ships straight to production and the agent starts speculating. Sanity Context (previously Agent Context) puts those instructions in Studio, where editors govern agent behaviour and stage changes with Content Releases the same way they stage a website launch: reviewed, previewed, scheduled, and rolled back when needed. The people who own the content's accuracy also own the rules the agent follows when answering from it, instead of that logic being buried in an engineer's branch. Agent behaviour becomes something you version and approve, not something that drifts every time a config file changes.
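The governance properties described here (drafts that never reach production, a named reviewer for every live version, and one-step rollback) can be modelled in a few lines. This is a hypothetical Python sketch of those properties, not Sanity's Content Releases API; the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InstructionVersion:
    number: int
    text: str
    approved_by: Optional[str] = None  # None means draft, never served


@dataclass
class InstructionStore:
    """Hypothetical model: drafts need a named reviewer before going live,
    and the live pointer can move back to an earlier approved version."""
    versions: List[InstructionVersion] = field(default_factory=list)
    live: Optional[int] = None

    def propose(self, text: str) -> int:
        number = len(self.versions) + 1
        self.versions.append(InstructionVersion(number, text))
        return number

    def publish(self, number: int, reviewer: str) -> None:
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"unknown version {number}")
        self.versions[number - 1].approved_by = reviewer
        self.live = number

    def rollback(self) -> None:
        """Move live back to the most recent approved version older than it."""
        earlier = [
            v for v in self.versions
            if v.approved_by is not None and self.live is not None and v.number < self.live
        ]
        if not earlier:
            raise RuntimeError("no earlier approved version to roll back to")
        self.live = earlier[-1].number

    def live_text(self) -> str:
        if self.live is None:
            raise RuntimeError("no approved instructions published yet")
        return self.versions[self.live - 1].text


if __name__ == "__main__":
    store = InstructionStore()
    v1 = store.propose("Only answer from retrieved context. Cite the source. Refuse if unsure.")
    store.publish(v1, reviewer="content-lead")
    v2 = store.propose("Answer every question helpfully.")  # the risky edit, still a draft
    print(store.live_text())  # still v1: drafts are never served
    store.publish(v2, reviewer="content-lead")
    store.rollback()  # the speculating behaviour gets reverted
    print(store.live_text())  # back to v1
```

The point of the design is that the live pointer only moves through `publish` and `rollback`, so an unreviewed edit to the instructions cannot change what the agent does.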
Stage agent behaviour like a release
5. Your retrieval path isn't shaped for an agent to query in production
The last reason is architectural: a RAG demo wired together with glue scripts isn't a production retrieval path. When the agent runs live, it needs a stable, low-latency way to query grounded content, not a bespoke API you maintain by hand and patch every time the model framework changes. Sanity Context exposes an MCP endpoint that production agents connect to directly to query the same Content Lake retrieval path, so the thing you tested is the thing that runs. Combined with Agent Actions (schema-aware APIs for LLM-driven generate, transform, and translate workflows), the agent both reads from and writes back to structured content through interfaces built for that purpose. The result is a retrieval path that's native to where the content already lives, rather than a fragile assembly of vector store, sync job, search service, and prompt file, each of which can fail independently and leave the model improvising in the gaps.
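MCP messages are JSON-RPC 2.0, so "connecting directly to an MCP endpoint" comes down to sending requests such as `tools/list` and `tools/call`. This Python sketch only builds those messages and does no networking. The tool name `query_content` and its `query` argument are hypothetical placeholders, not Sanity Context's actual tool schema. A real client would also perform the MCP `initialize` handshake over a transport, typically through an MCP SDK, and discover the real tool names from `tools/list`.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # request ids should not repeat within a session


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object with a fresh id."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def tools_call(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """MCP `tools/call`: invoke a server-side tool by name with JSON arguments."""
    return jsonrpc_request("tools/call", {"name": tool_name, "arguments": arguments})


# Hypothetical tool name and argument schema; a real client reads them from `tools/list`.
GROUNDING_QUERY = tools_call(
    "query_content",
    {"query": '*[_type == "plan"]{name, price}'},
)

if __name__ == "__main__":
    # The same serialized message is what a test harness and the live agent would send.
    print(json.dumps(jsonrpc_request("tools/list"), indent=2))
    print(json.dumps(GROUNDING_QUERY, indent=2))
```

Because a test harness and the production agent can send the identical `tools/call` message to the same endpoint, a passing test exercises the path that actually runs.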
Test path and production path should be the same path
Where the five hallucination causes get fixed, native vs assembled
| Feature | Sanity | Pinecone | Contentful | pgvector / Neon |
|---|---|---|---|---|
| Hybrid lexical + semantic retrieval | `match()` + `text::semanticSimilarity()` blended and ranked with `score()`/`boost()` in one GROQ query | Native dense vector search; keyword relevance added with sparse vectors in a hybrid (sparse-dense) setup you configure | No native vector search; pair the App Framework with an external search service | Vector similarity via the pgvector extension; BM25 and ranking assembled with extra SQL/extensions |
| Embedding freshness on content change | Dataset embeddings tied to content; updates propagate within minutes, no separate vector pipeline | Requires a re-embedding + upsert pipeline you build and operate to stay in sync | Content changes must be pushed to the external index by a sync job you maintain | Re-embed and re-insert rows yourself; freshness depends on your own jobs |
| Field-aware retrieval over structured + unstructured sources | Knowledge Bases turn datasets, sites, PDFs, and support DBs into agent-readable docs on the same retrieval path | Stores vectors and metadata; structuring and chunking source content is your responsibility | Strong structured content model; unstructured/PDF ingestion handled outside the platform | A vector column in your tables; chunking and source structure are entirely up to you |
| Governed agent instructions | Instructions live in Studio, staged and rolled back with Content Releases like a site launch | Out of scope; prompt/instruction governance handled in your app layer | Editorial workflow for content; agent prompt governance is outside the product | Out of scope; prompts and config managed in your own codebase |
| Production query path for agents | Sanity Context MCP endpoint queries the Content Lake path; Agent Actions for generate/transform/translate | Query API for vector similarity with metadata filtering; surrounding RAG orchestration is yours to build | Delivery/GraphQL APIs for content; agent retrieval orchestration assembled separately | SQL access via Postgres drivers; agent-facing retrieval layer built by you |