Top 5 Reasons Your AI Agent Still Hallucinates After RAG

You added retrieval-augmented generation, watched the demo work, and shipped. Then the agent confidently invented a pricing tier that doesn't exist. RAG is not a hallucination switch you flip once; it's a retrieval pipeline, and every weak link in that pipeline leaks fiction back into the model. The five reasons below are ranked by how often they're the actual culprit when a "grounded" agent still makes things up, along with what each one means for where your content and your retrieval path live.

The examples here pull from structured retrieval pipelines, including ones built on Sanity Context MCP, where GROQ predicates and schema lookups are the retrieval norm. The failure modes, though, are not backend-specific.

1. Your retrieval is keyword-only, so the right chunk never gets fetched

The most common reason a RAG agent hallucinates is the dullest: the relevant content was in the corpus, but the retriever never surfaced it. A user asks about "sunsetting a plan," but your docs say "cancel" or "downgrade," so a BM25 index finds no overlapping terms. No vector layer means no semantic match, so the model gets an empty or off-topic context window and fills the gap itself. The fix isn't picking lexical or semantic; it's blending both and ranking the result. Inside Sanity's Content Lake you do this in a single GROQ query: `text::semanticSimilarity()` for meaning, a BM25 `match()` for exact terms, and `score()` / `boost()` to weight and rank the blend. One query, one ranked result set, no second service to keep in sync. When retrieval reliably returns the passage that answers the question, the model has something true to ground on, which is the entire premise RAG promised and keyword-only stacks quietly break.
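The blending idea itself is not tied to one backend. As a minimal sketch, assuming you already have two ranked lists of document IDs (one lexical, one semantic), weighted reciprocal rank fusion is one common way to merge them. This is a swapped-in technique for illustration, not a description of how GROQ's `score()` and `boost()` work internally; the function name, weights, and document IDs are hypothetical.

```python
from collections import defaultdict


def fuse_rankings(lexical, semantic, k=60, lexical_weight=1.0, semantic_weight=1.0):
    """Merge two ranked lists of doc IDs with weighted reciprocal rank fusion."""
    scores = defaultdict(float)
    for weight, ranking in ((lexical_weight, lexical), (semantic_weight, semantic)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hits for the query "sunsetting a plan"
lexical_hits = ["cancel-plan", "downgrade-plan"]                # keyword matches only
semantic_hits = ["sunset-plan", "cancel-plan", "billing-faq"]   # meaning-based matches
fused = fuse_rankings(lexical_hits, semantic_hits)
```

In this toy data the keyword list never contains `sunset-plan`, but the fused list does, and a document that appears in both lists rises to the top. The constant `k` and the two weights are tuning knobs you would adjust against your own relevance judgments.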

Empty context is worse than no RAG

When the retriever returns nothing relevant, a "grounded" agent rarely says "I don't know"; it falls back on what it learned in training and answers anyway. A blended GROQ query that ranks semantic and exact matches together keeps the context window populated with the right passage.
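Better retrieval lowers the odds of an empty context, and a guard in the application layer can cover the cases that remain. The sketch below assumes your retriever returns passages with a relevance score between 0 and 1 and that `generate` is whatever function calls your model; `Passage`, `answer`, the threshold, and the refusal text are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    score: float  # relevance from the retriever, assumed to be in [0, 1]


REFUSAL = "I don't know. I couldn't find this in the available content."


def answer(question, passages, generate, min_score=0.5):
    """Call the model only when at least one passage clears the relevance bar."""
    relevant = [p for p in passages if p.score >= min_score]
    if not relevant:
        # Never hand the model an empty context and hope it stays quiet
        return REFUSAL
    context = "\n\n".join(p.text for p in relevant)
    prompt = (
        "Answer only from the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```

The design choice here is that the refusal happens in code, before the model is called, so it does not depend on the model obeying an instruction.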

2. Your embeddings are stale, so the agent retrieves last quarter's truth

RAG that worked at launch rots silently. The classic architecture puts content in one system and a copy of its embeddings in a separate vector database, joined by a sync job. Every time an editor changes a price, deprecates a feature, or publishes a new policy, that change has to propagate through a re-embedding pipeline before the agent can see it. Until it does, retrieval returns last quarter's truth with full confidence: a hallucination that passed every grounding check because it was grounded on stale data. Sanity ties dataset embeddings to the content itself, so when content updates, the embeddings follow within minutes. There's no separate vector pipeline to maintain, fall behind, or silently fail. The agent retrieves what's true now, not what was true the last time someone remembered to run the sync. Freshness stops being an operational chore and becomes a property of where the content lives.
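If you do run the classic split architecture, you can at least measure the drift. The sketch below assumes you store a hash of the source text next to each embedding at indexing time; comparing it with a hash of the current content reveals which documents the vector store has fallen behind on. The function names and sample documents are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of the text that was (or would be) embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(documents: dict, index_hashes: dict) -> list:
    """Return IDs whose current content differs from what was embedded, or was never embedded."""
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if index_hashes.get(doc_id) != content_hash(text)
    )


# Current content, as editors see it today
documents = {
    "pricing": "Pro plan: $49/month",
    "refunds": "Refunds within 30 days",
}
# Hashes recorded when the embeddings were last built (the price has changed since)
index_hashes = {
    "pricing": content_hash("Pro plan: $39/month"),
    "refunds": content_hash("Refunds within 30 days"),
}
stale = find_stale(documents, index_hashes)  # ["pricing"]
```

A check like this only tells you that the sync job is behind; it does not remove the job.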

Freshness without a sync job

Because dataset embeddings are tied to the content rather than copied into a separate store, an edit in Sanity propagates within minutes. There's no re-embedding pipeline to babysit, and the window in which the agent answers from outdated data shrinks to minutes.

3. Your content is a wall of prose, chunked badly, retrieved badly

Vector search over unstructured Markdown or PDF dumps inherits whatever the chunker decided a "passage" was. Split mid-table and the agent retrieves half a spec. Split mid-sentence and it retrieves an instruction with no object. The retriever did its job; the source shape sabotaged it. Structured content sidesteps the guesswork: when a price, a compatibility note, and a support policy are distinct fields rather than paragraphs buried in a doc, retrieval targets the field that answers the question instead of gambling on chunk boundaries. Sanity's upcoming Knowledge Bases (launching September 2026) turn datasets, websites, PDFs, and support databases into agent-readable documents that share the same Sanity Context retrieval path, so even your messy unstructured sources get pulled into a queryable, field-aware shape instead of being shoveled into a vector index as opaque blobs. Better source shape, better chunks, fewer confident half-answers.
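To make the chunk-boundary problem concrete, here is a small sketch that contrasts a naive fixed-size chunker with a lookup on structured fields. The chunk size, the sample table, and the field names (`plans`, `price_usd`) are hypothetical and do not reflect any particular Sanity schema.

```python
def chunk_fixed(text, size):
    """Naive chunker: cut every `size` characters, ignoring rows and sentences."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def get_field(document, path):
    """Walk a nested document by key path and return the value found there."""
    value = document
    for key in path:
        value = value[key]
    return value


# The same facts as a prose-style table and as structured fields
spec_text = "Plan | Price\nBasic | $10\nPro | $49"
chunks = chunk_fixed(spec_text, 20)

product = {"plans": {"basic": {"price_usd": 10}, "pro": {"price_usd": 49}}}
basic_price = get_field(product, ["plans", "basic", "price_usd"])
```

With a 20-character window, the first cut lands inside the "Basic" row, so no single chunk contains both the plan name and its price. The field lookup has no boundary to get wrong.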

The chunk boundary is a hallucination source

Many RAG failures blamed on "the model" trace back to a bad split: a table cut in half, a clause severed from its subject. Field-aware retrieval over structured content removes the gamble that arbitrary chunk boundaries introduce.

4. Nobody governs the agent's instructions, so behaviour drifts off-script

Even with perfect retrieval, an agent hallucinates when its instructions are ungoverned. The system prompt that tells it "only answer from retrieved context, cite the source, refuse if unsure" usually lives in a code repo or a settings panel, edited by whoever last touched it, with no review and no staging. A reckless edit ships straight to production and the agent starts speculating. Sanity Context (previously Agent Context) puts those instructions in Studio, where editors govern agent behaviour and stage it with Content Releases the same way they stage a website launch: reviewed, previewed, scheduled, and rolled back if needed. The people who own the content's accuracy also own the rules the agent follows when answering from it, instead of that logic being buried in an engineer's branch. Agent behaviour becomes something you version and approve, not something that drifts every time a config file changes.
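The governance principle can be stated in a few lines: the agent should only ever load the latest approved version of its instructions, never the latest edit. The sketch below is a generic illustration of that rule, not a model of how Content Releases is implemented; `InstructionVersion`, `active_instructions`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstructionVersion:
    version: int
    text: str
    approved_by: Optional[str] = None  # None means the version is still a draft


def active_instructions(history):
    """Return the highest-numbered approved version; drafts are never served."""
    approved = [v for v in history if v.approved_by is not None]
    if not approved:
        raise LookupError("no approved instructions to serve")
    return max(approved, key=lambda v: v.version)


history = [
    InstructionVersion(1, "Only answer from retrieved context. Refuse if unsure.",
                       approved_by="editor@example.com"),
    InstructionVersion(2, "Answer as helpfully as possible."),  # unreviewed draft
]
current = active_instructions(history)  # version 1, because version 2 is unapproved
```

Rolling back under this scheme means withdrawing approval from a bad version, at which point the previous approved version becomes active again.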

Stage agent behaviour like a release

Content Releases let you preview and schedule changes to agent instructions in Studio, then roll them back if they misbehave, governing how the agent answers with the same review discipline you already apply to publishing content.

5. Your retrieval path isn't shaped for an agent to query in production

The last reason is architectural: a RAG demo wired together with glue scripts isn't a production retrieval path. When the agent runs live, it needs a stable, low-latency way to query grounded content, not a bespoke API you maintain by hand and patch every time the model framework changes. Sanity Context exposes an MCP endpoint that production agents connect to directly to query the same Content Lake retrieval path, so the thing you tested is the thing that runs. Combined with Agent Actions (schema-aware APIs for LLM-driven generate, transform, and translate workflows), the agent both reads from and writes back to structured content through interfaces built for that purpose. The result is a retrieval path that's native to where the content already lives, rather than a fragile assembly of vector store, sync job, search service, and prompt file that each fail independently and leave the model improvising in the gaps.

Test path and production path should be the same path

When the agent connects through the Sanity Context MCP endpoint to the same Content Lake retrieval it was tested against, you remove a whole class of drift where the demo stack and the production stack quietly diverge.

Where the five hallucination causes get fixed: native vs. assembled

Feature	Sanity	Pinecone	Contentful	pgvector / Neon
Hybrid lexical + semantic retrieval	`match()` + `text::semanticSimilarity()` blended and ranked with `score()`/`boost()` in one GROQ query	Native dense vector search; lexical/keyword matching added via metadata filters or a separate engine	No native vector search; pair the App Framework with an external search service	Vector similarity via the pgvector extension; BM25 and ranking assembled with extra SQL/extensions
Embedding freshness on content change	Dataset embeddings tied to content; updates propagate within minutes, no separate vector pipeline	Requires a re-embedding + upsert pipeline you build and operate to stay in sync	Content changes must be pushed to the external index by a sync job you maintain	Re-embed and re-insert rows yourself; freshness depends on your own jobs
Field-aware retrieval over structured + unstructured sources	Knowledge Bases turn datasets, sites, PDFs, and support DBs into agent-readable docs on the same retrieval path	Stores vectors and metadata; structuring and chunking source content is your responsibility	Strong structured content model; unstructured/PDF ingestion handled outside the platform	A vector column in your tables; chunking and source structure are entirely up to you
Governed agent instructions	Instructions live in Studio, staged and rolled back with Content Releases like a site launch	Out of scope; prompt/instruction governance handled in your app layer	Editorial workflow for content; agent prompt governance is outside the product	Out of scope; prompts and config managed in your own codebase
Production query path for agents	Sanity Context MCP endpoint queries the Content Lake path; Agent Actions for generate/transform/translate	Robust query API for vectors; surrounding RAG orchestration is yours to build	Delivery/GraphQL APIs for content; agent retrieval orchestration assembled separately	SQL access via Postgres drivers; agent-facing retrieval layer built by you