RAG & Grounding · 7 min read

Structured Content as a Grounding Layer: An Architecture Primer

A support agent confidently tells a customer that a deprecated API still works, citing a version of the docs that was retired six months ago. The retrieval layer pulled a stale chunk, the model wrapped it in fluent prose, and nobody caught it until the support ticket escalated. This is the default failure mode of retrieval-augmented generation built on flat text: the index drifts away from the truth, the embeddings go stale, and the agent has no way to know it is wrong.

The problem is rarely the model. It is the grounding layer underneath it. When you shred PDFs and web pages into undifferentiated chunks, you throw away the structure that tells an agent what a fact is, when it changed, and whether it is allowed to say it. Sanity Context is the AI Content Operating System's grounding layer, an intelligent backend that keeps agent retrieval anchored to governed, structured content instead of a snapshot of text that started rotting the moment it was indexed.

This primer reframes grounding as an architecture problem, not a prompt problem. We will walk the path from raw content to a grounded answer, show why structure beats chunking at every hop, and name the mechanisms that keep retrieval fresh and reviewable.

Why flat-chunk RAG drifts away from the truth

The standard RAG pipeline looks deceptively clean: ingest documents, split them into chunks, embed the chunks, store the vectors, retrieve the top-k at query time. Each of those steps quietly discards something an agent needs. Chunking severs a fact from its surrounding context, so a sentence about a pricing tier loses the heading that told you which product it described. Embedding-only retrieval rewards semantic resemblance, which is why a question about refunds happily returns a passage about returns that uses similar words but answers a different question. And the vector index is a point-in-time copy: the moment the source content changes, the index is wrong, and it stays wrong until someone remembers to re-run the pipeline.
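To make the chunking loss concrete, here is a minimal sketch of a fixed-size splitter. The page text, the product name, and the eight-word window are invented for illustration; real pipelines typically split on tokens rather than words, but the failure has the same shape.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive windows of `size` words, ignoring structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Hypothetical page: a heading, then the sentences it qualifies.
page = (
    "Acme Pro plan. "
    "Billing is monthly and invoices are sent by email on the first business day. "
    "The tier costs 49 dollars per seat."
)

chunks = chunk_words(page, size=8)

# The chunk that carries the price no longer says which plan it describes.
price_chunk = next(c for c in chunks if "49 dollars" in c)
heading_survived = "Acme Pro" in price_chunk
```

Here `heading_survived` is false: the price ends up in a window that never mentions the plan, so a retriever that returns that chunk hands the agent a number with no subject.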

The enterprise consequence is not abstract. A documentation set that updates weekly can leave the retrieval layer days out of date unless the index is rebuilt after every change, and the agent has no signal that the chunk it retrieved has been superseded. Worse, there is no governance seam. The chunk does not know it was part of a draft, an archived release, or a region-restricted document, so the agent will cite it just as readily as a published, approved fact. The pipeline treats every string as equally true and equally current.

This is the gap structured content closes. When the grounding layer preserves the shape of content (what type a document is, which fields are authoritative, what state it is in, and when it last changed), retrieval can reason about more than surface similarity. Structure is the difference between an index that happens to contain the answer and a backend that knows where the answer lives.

Structure as the unit of retrieval, not the chunk

Treating a content model as the unit of retrieval changes what the agent can ask for. Instead of splitting a product page into arbitrary 500-token windows, you query against typed documents and their fields: a product has a name, a status, a set of supported regions, and a body. A support article references the product it documents and carries a published-or-draft state. Those relationships are queryable facts, not artifacts you hope survived the tokenizer.

In Sanity, the Content Lake is the queryable content store that holds this structure, and GROQ is the query language that walks it. Because the content is modeled rather than flattened, a grounding query can filter to only published documents, only the current region, only the product the question is actually about, and then rank what remains. You are no longer asking the index to guess intent from cosine distance alone; you are giving it the schema to narrow the search space before similarity ever enters the picture. This is the first of Sanity's pillars in practice: model your business, then let retrieval inherit that model.

The counter-example makes the point. A flat vector store can hold the same product page, but it cannot answer "give me only the published, US-available variant" without metadata you bolted on by hand and now have to keep in sync. Modeling content as structured documents means the filters that matter for grounding (freshness, jurisdiction, and approval) are intrinsic to the data rather than reconstructed at query time. The chunk was never the right unit. The document, with its fields and references intact, is.
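The idea of narrowing by structure before ranking can be sketched in a few lines. This is an in-memory illustration, not Sanity's schema or GROQ: the `ProductDoc` type, its field names, and the sample documents are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductDoc:
    """Hypothetical typed document; field names are illustrative only."""
    doc_id: str
    name: str
    status: str              # "published" or "draft"
    regions: frozenset[str]  # regions where this variant may be surfaced
    body: str


def groundable(docs: list[ProductDoc], product: str, region: str) -> list[ProductDoc]:
    """Keep only documents an agent may cite for this product and region."""
    return [
        d for d in docs
        if d.status == "published" and d.name == product and region in d.regions
    ]


docs = [
    ProductDoc("p1", "Acme Pro", "published", frozenset({"US", "CA"}), "Current US terms."),
    ProductDoc("p2", "Acme Pro", "draft", frozenset({"US"}), "Unapproved rewrite."),
    ProductDoc("p3", "Acme Pro", "published", frozenset({"DE"}), "German terms."),
    ProductDoc("p4", "Acme Lite", "published", frozenset({"US"}), "A different product."),
]

candidates = groundable(docs, product="Acme Pro", region="US")
```

Only `p1` survives. The draft, the German variant, and the other product are excluded by fields the document already carries, before any similarity score is computed.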

Hybrid retrieval: semantic recall with lexical precision

Pure vector search has a precision problem, and pure keyword search has a recall problem. Ask about an exact error code or a SKU and embeddings will happily return things that are merely thematically close. Ask in natural language about a concept and keyword matching will miss every document that phrased it differently. Production grounding needs both, blended, with the blend tuned per query rather than chosen once for the whole system.

In Sanity Context this is native to the Content Lake, not assembled from a separate search service. A single GROQ query combines `text::semanticSimilarity()` for semantic recall with a lexical `match()` for exact-term precision, then blends the two with `score()` and `boost()` so you control how much weight each signal carries. The exact-match clause catches the error code; the semantic clause catches the paraphrase; the scoring function decides how to rank a document that satisfies one, the other, or both. All of it runs in the same query, against the same governed content, in one round trip.
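The blending logic can be shown without any query language at all. The sketch below is a plain-Python analogue of a weighted lexical-plus-semantic score. It is not GROQ and does not reproduce how `score()` or `boost()` compute their values; the two-dimensional vectors, the weights, and the sample documents are assumptions chosen to make the arithmetic easy to follow.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_hit(term: str, text: str) -> float:
    """1.0 if the exact term appears as a whitespace-separated word, else 0.0."""
    return 1.0 if term.lower() in text.lower().split() else 0.0


def rank(docs, term, query_vec, lexical_weight, semantic_weight):
    """Return doc ids ordered by the weighted sum of both signals, best first."""
    def blended(doc):
        return (lexical_weight * lexical_hit(term, doc["text"])
                + semantic_weight * cosine(query_vec, doc["vec"]))
    return [d["id"] for d in sorted(docs, key=blended, reverse=True)]


# Toy corpus with hand-made 2-d "embeddings" (illustrative, not model output).
docs = [
    {"id": "a", "text": "Error E4012 means the token expired", "vec": [0.6, 0.8]},
    {"id": "b", "text": "Troubleshooting expired credentials", "vec": [1.0, 0.0]},
    {"id": "c", "text": "Release notes for the billing dashboard", "vec": [0.0, 1.0]},
]
query_vec = [1.0, 0.0]

hybrid_order = rank(docs, "E4012", query_vec, lexical_weight=2.0, semantic_weight=1.0)
semantic_only_order = rank(docs, "E4012", query_vec, lexical_weight=0.0, semantic_weight=1.0)
```

With the lexical weight set to zero, the thematically closest document `b` ranks first even though it never mentions the error code. Giving the exact match more weight moves `a` to the top while keeping `b` as a fallback, which is the per-query tuning the paragraph above describes.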

The architectural payoff is that retrieval quality stops being a tuning exercise across two disconnected systems. When your vector database lives apart from your content backend, every relevance change means reconciling two stores, two update cadences, and two sources of truth about what a document currently says. Blending lexical and semantic signals inside the backend that already holds the canonical content collapses that gap. The thing you rank is the thing you publish, so there is no window where the search layer and the content layer disagree about reality.

The freshness problem: embeddings that track their source

Every RAG architecture eventually confronts the same operational tax: the embeddings have to be regenerated whenever the content changes, and the more sources you have, the more pipelines you babysit. Teams build orchestration to detect changes, re-embed the affected documents, and reconcile the vector store, and that orchestration becomes its own fragile system with its own lag, its own failures, and its own on-call rotation. In the gap between a content edit and a successful re-index, the agent is grounded in a past that no longer exists.

Sanity's answer is to tie embeddings to the content itself. Because dataset embeddings live with the documents in the Content Lake, an edit to a document propagates to its embedding within minutes, with no separate vector pipeline to maintain. There is no standalone ingestion job to monitor, no drift between two systems, because there is only one system. The same edit that publishes the corrected pricing also refreshes what the agent will retrieve about that pricing.
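The difference between the two designs is easiest to see as a staleness check. The toy model below is an assumption-laden sketch, not Sanity's implementation: `toy_embed` stands in for a real embedding model, both classes are hypothetical, and the coupled store refreshes synchronously for simplicity, whereas the text above describes propagation within minutes.

```python
def toy_embed(text: str) -> tuple[int, int]:
    """Stand-in for an embedding model: (word count, character count)."""
    return (len(text.split()), len(text))


class SplitPipeline:
    """Content and vector index live apart; re-indexing is a separate step."""

    def __init__(self) -> None:
        self.content: dict[str, tuple[int, str]] = {}  # doc_id -> (rev, text)
        self.index: dict[str, tuple[int, tuple[int, int]]] = {}  # doc_id -> (rev, vector)

    def edit(self, doc_id: str, text: str) -> None:
        rev = self.content.get(doc_id, (0, ""))[0] + 1
        self.content[doc_id] = (rev, text)

    def reindex(self, doc_id: str) -> None:
        rev, text = self.content[doc_id]
        self.index[doc_id] = (rev, toy_embed(text))

    def is_stale(self, doc_id: str) -> bool:
        indexed_rev = self.index.get(doc_id, (None, None))[0]
        return indexed_rev != self.content[doc_id][0]


class CoupledStore(SplitPipeline):
    """Same data, but every edit refreshes the embedding as part of the write."""

    def edit(self, doc_id: str, text: str) -> None:
        super().edit(doc_id, text)
        self.reindex(doc_id)


split = SplitPipeline()
split.edit("pricing", "Pro costs 49 dollars")
split.reindex("pricing")
split.edit("pricing", "Pro costs 59 dollars")  # nobody re-ran the pipeline
split_is_stale = split.is_stale("pricing")

coupled = CoupledStore()
coupled.edit("pricing", "Pro costs 49 dollars")
coupled.edit("pricing", "Pro costs 59 dollars")
coupled_is_stale = coupled.is_stale("pricing")
```

In the split design the second edit leaves the index pointing at revision 1 until someone calls `reindex`. In the coupled design there is no state in which the content and its embedding disagree about the revision.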

This maps to the "automate everything" pillar. The expensive part of RAG in production is rarely the first build; it is the indefinite maintenance of the synchronization machinery. Removing the second store removes the class of bugs where the content is right but the index is stale, which is precisely the failure that put a deprecated API in front of a customer in this article's opening. Freshness stops being a scheduled job and becomes a property of the backend.

Governing what an agent is allowed to ground on

Retrieval accuracy is necessary but not sufficient. The harder enterprise question is governance: which content is an agent permitted to surface, who approved it, and how do you stage a change to agent behavior before it reaches a customer. A grounding layer that cannot answer these questions will, sooner or later, let an agent cite a draft, an internal note, or a region-restricted disclosure, and in regulated contexts that is not a quality bug but a compliance incident.

Because Sanity Context grounds agents in the same Content Lake that editors work in, governance is not a separate layer bolted on after the fact. Editors review and approve agent-facing content in the Studio, and Content Releases let teams stage changes to what an agent can ground on and ship them as a coordinated set, the same way they stage a website launch. Knowledge Bases turn datasets, websites, PDFs, and support databases into agent-readable documents that share the Sanity Context retrieval path, so external sources inherit the same review and access controls rather than entering through an ungoverned side door. Agent Actions provide schema-aware APIs for the generate, transform, and translate workflows, and production agents connect through the Sanity Context MCP endpoint.

The broader point is that legacy CMSes stop at publishing, while Sanity operates content end to end, through the point where an agent reads it. Sanity is SOC 2 Type II compliant and GDPR-aligned, with regional hosting for data residency and a published sub-processor list, which matters precisely because the grounding layer is now part of your compliance surface, not a downstream toy.

An end-to-end grounding architecture, assembled

Putting the pieces together, a grounded agent on this architecture follows a single coherent path rather than a relay of disconnected systems. Content is modeled as typed documents in the Content Lake, with the fields, references, and states that encode what is authoritative and current. Editors govern that content in the Studio and stage changes through Content Releases, so what the agent can ground on is reviewable before it ships. External material flows in through Knowledge Bases, where it becomes structured, agent-readable, and subject to the same controls.

At query time, the agent connects through the Sanity Context MCP endpoint and issues a GROQ query that filters by structure first (published state, region, the relevant product), then ranks the survivors by blending `text::semanticSimilarity()` and `match()` through `score()` and `boost()`. The embeddings it searches were refreshed within minutes of the last content edit because they live with the content, so the answer reflects the present, not a past index run. Agent Actions handle the write-side workflows when the agent needs to generate or transform content rather than just read it.

This is the "power anything" pillar realized: one governed backend serving a website, a support agent, and an internal copilot from the same structured source of truth. Where legacy CMSes create silos, Sanity provides a shared foundation, and where rigid systems force you to scale headcount to keep retrieval honest, the architecture scales output instead. The grounding layer is no longer a brittle appendage to your CMS. It is the Content Operating System doing what it was built to do.