How to Wire Sanity Context Into the Vercel AI SDK

Your agent ships to production, a user asks about a feature you renamed last quarter, and the model confidently cites the old behavior. The retrieval layer pulled a stale doc, the prompt had no way to know it was stale, and now your support queue has a hallucination ticket with your logo on it. For teams building on the Vercel AI SDK, this failure mode is rarely the model's fault. It is the retrieval path feeding the model: a vector index that drifted out of sync with the content, a search stack glued together from three services, and no governance over what the agent was told to do.

Sanity Context, Sanity's agent-facing product, is where the retrieval side of this changes. Its Context MCP endpoint exposes your schema, GROQ queries, and reference graph directly to any agent loop, so the model reads what your editors actually published, not what a sync job cached last night.

The Vercel AI SDK gives you a clean abstraction for streaming, tool calls, and structured outputs. What it does not give you is a source of truth. That is the gap most teams fill with a vector database and a nightly sync job, which is exactly where freshness and grounding break down.

This article reframes the wiring problem. Instead of bolting a vector store onto your content after the fact, you connect the AI SDK directly to content that already knows how to retrieve itself. We will walk through the architecture, the tool definitions, and the governance model that keeps agent behavior staged and reviewable rather than hardcoded.

The retrieval gap the AI SDK leaves open

The Vercel AI SDK is deliberately unopinionated about where your knowledge comes from. `generateText`, `streamText`, and the tool-calling interface all assume you have already solved retrieval and will hand back the right context when the model asks. That assumption is where most agent projects quietly fail. The SDK orchestrates the conversation; it does not own the content, and it has no opinion about whether the chunk you retrieved is the current version of a product page or a six-month-old draft.

The conventional fix is a separate vector database. You export your content, run an embedding job, write the vectors to Pinecone or pgvector, and point a retrieval tool at that index. It works in a demo. In production it introduces a second source of truth that drifts. Every content edit now needs to trigger a re-embed, and if that pipeline lags or silently fails, your agent answers from yesterday's catalog while your website shows today's. The model looks confident either way, which is what makes the failure so expensive: there is no error, just a wrong answer delivered fluently.

The deeper problem is architectural. When retrieval lives in a different system from your content, governance, freshness, and access control all fragment across services. The teams that own the content cannot see what the agent retrieves, and the teams that own the agent cannot see when the content changed. Closing the retrieval gap means collapsing those two systems back into one, so the thing the model queries is the same thing your editors publish. That is the lens for everything that follows: model your business once, and let the agent read from that model rather than a copy of it.

Why the content backend should own retrieval

Most retrieval architectures treat content as inert: something to export, chunk, and embed elsewhere. Sanity inverts that. Content Lake, Sanity's queryable content store, is the backbone of the Sanity Context retrieval path, which means the agent queries the same store your editors write to, not a downstream copy of it. There is no export step to fall out of date because there is no separate index to maintain.

This is where the freshness problem dissolves rather than gets managed. Embeddings in Sanity are dataset embeddings: they are tied to the content itself, so when an editor updates a product description or a support article, the embedding propagates within minutes. You do not run a nightly re-embed job, you do not reconcile two systems, and you do not page someone when the sync silently breaks, because there is no sync. The agent that queries five minutes from now sees the edit that landed five minutes ago.

Retrieval quality is native here too, not assembled. In a single GROQ query you can blend `text::semanticSimilarity()` for meaning with a BM25-style `match()` for exact terms, then tune the blend with `score()` and `boost()`. That hybrid retrieval lives inside the Content Lake, so the same query that ranks results by semantic closeness can also weight an exact SKU match or a known product name, all without standing up a separate search service. For an agent, this matters because user questions mix both modes constantly: a vague description of a problem alongside the precise error code or feature name. Sanity is the Content Operating System for the AI era precisely because it operates content end to end, from the editor's keystroke to the model's prompt, rather than stopping at publish and handing the hard part to a vector pipeline you have to babysit.
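
To make the blend concrete, here is a minimal sketch of a helper that builds one hybrid GROQ query. The helper name `buildHybridQuery`, the document types, and the fields `title`, `body`, and `canonicalUrl` are hypothetical, and the single-argument call to `text::semanticSimilarity()` is an assumption about its shape rather than a documented signature, so check it against your own dataset before relying on it.

```typescript
// Hypothetical helper: builds one GROQ query that blends semantic and keyword scoring.
interface HybridQuery {
  groq: string;
  params: { query: string; types: string[] };
}

function buildHybridQuery(
  userQuery: string,
  types: string[] = ["article", "product"],
  limit = 5,
): HybridQuery {
  // Clamp the limit to an integer between 1 and 20 before inlining it in the slice.
  const safeLimit = Number.isFinite(limit)
    ? Math.max(1, Math.min(20, Math.floor(limit)))
    : 5;

  // score() sums its arguments into _score:
  //   - text::semanticSimilarity($query) ranks by meaning (argument shape assumed)
  //   - boost(title match $query, 3) makes exact terms in the title count more
  //   - body match $query rewards exact terms in the body
  const groq = `
    *[_type in $types]
      | score(
          text::semanticSimilarity($query),
          boost(title match $query, 3),
          body match $query
        )
      | order(_score desc)
      [0...${safeLimit}]
      { _id, title, body, "url": canonicalUrl, _updatedAt, _score }
  `;

  // The user's text travels as a parameter and is never concatenated into the query.
  return { groq, params: { query: userQuery.trim(), types } };
}
```

Raising or lowering the `boost()` weight is how you would shift the balance toward exact matches such as SKUs or error codes; the value 3 here is arbitrary.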

Defining the retrieval tool for the AI SDK

In the Vercel AI SDK, retrieval enters the model through a tool. You define a tool with a schema for its input (typically Zod), a description the model uses to decide when to call it, and an `execute` function that runs your actual query and returns context. The pattern is well established: the model emits a tool call with a search string, your `execute` runs retrieval, and the result is fed back into the conversation so the model can ground its answer in it.

The design decision that matters is what `execute` queries. Instead of pointing it at a standalone vector index, you point it at the Sanity Context retrieval path, which production agents reach through the Sanity Context MCP endpoint. The tool's `execute` issues a GROQ query against Content Lake, blends semantic and keyword matching in that one query, and returns ranked, current content. Because the query is GROQ, you can also project exactly the fields the model needs (the title, the body, the canonical URL, the last-updated timestamp) rather than dumping an opaque chunk and hoping the relevant sentence is in it.
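
A dependency-free sketch of that tool might look like the following. Everything named here is illustrative: `createSearchContentTool`, the `RunGroq` type, the field names, and the query are assumptions, and `runGroq` stands in for whatever actually executes GROQ for you (an MCP client call, an HTTP request, or a Sanity client). Handing the object to the AI SDK's `tool` helper with a Zod input schema is left out on purpose, since the schema field name differs between SDK major versions.

```typescript
// A document as projected by the query below (hypothetical field names).
interface ContextDoc {
  _id: string;
  title: string;
  body: string;
  url: string;
  _updatedAt: string;
}

// Whatever runs GROQ for you; injected so the tool logic stays testable.
type RunGroq = (
  groq: string,
  params: Record<string, unknown>,
) => Promise<ContextDoc[]>;

// Hypothetical query: hybrid scoring plus a projection of only the fields
// the model should see. The semanticSimilarity argument shape is assumed.
const SEARCH_QUERY = `
  *[_type in ["article", "product"]]
    | score(text::semanticSimilarity($query), body match $query)
    | order(_score desc)[0...5]
    { _id, title, body, "url": canonicalUrl, _updatedAt }
`;

function createSearchContentTool(runGroq: RunGroq) {
  return {
    // The model reads this text to decide when to call the tool.
    description:
      "Search published product and support content. Call this before " +
      "answering any question about features, pricing, or behavior.",
    // In the AI SDK the input would be validated by a schema such as
    // z.object({ query: z.string() }) before execute runs.
    execute: async ({ query }: { query: string }): Promise<ContextDoc[]> => {
      const trimmed = query.trim();
      if (trimmed.length === 0) return []; // nothing to search for
      return runGroq(SEARCH_QUERY, { query: trimmed });
    },
  };
}
```

Injecting `runGroq` keeps the transport decision out of the tool, so the same `execute` logic can be exercised with a stub in tests and pointed at the real endpoint in production.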

This gives you two things the glued-together approach struggles with. First, structure: the model receives content shaped the way your business is modeled, with fields it can cite and link, not a flat blob. Second, provenance: you can include source metadata in the tool result so the agent can attribute its answer and you can audit what it retrieved. When a stakeholder asks why the agent said what it said, you have the exact query and the exact documents it pulled. The tool boundary in the AI SDK becomes the clean seam between orchestration, which Vercel owns, and grounding, which your content backend owns, and neither has to know the internals of the other.
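
One way to carry provenance through the tool boundary is to return the passages alongside an explicit source list and an audit record. The shape below is a sketch under assumptions: `GroundedResult`, `toGroundedResult`, and the field names are hypothetical, not part of Sanity or the AI SDK.

```typescript
// A retrieved document (hypothetical field names from a GROQ projection).
interface RetrievedDoc {
  _id: string;
  title: string;
  body: string;
  url: string;
  _updatedAt: string;
}

interface GroundedResult {
  // What the model reads; each passage carries a ref it can cite.
  passages: { ref: number; title: string; text: string }[];
  // What the model links to and what a reviewer checks.
  sources: { ref: number; id: string; url: string; updatedAt: string }[];
  // What you log: the exact query and the exact documents it returned.
  audit: { groq: string; retrievedAt: string; documentIds: string[] };
}

function toGroundedResult(
  docs: RetrievedDoc[],
  groq: string,
  now: Date = new Date(),
): GroundedResult {
  return {
    passages: docs.map((d, i) => ({ ref: i + 1, title: d.title, text: d.body })),
    sources: docs.map((d, i) => ({
      ref: i + 1,
      id: d._id,
      url: d.url,
      updatedAt: d._updatedAt,
    })),
    audit: {
      groq,
      retrievedAt: now.toISOString(),
      documentIds: docs.map((d) => d._id),
    },
  };
}
```

Numbering passages and sources with the same `ref` lets the model cite "[1]" in its answer while your logs keep the document ID behind that number.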

Streaming grounded answers without losing freshness

Once the tool is wired, the typical flow is multi-step: the model receives the user question, decides it needs context, calls the retrieval tool, gets grounded content back, and then streams a final answer. The AI SDK supports this loop natively through tool calls and multi-step generation, so the user sees a response token by token while the grounding happened a beat earlier and invisibly. The experience feels like the model simply knows your product. The reality is that it looked it up against live content every single turn.
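
The AI SDK runs this loop for you. The sketch below re-implements it against a stub model purely to make the order of events visible; it omits streaming, and none of the names (`answerWithGrounding`, `Model`, `Retrieve`) are SDK APIs.

```typescript
type Message =
  | { role: "user"; content: string }
  | { role: "tool"; content: string };

// A model step either asks for context or produces the final answer.
type ModelTurn =
  | { kind: "tool-call"; query: string }
  | { kind: "answer"; text: string };

type Model = (messages: Message[]) => Promise<ModelTurn>;
type Retrieve = (query: string) => Promise<string>;

async function answerWithGrounding(
  question: string,
  model: Model,
  retrieve: Retrieve,
  maxSteps = 3,
): Promise<{ text: string; steps: string[] }> {
  const messages: Message[] = [{ role: "user", content: question }];
  const steps: string[] = [];

  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(messages);
    if (turn.kind === "answer") {
      steps.push("answer");
      return { text: turn.text, steps };
    }
    // The model asked for context: run retrieval against live content now,
    // then feed the result back so the next step can use it.
    steps.push(`retrieve:${turn.query}`);
    messages.push({ role: "tool", content: await retrieve(turn.query) });
  }

  // Step budget exhausted without a final answer.
  return { text: "", steps };
}
```

Because retrieval runs inside the loop on every turn, there is no cached context between questions: each answer is grounded in whatever the content store returns at that moment.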

Freshness is the property that quietly makes or breaks this loop in production. With a separate vector store, every turn carries the risk that the index is behind the content, and that risk compounds because the staleness is invisible at query time. With dataset embeddings tied to content, the answer the agent streams reflects what is true now, because the retrieval ran against the live Content Lake and the embeddings were refreshed within minutes of the edit. Whether the change is a pricing update, a deprecated endpoint, or a renamed feature, the agent picks it up on the next question without a deploy or a re-index.

There is an architectural payoff for scale here. Legacy CMSes force you to scale people: more editors to keep things current, more engineers to keep the pipeline alive. When retrieval reads from the same store your editors already maintain, every content update is also an agent update, so improving the agent's answers is the same act as improving the content. You scale output, not headcount. The team that fixes a confusing support article has, in the same motion, fixed every agent answer that draws on it, with no second system to remember and no embedding job to rerun.

Governing what the agent is allowed to say

Wiring retrieval is only half the architecture. The other half is governance: the system prompt, the agent's instructions, the policies about what it should refuse or escalate. On most teams this lives in a code repository, which means changing how the agent behaves requires a pull request, a deploy, and an engineer, and the people who actually own the policy (support leads, product marketers, legal reviewers) cannot touch it.

Sanity treats agent instructions as content, which means they live in the Studio and move through Content Releases the same way a website change does. An editor can stage a new instruction set, preview the agent's behavior against it, and ship it on a schedule, all without a deploy. Staging agent behavior the way you stage the website turns a risky hardcoded prompt into a reviewable, revertible change with an owner who is not necessarily an engineer. When a policy needs to change because a regulation shifted or a product launched, the person who understands the policy makes the change.
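
On the code side, treating instructions as content means the system prompt is assembled from a fetched document rather than a string constant. The sketch below assumes a hypothetical `agentInstructions` document type with `persona`, `rules`, and `escalation` fields; your schema will differ. Which version the query returns (published or staged in a release) is decided by how the query is executed, which is outside this sketch.

```typescript
// Hypothetical shape of an instruction document authored in the Studio.
interface AgentInstructions {
  _id: string;
  persona: string;
  rules: string[];
  escalation?: string;
  _updatedAt: string;
}

// Hypothetical GROQ query: the instruction set for one named agent.
const INSTRUCTIONS_QUERY = `*[_type == "agentInstructions" && agent == $agent][0]{
  _id, persona, rules, escalation, _updatedAt
}`;

// Turns the fetched document into the system prompt passed to the model.
// Falls back to a safe default when no document exists yet.
function buildSystemPrompt(
  doc: AgentInstructions | null,
  fallback: string,
): string {
  if (doc === null) return fallback;

  const rules = doc.rules.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  const parts = [doc.persona.trim(), "Rules:\n" + rules];
  if (doc.escalation) {
    parts.push("Escalation: " + doc.escalation.trim());
  }
  return parts.join("\n\n");
}
```

With this in place, the application code never changes when policy does: an editor updates the document, and the next conversation picks up the new prompt.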

Governance also means knowing who did what. Sanity provides Roles & Permissions to control who can edit agent instructions and Audit logs to record changes, which is the difference between an agent you can reason about and one that drifted because someone edited a prompt no one was watching. On the compliance side, Sanity is SOC 2 Type II audited and GDPR compliant, offers regional hosting for data residency requirements, and publishes its sub-processor list, so the content and the instructions your agent depends on sit on infrastructure your security team can actually sign off on. Legacy stacks create silos between the people who write content, the people who run the agent, and the people who govern both. A shared foundation, where retrieval, content, and agent instructions all live in one governed system, is what lets those teams move together instead of around each other.

Putting the full architecture together

Step back and the wiring is straightforward once the responsibilities are clean. The Vercel AI SDK owns orchestration: the conversation loop, streaming, tool-call routing, and the model invocation. Sanity Context owns grounding: a retrieval tool whose `execute` queries Content Lake through the Sanity Context MCP endpoint, blending `text::semanticSimilarity()` and `match()` in one GROQ query, returning structured and current content. The seam between them is the tool boundary, and because each side owns one job, you can swap models, change providers, or restructure prompts without touching retrieval, and you can restructure content without touching the agent code.

The content side has room to grow into the same path. Knowledge Bases, launching September 2026, turn datasets, websites, PDFs, and support databases into agent-readable documents that share the Sanity Context retrieval path, so the same tool that queries your modeled product content can reach unstructured sources without a second integration. And when the agent needs to do more than answer, Agent Actions provide schema-aware APIs to generate, transform, and translate content, so an agent can propose a draft that lands in the Studio for human review rather than writing blind.

The result is an architecture you can defend. The agent answers from live content, the embeddings stay fresh because they are tied to that content, the retrieval is hybrid and native rather than stitched across services, and the instructions are governed by the people who own the policy. The failure mode we opened with, a confident answer from a stale doc, has no place to hide, because there is no stale copy to retrieve from. You wired the AI SDK into the source of truth, not a snapshot of it.

Wiring retrieval into the Vercel AI SDK: where each stack draws the line

| Feature | Sanity | Pinecone + glue | Contentful + external search | pgvector / Neon |
| --- | --- | --- | --- | --- |
| Source of truth for retrieval | Agent queries Content Lake, the same store editors publish to, so there is no downstream copy to drift. | Vector index is a separate store; content lives elsewhere and must be exported and synced into Pinecone. | Content lives in Contentful, but retrieval runs against an external search index you stand up and feed. | Vectors sit in Postgres tables you populate; content of record typically lives in another system. |
| Keeping embeddings fresh | Dataset embeddings are tied to content, so edits propagate within minutes with no re-embed job to run. | You own a re-embed pipeline; content edits need a job to update vectors or the index goes stale. | Edits trigger a sync to the external index; freshness depends on that pipeline staying healthy. | You write an embedding job on insert or update; missed triggers leave vectors out of sync with content. |
| Hybrid retrieval | Native: `text::semanticSimilarity()` and `match()` blended in one GROQ query, tuned with `score()` and `boost()`. | Dense vector search is native; keyword and hybrid ranking typically require sparse vectors or added tooling. | Hybrid depends on the external engine (for example Algolia or Elastic) you assemble and maintain. | Vector similarity plus full-text search is possible, but you compose and tune the blend in SQL yourself. |
| Connecting to the AI SDK | Retrieval tool `execute()` hits the Sanity Context MCP endpoint and returns structured, citable fields. | Tool `execute()` calls the Pinecone client; you map results back to content yourself for citation. | Tool `execute()` queries the external index, then fetches full entries from Contentful to assemble context. | Tool `execute()` runs a SQL query; you project and shape fields for the model in your own code. |
| Structured, citable results | GROQ projects exactly the fields the model needs (title, body, URL, and updated time) for citation and audit. | Returns vectors and stored metadata; rich structure depends on what you packed into the payload. | Structured entries available, but retrieval and content fetch are two hops you stitch together. | Returns whatever columns you select; structure is yours to design and join across tables. |
| Governing agent instructions | Instructions live in the Studio and ship through Content Releases with Roles & Permissions and Audit logs. | No content governance layer; prompts and policies live in your code and deploy pipeline. | Content workflows exist, but agent instructions usually sit in app code, outside the editorial flow. | Database only; instruction governance is entirely a concern of your application layer. |
| Unstructured sources (PDFs, sites) | Knowledge Bases (September 2026) bring datasets, websites, PDFs, and support data onto the same retrieval path. | Supported, but you build the ingestion, chunking, and embedding for each source type yourself. | Requires custom ingestion into the external index; not a native part of the content backend. | You build parsing, chunking, and embedding for each source and load it into your tables. |