Build Next.js agent routes that retrieve and revalidate with Sanity Context

Your Next.js agent route works on localhost. Then you ship it and the answers go stale. The agent quotes a price that changed last week, references a product variant you discontinued, or serves a cached response from a build that ran three deploys ago. The retrieval call sits inside a route handler that nobody revalidates, and your `fetch` cache silently holds the old payload until someone notices in production.

Sanity Context is Sanity's agent-facing product. Its primary surface today is Context MCP, a hosted, read-only MCP endpoint that exposes schema reads, GROQ queries, reference traversal, and optional semantic search across a Sanity dataset. Knowledge Bases is the second surface, for unstructured sources like PDFs, websites, and support data. From a Next.js route, you call it the same way you call any other content source: a typed query over HTTP, with cache and revalidation semantics you control per request.

This article is about wiring that cleanly. We will fix the App Router caching traps first, then stream a `POST` agent handler, then connect the retrieval step to Sanity Context so the content your agent reads is the content an editor approved, with `revalidateTag` and the Live Content API keeping it current without a redeploy.

The fetch cache is why your agent route serves stale answers

Next.js 15 changed the default, but plenty of projects still run on the older behavior where `fetch()` inside the App Router is cached aggressively. The failure looks like this: your route handler retrieves content, the content changes, and the route keeps returning the build-time snapshot. You redeploy, it fixes itself for a while, then drifts again. Developers chase it as a model problem when it is a caching problem.

The first thing to get right is what the default actually is in your version. In Next.js 14 and earlier, `fetch` in the App Router is cached with `force-cache` unless you opt out, with a few exceptions such as calls made inside a `POST` route handler or after reading `cookies()` or `headers()`. `GET` route handlers that use no dynamic APIs are cached as well. In Next.js 15, neither `fetch` nor `GET` route handlers are cached by default; you opt in with `cache: 'force-cache'` or `export const dynamic = 'force-static'`. An agent endpoint is almost never something you want statically cached, because each request carries a different user question.

Make the caching explicit instead of inheriting a default you did not choose. For a route handler that runs an agent, you want `dynamic = 'force-dynamic'` or per-fetch `cache: 'no-store'` on the calls that must be fresh, and a deliberate `next: { tags: [...] }` on the calls you DO want to cache and revalidate on demand. The point is not to disable caching everywhere. The point is to decide, per retrieval call, whether the agent can tolerate a slightly stale read or needs the current value.

Make the agent route's cache behavior explicit

Decide freshness per retrieval call rather than inheriting a framework default.

// app/api/agent/route.ts
export const dynamic = 'force-dynamic'

export async function POST(req: Request) {
  const { question } = await req.json()

  // This read MUST be current: prices, stock, publish state.
  const live = await fetch('https://example.com/api/inventory', {
    cache: 'no-store',
  }).then((r) => r.json())

  // This read CAN be cached and revalidated on demand by tag.
  const docs = await fetch('https://example.com/api/articles', {
    next: { tags: ['articles'], revalidate: 3600 },
  }).then((r) => r.json())

  // ... pass live + docs into the agent loop
  return Response.json({ live, docs })
}

Stream the agent response from a route handler without blocking retrieval

Once caching is explicit, the next Next.js-native problem is shape: an agent route that does retrieval, then an LLM call, then returns a single JSON blob will feel slow because the user stares at a spinner for the whole round trip. The fix is to stream. A route handler can return a `Response` whose body is a `ReadableStream`, and the result of the Vercel AI SDK's `streamText` converts into exactly that kind of response, sending tokens as they arrive.

The ordering matters. Retrieval happens BEFORE the stream opens, because the model needs the context in its prompt. So the latency you can hide is the generation latency, not the retrieval latency. That makes retrieval speed the thing worth optimizing, which is exactly why the cache decisions in the previous section pay off: a `no-store` read that hits a slow upstream on every request is the part of the route the user actually waits on.

Keep the retrieval function separate from the handler so you can test it, trace it, and swap its source. A clean agent route reads as three steps: gather context, build messages, stream. Resist folding the retrieval into the prompt template inline, because when an answer is wrong in production you want to log exactly what the agent retrieved, independent of what it generated.
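One way to keep retrieval observable is to wrap it so that every call records what came back, separately from generation. The sketch below is illustrative only: `withRetrievalLog`, `RetrievalLogEntry`, and the `sink` callback are hypothetical names, not part of Next.js or the AI SDK.

```typescript
// Hypothetical helper: wraps any retrieval function so each call is logged
// with its question, returned context, and duration, independent of the model call.
type Retriever<T> = (question: string) => Promise<T>

interface RetrievalLogEntry<T> {
  question: string
  context: T | null
  error: string | null
  durationMs: number
}

export function withRetrievalLog<T>(
  retrieve: Retriever<T>,
  // Default sink prints one JSON line per call; swap in your own logger.
  sink: (entry: RetrievalLogEntry<T>) => void = (e) => console.log(JSON.stringify(e)),
): Retriever<T> {
  return async (question) => {
    const started = Date.now()
    try {
      const context = await retrieve(question)
      sink({ question, context, error: null, durationMs: Date.now() - started })
      return context
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err)
      sink({ question, context: null, error: message, durationMs: Date.now() - started })
      throw err // the handler still sees the original failure
    }
  }
}
```

In the handler this would wrap the retrieval function once, for example `const retrieve = withRetrievalLog(retrieveContext)`, so the logged context can be compared against the generated answer when a response looks wrong.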

A streaming agent route with retrieval up front

Retrieval runs first; only generation tokens are streamed.

// app/api/agent/route.ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
import { retrieveContext } from '@/lib/retrieve'

export const dynamic = 'force-dynamic'

export async function POST(req: Request) {
  const { question } = await req.json()

  // Retrieval resolves before the stream opens.
  const context = await retrieveContext(question)

  const result = streamText({
    model: openai('gpt-4o'),
    system: 'Answer only from the provided context.',
    messages: [
      { role: 'system', content: JSON.stringify(context) },
      { role: 'user', content: question },
    ],
  })

  return result.toTextStreamResponse()
}
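On the consuming side, a caller can read that response incrementally instead of waiting for the full body. This is a minimal sketch, assuming the route returns a plain UTF-8 text stream; `readTextStream` and `onChunk` are hypothetical names introduced for illustration.

```typescript
// Hypothetical client helper: reads a streamed text Response chunk by chunk.
// Calls onChunk for each decoded piece and resolves with the full text.
export async function readTextStream(
  res: Response,
  onChunk: (text: string) => void,
): Promise<string> {
  if (!res.body) throw new Error('Response has no body to stream')

  const reader = res.body.getReader()
  const decoder = new TextDecoder()
  let full = ''

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    const text = decoder.decode(value, { stream: true })
    if (text) {
      full += text
      onChunk(text)
    }
  }

  // Flush any bytes the decoder was still holding.
  const tail = decoder.decode()
  if (tail) {
    full += tail
    onChunk(tail)
  }
  return full
}
```

A caller would pass the result of `fetch('/api/agent', { method: 'POST', body: JSON.stringify({ question }) })` and append each chunk to the UI as it arrives.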

Why your retrieval function is the part that actually breaks

When a streamed answer is wrong, the instinct is to blame the model or tweak the system prompt. In practice the bug is usually upstream: the retrieval function returned the wrong documents, returned a stale copy, or returned a draft that should never have shipped. The model faithfully summarized bad input.

Three failure modes show up over and over in Next.js agent routes. First, structural mismatch: the user asked for the 2024 pricing page but pure keyword or vector search returned the 2023 one, because the query had a date constraint that the retrieval ignored. Second, publish-state leakage: the retrieval read from a source that includes unpublished drafts, so the agent quotes copy an editor never approved. Third, reference blindness: the user asked about a product, the answer needed the linked warranty document, and the retrieval returned the product record without traversing the reference.

These are not model problems and no amount of prompt engineering fixes them. They are retrieval problems. The retrieval function needs to express structural predicates (date ranges, authors, product variants, publish state), follow references between documents, and only then optionally rank by semantic similarity. That ordering, structure first and semantics last, is the opposite of the vector-search-first habit most RAG tutorials teach, and it is where Sanity Context comes in.
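The ordering can be sketched with plain in-memory data before any real content source is involved. Everything here is an assumption made for illustration: the `Doc` shape, the `retrieve` function, and `overlapScore`, which is a crude word-overlap stand-in for semantic similarity rather than an embedding model.

```typescript
interface Doc {
  _id: string
  _type: string
  year: number
  published: boolean
  title: string
  warrantyRef?: string // _id of a linked document, if any
}

// Stand-in for semantic similarity: fraction of query words found in the title.
function overlapScore(query: string, title: string): number {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean)
  if (words.length === 0) return 0
  const haystack = title.toLowerCase()
  return words.filter((w) => haystack.includes(w)).length / words.length
}

export function retrieve(
  docs: Doc[],
  query: string,
  filter: { type: string; year: number },
) {
  const byId = new Map(docs.map((d) => [d._id, d] as const))
  return docs
    // 1. Structural predicates: exact, applied before any ranking.
    .filter((d) => d.published && d._type === filter.type && d.year === filter.year)
    // 2. Reference traversal: attach the linked record, but only if it is published.
    .map((d) => {
      const linked = d.warrantyRef ? byId.get(d.warrantyRef) : undefined
      return { ...d, warranty: linked && linked.published ? linked : null }
    })
    // 3. Ranking last, and only over the documents that survived the filter.
    .map((d) => ({ ...d, score: overlapScore(query, d.title) }))
    .sort((a, b) => b.score - a.score)
}
```

With this shape, a 2023 page whose title matches the query text well still never outranks the 2024 page, because it is excluded by the year predicate before scoring happens, and an unpublished draft is never scored at all.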

⚠️

Most wrong answers are wrong retrieval, not wrong models

Before you A/B test prompts or swap models, log the exact context your retrieval function returned for the failing question. The majority of production agent failures trace to a retrieval step that returned the wrong document, a stale copy, or an unpublished draft, not to the model misreading correct input.

Connect the route to Sanity Context over the Context MCP endpoint

The fastest way to give a Next.js agent route a real retrieval source is the Context MCP endpoint. It is a hosted, read-only MCP server that exposes your Sanity dataset as schema-aware tools: the agent can read your content model, run GROQ queries, and traverse references without you hand-writing each query. Because it is MCP, the Vercel AI SDK attaches it as a tool source and the model picks the right tool per turn.

The read-only constraint is deliberate and worth understanding. Through Context MCP the agent can read, query, and traverse, but it cannot write. Mutations go through Agent Actions, a separate path, so an agent loop wired to MCP can never accidentally edit your content. For a public-facing Next.js route that is exactly the boundary you want.

In the route handler you attach the MCP server, pull the tools, and hand them to `streamText`. The agent now resolves the user's question against your actual content model: schema reads tell it what fields exist, GROQ queries fetch the matching documents, and reference traversal pulls the linked records the answer needs. The structural correctness from the previous section is built into the tools, because GROQ filters on publish state, dates, and references are first-class, not an afterthought bolted onto a vector index.

Attach the Context MCP endpoint as a tool source

Context MCP is read-only; the agent can query and traverse but never write.

// app/api/agent/route.ts
import { openai } from '@ai-sdk/openai'
import { streamText, experimental_createMCPClient } from 'ai'

export const dynamic = 'force-dynamic'

export async function POST(req: Request) {
  const { question } = await req.json()

  const mcp = await experimental_createMCPClient({
    transport: {
      type: 'sse',
      url: process.env.SANITY_CONTEXT_MCP_URL!,
      headers: { Authorization: `Bearer ${process.env.SANITY_TOKEN}` },
    },
  })

  const tools = await mcp.tools()

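  const result = streamText({
    model: openai('gpt-4o'),
    system: 'Answer only from content returned by the tools.',
    tools,
    // Allow follow-up steps so the model can answer after tool results come back.
    // `maxSteps` is the AI SDK 4.x option; AI SDK 5 uses `stopWhen: stepCountIs(5)`.
    maxSteps: 5,
    messages: [{ role: 'user', content: question }],
    // Close the MCP connection when generation finishes or fails.
    onFinish: () => mcp.close(),
    onError: () => mcp.close(),
  })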

  return result.toTextStreamResponse()
}
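If the route only needs part of what the endpoint exposes, one option is to hand the model an allowlist rather than every tool. This is a minimal sketch, assuming the object returned by `mcp.tools()` is keyed by tool name. `pickTools` is a hypothetical helper, and the tool names in the usage line are placeholders, so check the real names with `Object.keys(tools)` first.

```typescript
// Hypothetical helper: keep only the tools whose names appear in `allow`.
// Names in `allow` that do not exist in `tools` are ignored.
export function pickTools<T>(
  tools: Record<string, T>,
  allow: readonly string[],
): Record<string, T> {
  const picked: Record<string, T> = {}
  for (const name of allow) {
    if (Object.prototype.hasOwnProperty.call(tools, name)) {
      picked[name] = tools[name]
    }
  }
  return picked
}

// Usage inside the handler (tool names are placeholders):
//   const tools = pickTools(await mcp.tools(), ['get_schema', 'groq_query'])
```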

When you want full query control: a typed GROQ tool with hybrid ranking

MCP is the default path, but sometimes you want to own the exact query: a fixed set of filters, a specific projection, a deliberate ranking. For that, write a thin retrieval function in `lib/retrieve.ts` using `createClient` from `next-sanity` and a single GROQ query. This is also where the structure-first, semantics-last discipline becomes concrete.

GROQ lets you put the structural predicates inside the `*[ ... ]` filter, then rank what survives. The structural filter (publish state, date range, document type) runs first and is exact. Only the documents that pass get scored. Semantic similarity is one ingredient in that score, combined with keyword match, not the whole retrieval. This matters because in real Sanity Context production traffic the heavy majority of calls are structured GROQ queries and schema lookups; semantic embeddings are opt-in, off by default, and most projects never turn them on. Reach for hybrid scoring only when structural retrieval alone leaves the agent guessing.

The code below is the canonical hybrid shape. The filter narrows to published articles in a category. `score()` combines a keyword `match` and `text::semanticSimilarity()` into a `_score`, and `order(_score desc)` ranks the survivors. The query text, not an embedding vector, goes into both functions; the embedding is resolved server-side when semantic search is enabled on the dataset.

A typed GROQ retrieval function with hybrid scoring

Structure first inside the filter, semantics last inside score().

// lib/retrieve.ts
import { createClient } from 'next-sanity'

const client = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: 'production',
  apiVersion: '2024-10-01',
  useCdn: false,
})

export async function retrieveContext(question: string) {
  // Structural predicates filter FIRST; ranking applies to survivors.
  const query = `*[
    _type == "article" &&
    !(_id in path("drafts.**")) &&
    category->slug.current == $category
  ]
  | score(
      boost(title match text::query($q), 2),
      text::semanticSimilarity($q)
    )
  | order(_score desc)[0...5]{ title, body, _score }`

  return client.fetch(query, { q: question, category: 'pricing' })
}

Revalidate cleanly: keep agent answers current without a redeploy

Now close the loop on the original symptom. You cached the article retrieval by tag in section one. When an editor updates content in the Studio, you want that change to reach your agent route without waiting for the next deploy. Two mechanisms make this clean in Next.js, and they map to how Sanity publishes.

For cached reads, wire a webhook from Sanity to a Next.js route that calls `revalidateTag`. When a document publishes, the webhook fires, the tag's cache entry is invalidated, and the next agent request rebuilds it from fresh content. This keeps the speed of caching for the common case, and once the webhook has been delivered the agent stops quoting content that a publish has superseded.

For reads that must always be current, the Live Content API streams updates so a server component or route reflects published changes in real time, no redeploy and no manual revalidation.

The deeper point is governance. Because the content your agent reads lives in Sanity, an editor reviews and approves it through Content Releases before it goes live, and your route only ever serves the approved version. Sanity is the Content Operating System for the AI era, the intelligent backend that keeps the content an agent reads versioned, governed, and previewable, so the answer your Next.js route streams is one a human signed off on, not whatever happened to be in a vector index at build time.

Revalidate on publish via a Sanity webhook

On publish, invalidate the cache tag so the next agent request reads fresh content.

// app/api/revalidate/route.ts
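import { revalidateTag } from 'next/cache'
import { type NextRequest } from 'next/server'
import { parseBody } from 'next-sanity/webhook'

type WebhookPayload = { _type: string }

// parseBody is typed to take a NextRequest, so the handler uses that type too.
export async function POST(req: NextRequest) {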
  const { isValidSignature, body } = await parseBody<WebhookPayload>(
    req,
    process.env.SANITY_REVALIDATE_SECRET!,
  )

  if (!isValidSignature) {
    return new Response('Invalid signature', { status: 401 })
  }

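  if (body?._type === 'article') {
    // Invalidates every cached fetch tagged 'articles'.
    revalidateTag('articles')
    return Response.json({ revalidated: true })
  }

  // Signature was valid, but this document type has no cached reads to refresh.
  return Response.json({ revalidated: false })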
}
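As more document types feed the agent, the single hard-coded check can grow into a lookup from document type to cache tags. A minimal sketch, assuming a hypothetical `TAGS_BY_TYPE` table and `tagsFor` helper; the `category` entry is an illustrative guess at a dependency (articles filtered by category), not something your schema necessarily has.

```typescript
type WebhookPayload = { _type: string }

// Hypothetical mapping from Sanity document types to the cache tags that
// depend on them. A Map avoids accidental hits on Object.prototype keys.
const TAGS_BY_TYPE = new Map<string, string[]>([
  ['article', ['articles']],
  // Articles are filtered by category, so a category change also stales them.
  ['category', ['articles', 'categories']],
])

// Returns the tags to invalidate for a webhook body, or [] if none apply.
export function tagsFor(body: WebhookPayload | null | undefined): string[] {
  if (!body || !body._type) return []
  return TAGS_BY_TYPE.get(body._type) ?? []
}

// Usage inside the webhook handler, after the signature check:
//   for (const tag of tagsFor(body)) revalidateTag(tag)
```

Keeping the mapping in one table makes it easy to see which publish events refresh which cached reads, and an unknown document type simply invalidates nothing.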