Improve accuracy of assistant-ui chat agents with Sanity Context

Your assistant-ui chat renders beautifully. The streaming is smooth, the message bubbles animate, the tool calls show their work. Then a user asks "what's the return window on the jacket I bought last month?" and the assistant confidently invents a 60-day policy. Your real policy is 30 days. The UI did everything right. The model made up the answer because nothing in the loop fed it the actual content.

Sanity Context is Sanity's agent-facing product, and its Context MCP endpoint is what gives the tool calls in your loop somewhere real to read from: live schema, GROQ queries against actual records, no guessing required.

Missing content is the gap nobody warns you about when you adopt assistant-ui. The library is a presentation and runtime layer for chat: it gives you `Thread`, `Composer`, generative UI, and a clean `useAssistantRuntime` contract. What it deliberately does not give you is a source of truth. The runtime calls your backend, your backend calls a model, and if that model is reasoning over a stale JSON dump or nothing at all, your gorgeous UI becomes a confident liar.

This article is about closing that gap. We will wire a custom runtime, stream tool results into generative UI components, and then make those tools return real, current, governed content instead of hallucinations. The retrieval layer is where assistant-ui apps quietly fall over in production, so that is where we will spend the most time.

The runtime is your contract, not your content

assistant-ui's core abstraction is the runtime. You either use a prebuilt adapter like `useChatRuntime` (which speaks the Vercel AI SDK protocol), pair `useLocalRuntime` with a custom `ChatModelAdapter`, or bring your own message store through `useExternalStoreRuntime` to bridge to whatever backend you have. The library handles message state, streaming, edit-and-resend, branching, and cancellation. That is a lot of correctness you do not have to write.

The trap is reading that list and concluding the runtime is doing more than it is. It moves messages. It does not know anything about your domain. When you implement a `ChatModelAdapter`, the `run` method hands you the message history and expects you to yield streamed chunks back. Everything between receiving those messages and producing an answer (the model call, the system prompt, the retrieval) is yours to get right.

So the first design decision is honest separation. The runtime owns transport and UI state. A separate function owns 'what does the model actually know when it answers this turn.' Conflate them and you end up stuffing a giant static context string into your system prompt, which bloats every request, goes stale the moment an editor changes a price, and still misses the one document the user asked about. Keep the adapter thin and push the knowledge problem into tools the model can call on demand.

A minimal custom ChatModelAdapter

The adapter streams text back to the Thread. It owns transport, not knowledge.

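import type { ReactNode } from "react";
import {
  AssistantRuntimeProvider,
  useLocalRuntime,
  type ChatModelAdapter,
} from "@assistant-ui/react";

const adapter: ChatModelAdapter = {
  async *run({ messages, abortSignal }) {
    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages }),
      signal: abortSignal, // cancelling in the UI aborts the request
    });

    // Fail loudly instead of streaming an error page into the thread.
    if (!res.ok || !res.body) {
      throw new Error(`Chat request failed with status ${res.status}`);
    }

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let text = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters intact across chunk boundaries.
      text += decoder.decode(value, { stream: true });
      // Yield the full text so far, not just the newest chunk.
      yield { content: [{ type: "text", text }] };
    }
  },
};

export function Runtime({ children }: { children: ReactNode }) {
  const runtime = useLocalRuntime(adapter);
  return <AssistantRuntimeProvider runtime={runtime}>{children}</AssistantRuntimeProvider>;
}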
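
Two details of that decode loop are easy to miss: each yield carries the whole message so far, and `{ stream: true }` stops a multi-byte character from being mangled when it is split across network chunks. The sketch below pulls the loop out into a hypothetical helper, `accumulateText` (not part of assistant-ui), and feeds it in-memory chunks so the behavior can be checked without a server.

```typescript
// Hypothetical helper: the same decode-and-accumulate logic as the adapter,
// run over an in-memory list of chunks instead of a network stream.
function accumulateText(chunks: Uint8Array[]): string[] {
  const decoder = new TextDecoder();
  const snapshots: string[] = [];
  let text = "";
  for (const chunk of chunks) {
    // stream: true buffers an incomplete multi-byte sequence until the next chunk.
    text += decoder.decode(chunk, { stream: true });
    snapshots.push(text); // each snapshot is the full message so far
  }
  return snapshots;
}

// "caf\u00e9" is five bytes in UTF-8; the final character takes two of them.
// Splitting after byte four cuts that character in half.
const bytes = new TextEncoder().encode("caf\u00e9");
const snapshots = accumulateText([bytes.slice(0, 4), bytes.slice(4)]);
console.log(snapshots); // the first snapshot is "caf", the second is the full word
```

Without `stream: true`, the first chunk would decode with a replacement character where the split byte sits, and that broken character would stay in every later snapshot.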

Generative UI is only as good as the data behind it

The feature that sells teams on assistant-ui is generative UI: register a component with `makeAssistantToolUI` and when the model calls a matching tool, the library renders your React component inline in the thread instead of raw text. A product lookup becomes a card with an image, price, and add-to-cart button. A flight search becomes a sortable table. This is genuinely the right pattern for agent interfaces.

But a generative UI component is a contract about shape, not truth. `makeAssistantToolUI` binds a `toolName` to a renderer and gives you `args` and `result`. If `result` contains a hallucinated price, your beautiful card displays a hallucinated price with total confidence. The visual polish actually makes it worse, because users trust a structured card more than they trust a paragraph.

The fix is to make the tool that produces `result` query real data, and to render directly from that result rather than from any prose the model wrote around it. When the model calls `getProduct`, the args go to your backend, your backend fetches the actual product, and the actual fields, not the model's recollection of them, populate the card. The model decides which product and when to show it. The data decides what the card says. That split is what keeps generative UI honest at scale.

⚠️

Polished cards amplify wrong data

Users apply more trust to a structured generative-UI card than to a plain-text answer. If the tool feeding that card returns stale or fabricated fields, your UI improvements make the error more convincing, not less. Always render the card from a freshly retrieved result, never from values the model restated in its message.

Rendering a product card from tool results

The card renders from result fields, not from the model's prose. The data is the source of truth.

import { makeAssistantToolUI } from "@assistant-ui/react";

type ProductArgs = { sku: string };
type ProductResult = {
  title: string;
  price: number;
  imageUrl: string;
  inStock: boolean;
};

export const ProductCardUI = makeAssistantToolUI<ProductArgs, ProductResult>({
  toolName: "getProduct",
  render: ({ args, result, status }) => {
    if (status.type === "running") return <div>Looking up {args.sku}…</div>;
    if (!result) return <div>No product found.</div>;

    return (
      <article className="product-card">
        <img src={result.imageUrl} alt={result.title} />
        <h3>{result.title}</h3>
        <p>${result.price.toFixed(2)}</p>
        <button disabled={!result.inStock}>
          {result.inStock ? "Add to cart" : "Out of stock"}
        </button>
      </article>
    );
  },
});
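
Because the card is a contract about shape, it can help to check that shape at runtime before rendering, since a tool result arrives as untyped JSON. What follows is a minimal sketch, assuming the same `ProductResult` fields as above; `isProductResult` is a hypothetical helper, not an assistant-ui API.

```typescript
type ProductResult = {
  title: string;
  price: number;
  imageUrl: string;
  inStock: boolean;
};

// Hypothetical runtime guard: returns true only when every field the card
// renders is present and has the expected primitive type.
function isProductResult(value: unknown): value is ProductResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const price = v["price"];
  return (
    typeof v["title"] === "string" &&
    typeof price === "number" &&
    Number.isFinite(price) &&
    typeof v["imageUrl"] === "string" &&
    typeof v["inStock"] === "boolean"
  );
}

const fromTool: unknown = JSON.parse(
  '{"title":"Rain Jacket","price":129,"imageUrl":"https://example.com/rj.png","inStock":true}',
);
console.log(isProductResult(fromTool)); // true
console.log(isProductResult({ title: "Rain Jacket", price: "129" })); // false
```

A render function could call this guard and fall back to the "No product found." branch when it returns false, so a malformed result never reaches `toFixed`.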

Where the hallucination actually enters

Trace a wrong answer in an assistant-ui app and you will almost always find the failure upstream of the model. The runtime delivered the messages correctly. The model reasoned correctly over what it was given. The problem is what it was given: either no relevant content in context, or a snapshot that drifted out of date.

Three patterns cause most of it. First, the static-context dump: someone serializes the whole catalog or a knowledge base into the system prompt at build time. It goes stale as soon as an editor changes anything, and it blows your token budget. Second, the naive vector lookup: a single `pinecone.query()` on the embedded user message. This works in the demo and fails the moment the question has structure, such as 'returns for orders placed after March that are still pending', because a date range and a status filter are not things cosine similarity resolves. The embedding finds documents that sound like returns, not the documents that match the predicate. Third, no retrieval at all, just vibes and a hopeful system prompt.

The through-line is that good retrieval is not 'add a vector database.' Vector search is one ingredient. Most real questions an assistant fields have a structural component (an entity, a date, a state, a relationship) sitting alongside the fuzzy semantic part. You need both resolved in the same lookup, and you need the result to be current, not a nightly export. That is the actual job your tools have to do.
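
A toy illustration of that point, using invented in-memory records and a word-overlap score as a stand-in for embedding similarity (it is not a real embedding model). Ranking by similarity alone surfaces the record that sounds most like the question; applying the date and status predicate first surfaces the one that actually matches it.

```typescript
type ReturnDoc = {
  id: string;
  placedAt: string; // ISO date, so string comparison orders correctly
  status: "pending" | "closed";
  text: string;
};

// Invented sample data for illustration only.
const docs: ReturnDoc[] = [
  { id: "r1", placedAt: "2024-01-10", status: "closed", text: "return request for jacket order refund issued" },
  { id: "r2", placedAt: "2024-04-02", status: "pending", text: "jacket sent back awaiting inspection" },
  { id: "r3", placedAt: "2024-05-20", status: "closed", text: "return request for boots order refund issued" },
];

// Toy stand-in for semantic similarity: fraction of query words found in the text.
function overlap(query: string, text: string): number {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const haystack = new Set(text.toLowerCase().split(/\s+/));
  const hits = words.filter((w) => haystack.has(w)).length;
  return words.length === 0 ? 0 : hits / words.length;
}

function rank(candidates: ReturnDoc[], query: string): ReturnDoc[] {
  return [...candidates].sort((a, b) => overlap(query, b.text) - overlap(query, a.text));
}

const query = "return request jacket order";

// Similarity only: picks r1, which is closed and was placed in January.
const similarityOnly = rank(docs, query)[0];

// Structural predicate first (pending, placed after March), then rank what is left.
const eligible = docs.filter((d) => d.status === "pending" && d.placedAt > "2024-03-31");
const filterThenRank = rank(eligible, query)[0];

console.log(similarityOnly?.id, filterThenRank?.id); // r1 r2
```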

ℹ️

Most 'model' bugs are retrieval bugs

When you instrument an assistant-ui app with tracing, log the tool's input query and its returned documents next to the model's final message. The majority of confident-wrong answers trace to the tool handing back the wrong document, not the model misreading a correct one. Fix retrieval before you fix prompts.
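
One way to capture that evidence is to wrap each tool's execute function so its input and output are recorded alongside the turn. The sketch below is an assumption about how you might structure it: `withTrace`, `TraceEntry`, and the stubbed `getPolicy` tool are hypothetical names, and the sink is an in-memory array standing in for whatever tracing backend you use.

```typescript
type TraceEntry = { tool: string; input: unknown; output: unknown; ms: number };

// Hypothetical wrapper: records what the tool was asked and what it returned,
// then passes the result through unchanged.
function withTrace<I, O>(
  name: string,
  execute: (input: I) => Promise<O>,
  sink: (entry: TraceEntry) => void,
): (input: I) => Promise<O> {
  return async (input: I): Promise<O> => {
    const started = Date.now();
    const output = await execute(input);
    sink({ tool: name, input, output, ms: Date.now() - started });
    return output;
  };
}

// Usage with a stubbed lookup; a real tool would query your content backend.
const trace: TraceEntry[] = [];
const getPolicy = withTrace(
  "getPolicy",
  async ({ topic }: { topic: string }) => ({ topic, windowDays: 30 }),
  (entry) => {
    trace.push(entry);
  },
);
```

When an answer comes back wrong, comparing `trace` with the final message shows whether the tool returned the wrong document or the model misread a correct one.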

Tools that return current, structured content

Here is where Sanity Context earns its place in an assistant-ui stack. Your generative-UI tools need a backend that resolves both the structural part of a query (which product variant, which publication state, which date range) and the semantic part (what the user actually meant) and returns content that is live, not a stale export.

For structured content (a catalog, articles with a schema, policy documents modeled as fields), the right surface is GROQ retrieval. GROQ lets you express the structural predicates as plain filters inside the query and, when the question genuinely needs fuzzy matching, layer semantic scoring on top in the same query. The honest default here matters: most calls in production are structured queries and schema lookups, not embedding search. Embeddings are opt-in and off by default, and plenty of assistant-ui apps never need to turn them on. Reach for semantic scoring when exact filters under-recall, not as the first move.

When the question is structural plus fuzzy at once, GROQ resolves both in one pass. The filter inside `*[ ... ]` does the structural work. `text::semanticSimilarity()` and `text::query()` combined under `score()` do the fuzzy work, ordered by `_score`. One query, one round trip, no separate vector store to keep in sync with your CMS.

Hybrid retrieval in a single GROQ query

Structural predicates live in the filter; BM25 and semantic scoring combine under score() and order by _score.

*[_type == "product" && inStock == true && category == $category]
  | score(
      boost(title match text::query($queryText), 3),
      text::semanticSimilarity($queryText)
    )
  | order(_score desc)[0...5]{
    title, price, sku, "imageUrl": image.asset->url, inStock
  }
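
The "structured first, semantic only when exact filters under-recall" default can be sketched as a small fallback. Everything named here is an assumption for illustration: `searchProducts`, the `RunQuery` type, the `minResults` threshold, and the exact-match query are hypothetical, while the hybrid query is the one shown above. `RunQuery` is shaped so a client's `fetch(query, params)` method could be passed in, and the stub below stands in for it.

```typescript
type Product = { title: string; price: number; sku: string; imageUrl: string; inStock: boolean };

// Minimal shape of the one call this sketch needs: run a query with params.
type RunQuery = (query: string, params: Record<string, unknown>) => Promise<Product[]>;

const PROJECTION = `{ title, price, sku, "imageUrl": image.asset->url, inStock }`;

// Structural filter plus a plain text match, no semantic scoring.
const STRUCTURED = `*[_type == "product" && inStock == true && category == $category
  && title match $queryText][0...5]${PROJECTION}`;

// The hybrid query from the article, used only as a fallback.
const HYBRID = `*[_type == "product" && inStock == true && category == $category]
  | score(
      boost(title match text::query($queryText), 3),
      text::semanticSimilarity($queryText)
    )
  | order(_score desc)[0...5]${PROJECTION}`;

async function searchProducts(run: RunQuery, category: string, queryText: string, minResults = 1) {
  const params = { category, queryText };
  const exact = await run(STRUCTURED, params);
  if (exact.length >= minResults) {
    return { mode: "structured" as const, products: exact };
  }
  // Exact filters under-recalled, so pay for semantic scoring on this call only.
  return { mode: "hybrid" as const, products: await run(HYBRID, params) };
}

// Stub runner for demonstration: the structured query finds nothing, the hybrid one does.
const sample: Product = { title: "Rain Jacket", price: 129, sku: "RJ-1", imageUrl: "https://example.com/rj.png", inStock: true };
const stub: RunQuery = async (query) => (query.includes("score(") ? [sample] : []);
```

Returning the `mode` alongside the products makes it easy to log how often the fallback fires, which is a reasonable signal for whether semantic scoring is worth enabling at all.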

Wiring the MCP endpoint into your chat backend

The fastest way to give your assistant-ui tools that retrieval layer is the Context MCP endpoint. Sanity Context ships a hosted, read-only MCP server that any agent loop can connect to. Instead of hand-writing a GROQ tool, you attach the MCP server to your model call and the agent gets schema-aware tools out of the box: it can introspect your content model and run queries against live content without you maintaining a separate tool definition per document type.

In an assistant-ui app the MCP connection lives in your `/api/chat` route, the same handler your `ChatModelAdapter` posts to. You connect the MCP client, expose its tools to the model alongside your generative-UI tools like `getProduct`, and the model picks the right one. The read-only constraint is deliberate and worth saying out loud: an agent reading content through MCP cannot mutate your dataset. Writes go through Agent Actions, not MCP, so a chat assistant cannot accidentally edit your catalog.

If you want full control over the exact query, the second path is a thin custom tool that runs a typed GROQ query with `createClient` from `next-sanity`. Same retrieval engine, more explicit. Teams usually start on MCP for speed and drop to a custom tool only for the few queries where they need to hand-tune the projection or scoring.

✨

The MCP endpoint is the fastest way in

Attaching the hosted Context MCP server to your chat route gives the model schema-aware read tools without a per-type tool definition. It is read-only by design, so a chat assistant can query your content but never mutate it; writes route through Agent Actions instead.
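
When MCP-provided tools and your own generative-UI tools are handed to the model together, they share one namespace. Merging them with object spread lets a later tool silently replace an earlier one of the same name. A minimal sketch of a safer merge follows; `mergeTools` is a hypothetical helper, and the tool names in the usage are placeholders, not the names the Context MCP server actually exposes.

```typescript
type ToolSet = Record<string, unknown>;

// Hypothetical helper: combine tool sets, refusing to overwrite on a name clash.
function mergeTools(...sets: ToolSet[]): ToolSet {
  const merged: ToolSet = {};
  for (const set of sets) {
    for (const name of Object.keys(set)) {
      if (Object.prototype.hasOwnProperty.call(merged, name)) {
        throw new Error(`Duplicate tool name: ${name}`);
      }
      merged[name] = set[name];
    }
  }
  return merged;
}

// Placeholder tool objects; in a real route these would come from the MCP
// client and from your own tool definitions.
const mcpTools: ToolSet = { placeholderMcpQueryTool: {} };
const uiTools: ToolSet = { getProduct: {} };
const allTools = mergeTools(mcpTools, uiTools);
console.log(Object.keys(allTools)); // ["placeholderMcpQueryTool", "getProduct"]
```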

A custom GROQ tool in the chat route

The tool result is exactly the shape ProductCardUI renders. useCdn: false bypasses the CDN cache so the query reads current content.

import { createClient } from "next-sanity";
import { tool } from "ai";
import { z } from "zod";

const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: "production",
  apiVersion: "2024-10-01",
  useCdn: false, // live content, not a cached snapshot
});

export const getProduct = tool({
  description: "Look up a product by SKU from the live catalog.",
  // AI SDK v4 names this field `parameters`; v5 renames it to `inputSchema`.
  parameters: z.object({ sku: z.string() }),
  execute: async ({ sku }) => {
    return sanity.fetch(
      `*[_type == "product" && sku == $sku][0]{
        title, price, sku, "imageUrl": image.asset->url, inStock
      }`,
      { sku }
    );
  },
});

Unstructured sources and the editorial side of state

Not every question your assistant fields maps to a clean schema. Return policies in a PDF, an old support knowledge base, marketing pages scattered across a website: this is messy, unstructured content, and GROQ filters do not help when there are no fields to filter on. For that corpus, Sanity Context Knowledge Bases is the right surface. It turns those sources (datasets, support databases, websites, PDFs) into well-ordered documents with a clear table of contents the agent can retrieve against. Use GROQ for structured catalog data and Knowledge Bases for the messy long tail. They are different tools for different shapes of content, both reachable from the same chat backend.

There is also a state question assistant-ui developers often conflate. Per-user chat history (the ephemeral session your runtime branches and edits) belongs in fast key-value storage like Upstash or Redis. Do not put that in Sanity. What does belong there is the editorial side of state: your system prompts, brand voice, approved canned answers, and the knowledge content itself. Those should be versioned, reviewed by humans, and previewed before they reach users.
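
To keep that boundary visible in code, per-user history can sit behind a small interface that knows nothing about editorial content. The sketch below is an assumption about how you might shape it: `SessionStore` and `InMemorySessionStore` are hypothetical names, and the `Map` stands in for a key-value service so the example runs without one.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Hypothetical interface for ephemeral, per-user chat history.
// Methods are async because a real key-value store is a network call.
interface SessionStore {
  append(sessionId: string, message: ChatMessage): Promise<void>;
  history(sessionId: string): Promise<ChatMessage[]>;
}

// In-memory stand-in; a production version would wrap a key-value client.
class InMemorySessionStore implements SessionStore {
  private sessions = new Map<string, ChatMessage[]>();

  async append(sessionId: string, message: ChatMessage): Promise<void> {
    const existing = this.sessions.get(sessionId) ?? [];
    this.sessions.set(sessionId, [...existing, message]);
  }

  async history(sessionId: string): Promise<ChatMessage[]> {
    // Return a copy so callers cannot mutate stored history.
    return [...(this.sessions.get(sessionId) ?? [])];
  }
}

const store: SessionStore = new InMemorySessionStore();
```

System prompts and approved answers never pass through this interface; the chat route would read those from the content backend and read only the turn history from here.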

That is the institutional point. Sanity is the Content Operating System for the AI era, the governed foundation your agent reads from, where a content editor can update the return policy in the Studio, stage it in a Content Release, preview the answer the assistant will give, and publish it. The Live Content API then pushes that change to your assistant-ui app without a redeploy. Your runtime never changes; the answer just becomes correct.