Ground Anthropic-powered agents in Sanity Context

Anthropic

Claude's tool-use API lets the model call your functions, but reliability lives in what the tools return: clean schemas, typed results, and content the model can actually trust.

You wired up Claude with `tools`, the model picks the right function, and in your demo it works. Then production traffic hits and Claude starts inventing product IDs that don't exist, calling `get_article` with a slug it half-remembered from training data, or confidently summarizing a "current price" that's three campaigns out of date. The model isn't broken. It's doing exactly what you'd expect when a tool returns a 4,000-line JSON blob with no schema discipline and three different shapes for the same field.

Sanity Context, Sanity's agent-facing product, addresses exactly that failure mode. Its Context MCP endpoint gives Claude structured, schema-aware data to query against, so the model is reasoning over governed content rather than stale training weights.

Tool use with Claude is only as reliable as the boundary between the model and your data. The Messages API gives you a clean contract for *declaring* tools, but it says nothing about what comes back through them. That return payload is where hallucinations are born or prevented.

This article is about the second half of that contract. We'll tighten Claude's tool definitions so the model stops guessing arguments, structure the results so it stops misreading them, and then connect the tools to a content backend that returns governed, schema-shaped data instead of a hand-rolled JSON dump. The model layer matters less than most teams think. The retrieval layer matters more.

Why Claude calls the wrong tool with the wrong arguments

The Messages API tool-use loop is simple: you pass a `tools` array, Claude responds with a `tool_use` block, you execute, you return a `tool_result`. The failure modes hide in the gaps. The most common one is argument hallucination. Claude is asked for "the new running shoes," it calls your `get_product` tool, and it fills in `product_id: "prod_running_shoes"` because that *looks* like an ID, even though your IDs are UUIDs. The tool returns a 404, Claude apologizes, retries with another guess, and you've burned three round trips on a lookup that should have been a search.

The root cause is almost always an underspecified `input_schema`. Claude follows JSON Schema closely, but only the parts you actually write. A tool whose schema says `product_id: { type: "string" }` with no description, no format hint, and no enum invites the model to invent plausible strings. The fix is to make the schema do the talking: describe what a valid ID looks like, mark which fields are required, and split "I know the exact ID" from "I'm searching by name" into two different tools. A `search_products(query)` tool that returns candidates is far more reliable than forcing Claude to conjure an identifier it never saw.

The second failure mode is tool *selection*. Give Claude eight tools with overlapping descriptions and it will pick the wrong one under ambiguity. Descriptions are not documentation for you; they are the model's only routing signal. Write them for the model.

A tool definition that constrains Claude's arguments

Splitting search from fetch stops Claude inventing identifiers.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools: Anthropic.Tool[] = [
  {
    name: "search_products",
    description:
      "Find products by free-text name, category, or attribute. " +
      "Use this whenever you do NOT already have an exact product id. " +
      "Returns up to 5 candidates with their ids.",
    input_schema: {
      type: "object",
      properties: {
        query: {
          type: "string",
          description: "What the user is looking for, in their words.",
        },
      },
      required: ["query"],
    },
  },
  {
    name: "get_product",
    description:
      "Fetch one product by its exact id. Only call this with an id " +
      "returned by search_products, never an id you constructed.",
    input_schema: {
      type: "object",
      properties: {
        product_id: {
          type: "string",
          description: "Exact id, e.g. a UUID from search_products results.",
        },
      },
      required: ["product_id"],
    },
  },
];

const message = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  tools,
  messages: [{ role: "user", content: "Tell me about the new trail runners" }],
});
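Schema descriptions steer the model, but a server-side check is a cheap second line of defense. The sketch below is one possible guard, assuming your ids are UUIDs as in the example above; `checkProductId` and `UUID_PATTERN` are hypothetical names, not part of the Anthropic SDK. It rejects constructed ids before any lookup runs and returns a message that points Claude back to `search_products`.

```typescript
// Hypothetical guard: assumes product ids are UUIDs, as described earlier.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

type GuardResult =
  | { ok: true; productId: string }
  | { ok: false; message: string };

// Accepts the raw `input` of a get_product tool_use block.
function checkProductId(input: unknown): GuardResult {
  const id = (input as { product_id?: unknown } | null)?.product_id;
  if (typeof id !== "string" || !UUID_PATTERN.test(id)) {
    return {
      ok: false,
      // Short, actionable text that can be sent back with is_error: true.
      message:
        "product_id must be an exact id returned by search_products. " +
        "Call search_products with the user's wording first.",
    };
  }
  return { ok: true, productId: id };
}
```

When the guard fails, the message can go straight into a `tool_result` with `is_error: true`, so a guessed id costs one corrective round trip instead of several blind retries.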

Run the tool loop so results actually come back into context

Declaring tools is half the job. You also have to run the loop correctly, and this is where a surprising number of integrations leak reliability. When Claude returns `stop_reason: "tool_use"`, the model is paused mid-thought. It expects you to execute every `tool_use` block in that turn, append a single `user` message containing the matching `tool_result` blocks, and call the API again so it can continue. Skip a result, mismatch a `tool_use_id`, or return the result as a plain string instead of a `tool_result` block, and Claude either stalls or fabricates an answer from whatever it can infer.

Two details trip people up. First, a single assistant turn can contain *multiple* `tool_use` blocks. Claude can ask for three lookups at once, and you must return all three results in one `user` message, each keyed by its `tool_use_id`. Return them across separate messages and the API rejects the request. Second, errors are part of the contract. If your tool throws, don't silently drop it. Return a `tool_result` with `is_error: true` and a short, honest message. Claude handles "that product was not found, try searching by name" gracefully. It handles silence by guessing.

The shape of the result content matters as much as its delivery. Claude reads `tool_result` content the same way it reads anything else in context, so a tidy, predictable structure is easier for the model to reason over than a deeply nested blob with inconsistent keys. That is exactly the problem the next two sections solve.

A complete, correct tool-use loop in TypeScript

Every tool_use block gets a matching result in one user message.

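// Assumes `client` and `tools` from the previous example, plus an
// `executeTool(name, input)` dispatcher that you implement.
async function runConversation(userText: string) {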
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userText },
  ];

  while (true) {
    const res = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 1024,
      tools,
      messages,
    });

    messages.push({ role: "assistant", content: res.content });

    if (res.stop_reason !== "tool_use") {
      return res.content;
    }

    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of res.content) {
      if (block.type !== "tool_use") continue;
      try {
        const data = await executeTool(block.name, block.input);
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: JSON.stringify(data),
        });
      } catch (err) {
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          is_error: true,
          content: `Lookup failed: ${(err as Error).message}`,
        });
      }
    }

    messages.push({ role: "user", content: toolResults });
  }
}
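The loop above calls `executeTool` without defining it. A minimal sketch of such a dispatcher follows, assuming the two tools declared earlier and a hypothetical in-memory `CATALOG` standing in for a real backend. The product data and the naive term matching are illustrative only.

```typescript
// Hypothetical in-memory catalog standing in for a real content backend.
interface Product {
  id: string;
  title: string;
  price: number;
  currency: string;
}

const CATALOG: Product[] = [
  { id: "3f2b8c1e-9a4d-4e6f-8b7a-1c2d3e4f5a6b", title: "Ridgeline Trail Runner", price: 129, currency: "USD" },
  { id: "7c1d2e3f-4a5b-4c6d-9e8f-0a1b2c3d4e5f", title: "Summit Road Shoe", price: 99, currency: "USD" },
];

// Routes a tool_use block to its implementation by tool name.
async function executeTool(name: string, input: unknown): Promise<unknown> {
  const args = (input ?? {}) as Record<string, unknown>;
  switch (name) {
    case "search_products": {
      const terms = String(args.query ?? "")
        .toLowerCase()
        .split(/\s+/)
        .filter(Boolean);
      // Naive term matching, capped at 5 candidates as the description promises.
      return CATALOG
        .filter((p) => terms.some((t) => p.title.toLowerCase().includes(t)))
        .slice(0, 5)
        .map(({ id, title }) => ({ id, title }));
    }
    case "get_product": {
      const product = CATALOG.find((p) => p.id === args.product_id);
      if (!product) {
        // A thrown error becomes an is_error tool_result in the loop's catch block.
        throw new Error("Product not found. Try search_products with the product name.");
      }
      return product;
    }
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
```

Throwing on unknown ids and unknown tool names keeps every failure on the `is_error` path, so Claude gets an explicit message rather than an empty payload.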

Force structured output so Claude stops free-styling fields

Even with clean tools, the *final* answer Claude writes can drift. The user asks for a price comparison and Claude returns a chatty paragraph when your UI needs `{ sku, price, currency }`. The instinct is to bolt on a regex parser. Don't. Claude's tool-use mechanism doubles as a structured-output mechanism, and using it that way is far more reliable than parsing prose.

The pattern is to define a tool that represents your *output schema*, then force Claude to use it with `tool_choice`. You're not asking Claude to call an external function. You're asking it to emit a JSON object that follows a schema. Set `tool_choice: { type: "tool", name: "record_comparison" }` and Claude must respond with a `tool_use` block for that tool, no preamble, no "Sure, here's the comparison." Pull `block.input` and you have an already-parsed object.

This removes an entire class of production bugs. No more "the model wrapped the JSON in a markdown fence again." No more `JSON.parse` throwing on a trailing comma the model improvised. The schema is the contract, and forcing the tool guarantees a `tool_use` block for it. What forcing alone does not guarantee is strict schema conformance: Claude follows the schema closely, but a response cut off by `max_tokens` can carry incomplete input, so keep a lightweight validation step before the data reaches your UI. Combine it with the input discipline from earlier and both ends of the model's interaction are typed: structured arguments going in, structured results coming out. What's left is the data in the middle, which is still only as trustworthy as the system feeding your tools.

Forcing a typed JSON response with tool_choice

tool_choice turns Claude's tool mechanism into a JSON schema enforcer.

import anthropic

client = anthropic.Anthropic()

comparison_tool = {
    "name": "record_comparison",
    "description": "Record a structured product price comparison.",
    "input_schema": {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "sku": {"type": "string"},
                        "price": {"type": "number"},
                        "currency": {"type": "string"},
                    },
                    "required": ["sku", "price", "currency"],
                },
            }
        },
        "required": ["items"],
    },
}

msg = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=[comparison_tool],
    tool_choice={"type": "tool", "name": "record_comparison"},
    messages=[{"role": "user", "content": "Compare the two trail runners on price"}],
)

# tool_choice forces a tool_use block for record_comparison;
# validate the fields before trusting them downstream.
result = next(b.input for b in msg.content if b.type == "tool_use")
print(result["items"])
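Even with a forced tool call, it can be worth checking `block.input` before it reaches your UI, for example when a response is cut off by `max_tokens`. The sketch below is one way to do that with a plain TypeScript type guard that mirrors the `record_comparison` schema; `Comparison` and `isComparison` are hypothetical names, and a schema-validation library would work equally well.

```typescript
// Mirrors the record_comparison input_schema shown above.
interface ComparisonItem {
  sku: string;
  price: number;
  currency: string;
}

interface Comparison {
  items: ComparisonItem[];
}

// Narrowing check for the `input` of a record_comparison tool_use block.
function isComparison(value: unknown): value is Comparison {
  if (typeof value !== "object" || value === null) return false;
  const items = (value as { items?: unknown }).items;
  if (!Array.isArray(items)) return false;
  return items.every((item) => {
    if (typeof item !== "object" || item === null) return false;
    const { sku, price, currency } = item as Record<string, unknown>;
    return (
      typeof sku === "string" &&
      typeof price === "number" &&
      Number.isFinite(price) &&
      typeof currency === "string"
    );
  });
}
```

If the guard fails, a retry or an explicit error is safer than rendering a partial comparison.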

The real reliability problem is the data your tools return

You can perfect every tool schema and still ship a hallucinating agent, because the model can only be as accurate as what flows back through `tool_result`. If your `get_product` tool reads from a denormalized cache that's a week stale, Claude will report stale prices with total confidence. If your `search_products` tool does a naive `LIKE '%query%'` against a database, it returns nothing for "trail runners" when the catalog calls them "off-road running shoes," and Claude, handed an empty result, fills the gap with a plausible invention.

This is the part teams underinvest in. The model and the framework get the attention; the retrieval layer gets a hand-rolled SQL query and a JSON serializer. But debugging production agents almost always traces the failure back to a bad retrieval, not a bad model. The user asked a question with a structural component (recent, in-stock, by this author, in this locale) and a fuzzy text search couldn't resolve it. Pure keyword matching misses synonyms. Pure semantic search ignores the hard filters. What you actually want is one query that combines structural predicates with text relevance, returns exactly the fields your tool schema promised, and reflects the live state of your content.

That is a content-backend problem, not a model problem. It's where Sanity Context fits: it sits underneath your Claude tools as the structured-content layer, so the data crossing the `tool_result` boundary is governed, current, and shaped to your schema rather than assembled ad hoc from a vector DB plus a CMS sync job.
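Whatever backend sits behind the tool, the empty-result case described above deserves an explicit payload. The sketch below is one possible formatter, with hypothetical names (`Candidate`, `formatSearchResult`), that tells the model plainly when nothing matched instead of handing it a bare empty array.

```typescript
// Hypothetical candidate shape returned by a product search.
interface Candidate {
  id: string;
  title: string;
}

// Builds the string content for a search_products tool_result.
function formatSearchResult(query: string, candidates: Candidate[]): string {
  if (candidates.length === 0) {
    // State the miss explicitly so the model has no gap to fill with a guess.
    return JSON.stringify({
      matches: [],
      note:
        `No products matched "${query}". Do not guess an id; ` +
        "ask the user to rephrase or try a broader search term.",
    });
  }
  // Return only the fields the tool description promised.
  return JSON.stringify({
    matches: candidates.map(({ id, title }) => ({ id, title })),
  });
}
```

The note is an instruction to the model, not to the end user, so it can be blunt about what to do next.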

ℹ️

Most retrieval is structured, not semantic

Across production Context MCP usage, the heavy majority of agent calls are structured: schema-aware lookups and GROQ queries, with a compressed initial context behind them. Semantic search is a small slice, and embeddings are opt-in and off by default. If your first instinct for "reliable tool results" is a vector database, you're probably starting one layer too deep. Structured retrieval handles most questions; reach for embeddings when the failures justify it.

Wire Claude to Sanity Context over the hosted MCP endpoint

The fastest way to give Claude trustworthy content tools is the Context MCP endpoint. It's a hosted, read-only Model Context Protocol server that exposes your content as schema-aware tools, so you don't write `get_product` or `search_articles` by hand at all. Claude's tool-use loop attaches the MCP server, discovers the available tools, and gets back results already shaped to your content model. The read-only constraint is deliberate: an agent can query and retrieve through MCP, but writes go through Agent Actions, not the MCP endpoint, so a misbehaving agent can't mutate your published content.

For teams that want full control over the query, the second path is a thin custom tool that runs a typed GROQ query directly. GROQ is the query language for the Content Lake, and it's where the structured-plus-text combination lives. The same query can filter on hard predicates (in stock, published, in this category) and rank by text relevance, returning only the fields your Claude tool schema declared. That keeps the `tool_result` payload small and predictable, which is exactly what the model reasons over best.

Under the hood this is the Sanity Content Operating System, the intelligent backend for companies building AI content operations at scale. The point for your Claude integration is narrow and practical: the content your tools return is the same content humans model, edit, and govern in the Studio, versioned and previewable, rather than a separate copy that drifts out of sync the moment someone changes a price.

Attaching the hosted Context MCP server to Claude

MCP is the default path: no hand-written content tools, read-only by design.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Claude connects to the hosted, read-only Context MCP endpoint and
// discovers schema-aware content tools automatically.
const message = await client.beta.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  mcp_servers: [
    {
      type: "url",
      url: "https://mcp.sanity.io/mcp",
      name: "sanity-context",
      authorization_token: process.env.SANITY_MCP_TOKEN,
    },
  ],
  messages: [
    { role: "user", content: "What trail running shoes are in stock under $150?" },
  ],
  betas: ["mcp-client-2025-04-04"],
});

// Claude routes the question to the right content tool, runs a
// structured query server-side, and answers from governed data.

Resolve the question pure search can't: structured plus text in one query

Here is the query shape that solves the synonym-versus-filter problem from earlier. The user's question, "in-stock trail runners under $150," has three components: a structural filter (in stock, price ceiling), a text-relevance component ("trail runners" should also match "off-road running shoes"), and a result shape (just the fields the tool returns). A single GROQ query expresses all three. Structural predicates live inside the `*[ ... ]` filter. Text relevance is computed with `score()` combining `boost()` and `text::semanticSimilarity()`, then ordered by `_score`. The projection at the end returns exactly the fields your Claude tool schema declared, nothing more.

This is the hybrid discipline done correctly: hard predicates do the filtering, BM25 and optional semantic similarity do the ranking, and embeddings stay off until a real failure justifies turning them on. Most of the time the structural filter plus keyword match is enough, and you never touch a vector. When you do need semantic matching for messy synonyms, it slots into the same query rather than becoming a separate Pinecone round trip you have to reconcile.

One routing note. GROQ retrieval is the right tool for structured content: your catalog, your articles, anything with a schema. For unstructured sources such as PDFs, support databases, and marketing sites, Sanity Context's Knowledge Bases turn that messy corpus into well-ordered documents your agent can query. And for high-volume, machine-generated text that needs no editorial governance, a dedicated vector database still has its place. Not everything belongs in your content backend. The structured, governed content your Claude agent answers from usually does.

✨

One boundary, one source of truth

When your Claude tools read through Sanity Context, the data crossing the tool_result boundary is the same content editors model and govern in the Studio. Change a price in a Content Release, preview it, publish it, and the agent sees the new value through the Live Content API without a redeploy or a cache-sync job. The model layer stays simple. The content stays correct.

Hybrid structured + semantic retrieval in one GROQ query

Filters select; score() ranks. The projection matches your tool schema.

*[
  _type == "product" &&
  inStock == true &&
  price <= 150
]
| score(
    boost(title match text::query($queryText), 3),
    text::semanticSimilarity($queryText)
  )
| order(_score desc)
[0...5]{
  _id,
  title,
  price,
  currency,
  _score
}
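To use the query above from a custom Claude tool, a thin handler can pass the user's text and price ceiling as GROQ parameters. The sketch below is one way to arrange that, not the only one. The query text is copied from the example above with the price ceiling turned into a `$maxPrice` parameter, and `GroqFetch`, `ProductHit`, and `searchProducts` are hypothetical names. The fetch function is injected so the handler can be exercised without a network call.

```typescript
// Same shape as `fetch(query, params)` on a Sanity client; injected for testability.
type GroqFetch = (
  query: string,
  params: Record<string, unknown>,
) => Promise<unknown>;

// Fields the tool promises to return, matching the query projection.
interface ProductHit {
  _id: string;
  title: string;
  price: number;
  currency: string;
  _score: number;
}

// Query from the article, with the price ceiling parameterized.
const PRODUCT_SEARCH_QUERY = `*[
  _type == "product" && inStock == true && price <= $maxPrice
]
| score(
    boost(title match text::query($queryText), 3),
    text::semanticSimilarity($queryText)
  )
| order(_score desc)
[0...5]{ _id, title, price, currency, _score }`;

async function searchProducts(
  fetchGroq: GroqFetch,
  queryText: string,
  maxPrice: number,
): Promise<ProductHit[]> {
  const rows = await fetchGroq(PRODUCT_SEARCH_QUERY, { queryText, maxPrice });
  if (!Array.isArray(rows)) return [];
  // Re-project defensively so the tool_result never carries undeclared fields.
  return rows.map((row) => ({
    _id: String(row._id),
    title: String(row.title),
    price: Number(row.price),
    currency: String(row.currency),
    _score: Number(row._score),
  }));
}
```

In a real integration the injected function would typically wrap the `fetch` method of a client created with `createClient` from `@sanity/client`. Passing values as parameters rather than interpolating them into the query string keeps model-supplied text out of the query itself.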