Your agent worked when you tested it against one MCP server. Then you added a second, and a third, and now the model picks `search` from the wrong server, the tool list bloats your system prompt past the point where the LLM reads it carefully, and a single misconfigured stdio transport hangs the whole loop on startup. MCP solved the "how do I expose a tool" problem and quietly handed you a new one: how do I bridge several of these servers into a framework that wasn't designed around them?
One concrete example runs through the rest of this article: Sanity Context, with Context MCP as the read endpoint. It's already a hosted, read-only MCP server exposing schema and GROQ queries, which makes it a useful fixed point for testing the boundary problems this article describes.
The protocol itself is small. The hard parts live at the boundary, where MCP's transport, capability negotiation, and tool schemas meet your agent's tool-calling loop, its timeout budget, and its idea of what a "tool" is. Get that boundary wrong and you ship an agent that is slow, leaky, or non-deterministic in ways that only show up under real traffic.
This article walks the bridge from the framework side: how MCP transports actually behave, how to namespace and filter tools so the model can still choose well, where the latency hides, and how to keep a read-only data server honest. Then it shows where a hosted, schema-aware MCP endpoint earns its place in the stack.
Transports decide your failure modes before any tool runs
MCP gives you two transports in practice, and the one you pick determines how your agent breaks. The `stdio` transport spawns the server as a child process and talks over its standard input and output. It is great locally: zero network, fast, easy to debug. It is a liability in production, because a child process that crashes on a bad message or blocks on a slow filesystem call takes a chunk of your agent's reliability with it, and you now own process lifecycle, restarts, and stderr scraping.
The streamable HTTP transport (the successor to the old SSE transport) talks to a server over a URL. It survives a server restart, scales horizontally, and lets you attach a remote server you do not run. The cost is that connection setup, auth headers, and retry logic are now yours to get right.
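Whichever transport you pick, a connection attempt that never settles should not be able to stall startup. The sketch below is one way to bound it: a per-attempt deadline plus exponential backoff. `withTimeout`, `connectWithRetry`, and the option names are hypothetical helpers for illustration, not part of the MCP SDK.

```typescript
// Hypothetical helpers; nothing here is part of the MCP SDK.
interface RetryOptions {
  attempts: number;    // total tries, including the first
  timeoutMs: number;   // deadline for each individual attempt
  baseDelayMs: number; // first backoff delay; doubles after each failure
}

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `connect` until it succeeds or the attempt budget is spent.
async function connectWithRetry<T>(connect: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await withTimeout(connect(), opts.timeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt < opts.attempts - 1) {
        await sleep(opts.baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError instanceof Error ? lastError : new Error(String(lastError));
}
```

In use, the callback would build a fresh transport and call `client.connect` on it. Note that the timeout only stops the caller from waiting; it does not cancel the underlying attempt, so a hung child process still needs to be cleaned up separately.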
The mistake is connecting in your request path. The handshake (`initialize`, then capability negotiation, then `tools/list`) is not free, and doing it per request adds hundreds of milliseconds before the model sees a single tool. Connect once at startup, hold the client, and reuse it.
Connect an HTTP MCP server once, reuse the client
The handshake is per-connection overhead. Connect at boot and hold the client for the process lifetime.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const transport = new StreamableHTTPClientTransport(
  new URL("https://example.com/mcp"),
  { requestInit: { headers: { Authorization: `Bearer ${process.env.MCP_TOKEN}` } } }
);
const client = new Client({ name: "my-agent", version: "1.0.0" });

// Do this ONCE at startup, not per request.
await client.connect(transport);
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name)); // cache these

Tool-name collisions and prompt bloat are what actually degrade quality
Two MCP servers both expose a tool called `search`. Your framework flattens every server's tools into one list, hands that list to the model, and now the model has to disambiguate two identically named tools from their descriptions alone. It will sometimes pick wrong. Worse, every tool you attach costs tokens in the system prompt, and an agent wired to five chatty servers can spend a thousand tokens describing tools before the user's question even appears. Past a certain list length, models stop reading tool descriptions carefully and start pattern-matching on names.
The fix is a bridge layer you control, not the raw flattened list. Namespace tools by server (`sanity.query`, `support.search`) so collisions are impossible and the model gets a hint about provenance. Filter to the tools this agent actually needs rather than exposing the server's full surface. If a server offers forty tools and your agent uses four, register four.
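With more than one server attached, the same idea extends to a merged registry. The following is a minimal sketch under assumed shapes: `ServerTools`, `RegisteredTool`, and `buildRegistry` are illustrative names, and the tool type is a cut-down version of what `tools/list` returns. The separator is a parameter because some model APIs restrict which characters a tool name may contain.

```typescript
// Minimal structural types; real MCP tool objects carry more fields.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

interface ServerTools {
  namespace: string; // e.g. "sanity" or "support" (illustrative)
  tools: McpTool[];  // as returned by that server's tools/list
  allow: string[];   // tool names this agent actually needs
}

interface RegisteredTool extends McpTool {
  server: string;       // which server the call is routed to
  originalName: string; // the name to send in tools/call
}

// Merge several servers' tools into one registry of namespaced, allow-listed tools.
function buildRegistry(servers: ServerTools[], separator = "."): Map<string, RegisteredTool> {
  const registry = new Map<string, RegisteredTool>();
  for (const { namespace, tools, allow } of servers) {
    const allowed = new Set(allow);
    for (const tool of tools) {
      if (!allowed.has(tool.name)) continue; // not needed by this agent
      const qualified = `${namespace}${separator}${tool.name}`;
      if (registry.has(qualified)) {
        throw new Error(`duplicate tool name: ${qualified}`);
      }
      registry.set(qualified, {
        ...tool,
        name: qualified,
        server: namespace,
        originalName: tool.name,
      });
    }
  }
  return registry;
}
```

Keeping `originalName` alongside the qualified name matters at call time: the model sees `support.search`, but the server only knows `search`.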
Doing this in code also gives you a place to translate MCP's JSON Schema into your framework's tool format, which is rarely a clean one-to-one mapping. Frameworks differ on how they want input schemas, descriptions, and result shapes, and the translation is where subtle bugs (dropped `required` arrays, lost enums) creep in.
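One cheap guard against that class of bug is to compare constraints before and after translation. The sketch below assumes both sides can be viewed as plain JSON Schema objects; `constraintPaths` and `lostConstraints` are hypothetical helpers, and the walk only descends through `properties`, so `items`, `anyOf`, and similar keywords would need extra cases.

```typescript
interface JsonSchema {
  type?: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
  enum?: unknown[];
  [key: string]: unknown;
}

// Collect a marker for every `required` array and `enum` found in a schema.
function constraintPaths(schema: JsonSchema, path = "$"): string[] {
  const found: string[] = [];
  if (Array.isArray(schema.required) && schema.required.length > 0) {
    found.push(`${path}:required[${[...schema.required].sort().join(",")}]`);
  }
  if (Array.isArray(schema.enum)) {
    found.push(`${path}:enum[${schema.enum.map((v) => JSON.stringify(v)).join(",")}]`);
  }
  for (const [key, child] of Object.entries(schema.properties ?? {})) {
    found.push(...constraintPaths(child, `${path}.${key}`));
  }
  return found;
}

// Constraints present in the MCP schema but missing after translation.
function lostConstraints(original: JsonSchema, translated: JsonSchema): string[] {
  const kept = new Set(constraintPaths(translated));
  return constraintPaths(original).filter((c) => !kept.has(c));
}
```

Running `lostConstraints` over every registered tool at startup, and failing loudly on a non-empty result, turns a silent schema regression into a boot-time error.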
More tools is not more capability
Namespace and filter MCP tools into a framework tool registry
Filter to the tools the agent needs, namespace them, and wrap the MCP call in your framework's tool shape.
const allow = new Set(["query", "getDocument", "listTypes", "getSchema"]);
const { tools } = await client.listTools();

const registry = Object.fromEntries(
  tools
    .filter((t) => allow.has(t.name))
    .map((t) => [
      `sanity.${t.name}`, // namespace to avoid collisions
      {
        description: t.description,
        parameters: t.inputSchema, // JSON Schema, validate before trusting
        execute: async (args: unknown) => {
          // Send the server's original tool name, not the namespaced one.
          const res = await client.callTool({ name: t.name, arguments: args as Record<string, unknown> });
          return res.content;
        },
      },
    ])
);

Latency hides in tool-list refreshes and chained round trips
An MCP tool call is a network round trip, and agents chain them: list documents, then fetch one, then fetch a related one. Each hop is a request-response with the server plus the model deciding what to do next. Three sequential MCP calls before the model can answer is three round trips of latency stacked on top of inference, and users feel it.
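The stacking is easy to put numbers on. As a rough model, a turn with n sequential tool calls pays n server round trips plus n + 1 model inferences (one per decision and one for the final answer). The figures below are illustrative assumptions, not measurements.

```typescript
// Rough latency model for one agent turn with sequential tool calls.
// toolCalls round trips to the server, plus one inference per decision
// and one more for the final answer.
function turnLatencyMs(toolCalls: number, rttMs: number, inferenceMs: number): number {
  return toolCalls * rttMs + (toolCalls + 1) * inferenceMs;
}

// Illustrative numbers only: 150 ms per round trip, 800 ms per inference.
const chained = turnLatencyMs(3, 150, 800);   // 3 * 150 + 4 * 800 = 3650
const collapsed = turnLatencyMs(1, 150, 800); // 1 * 150 + 2 * 800 = 1750
```

Under these assumptions the inference steps dominate, which is why removing a hop saves more than the network round trip alone suggests.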
Two things help. First, do not re-list tools every turn. Capability lists rarely change within a session, so fetch `tools/list` once at connection time and cache it. Some frameworks re-list defensively on every model call; that is a hidden per-turn cost you can usually delete. If a server supports the `notifications/tools/list_changed` notification, listen for it and refresh only when told.
Second, push work into a single richer call rather than several thin ones. If your data server can answer 'give me this document plus its author plus the three most recent related posts' in one query, that beats three separate tool calls the model has to orchestrate. The protocol does not stop a server from exposing a tool that does real work per call. A server whose tools each return one tiny fact forces the chattiness; a server whose tools return shaped, joined results lets you collapse the chain.
Listen for list_changed instead of polling
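A small cache with an explicit invalidation hook is enough to implement that policy. This is a sketch, not the SDK's own mechanism: `ToolListCache` and `ToolInfo` are hypothetical names, and the wiring shown in the trailing comment assumes the official TypeScript SDK's notification handler API, which is worth verifying against the SDK version you run.

```typescript
interface ToolInfo {
  name: string;
  description?: string;
}

// Caches the result of tools/list until explicitly invalidated.
class ToolListCache {
  private cached: Promise<ToolInfo[]> | undefined;
  private readonly listTools: () => Promise<ToolInfo[]>;

  constructor(listTools: () => Promise<ToolInfo[]>) {
    this.listTools = listTools;
  }

  // Returns the cached list, fetching it on first use or after invalidation.
  get(): Promise<ToolInfo[]> {
    if (!this.cached) {
      this.cached = this.listTools().catch((err) => {
        this.cached = undefined; // do not cache a failed listing
        throw err;
      });
    }
    return this.cached;
  }

  // Call this when the server reports that its tool list changed.
  invalidate(): void {
    this.cached = undefined;
  }
}

// Wiring sketch (assumed SDK usage, shown as a comment so this block stays self-contained):
//
//   import { ToolListChangedNotificationSchema } from "@modelcontextprotocol/sdk/types.js";
//   const cache = new ToolListCache(async () => (await client.listTools()).tools);
//   client.setNotificationHandler(ToolListChangedNotificationSchema, async () => cache.invalidate());
```

Every model call then reads `await cache.get()`, which costs a round trip only on the first call and after a change notification.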
Read-only by default keeps an exploring agent from breaking things
An agent that can call tools will call them in orders you did not anticipate, especially during the messy middle of a multi-step task where it is exploring. If every MCP tool it can reach is read-only, the worst case of a confused agent is a wasted query, not a deleted record or a published draft. That property is worth designing for, not assuming.
MCP itself does not enforce read-only; the server does. So when you bridge a data source in, prefer a server that is read-only by construction for the retrieval path, and route any writes through a separate, explicit, audited mechanism that is not in the agent's general tool list. The asymmetry is deliberate: reads are cheap to expose broadly, writes deserve a gate.
This is exactly the shape Sanity Context takes. Its Context MCP is a hosted, read-only endpoint: an agent attaches it as an MCP server and gets schema-aware tools to query content, but it cannot mutate anything through that surface. Writes go through Agent Actions, a separate path, not through MCP. So you can hand an agent broad read access to your content without a confused tool-selection step turning into a destructive one. For a framework developer, that means the blast radius of a bad tool call is bounded by design rather than by your own discipline in wiring permissions.
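For servers that are not read-only by construction, the bridge can apply its own default-deny gate. The sketch below filters on the `readOnlyHint` tool annotation defined by the MCP specification, plus an explicit list of tools you have reviewed yourself. Annotations are hints supplied by the server, not guarantees, so this is only a reasonable filter for servers you already trust; `readOnlyTools` and `trustedReadOnly` are illustrative names.

```typescript
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
}

interface AnnotatedTool {
  name: string;
  annotations?: ToolAnnotations;
}

// Default-deny: keep a tool only if the server marks it read-only
// or it appears in a list the bridge owner has reviewed by hand.
function readOnlyTools<T extends AnnotatedTool>(
  tools: T[],
  trustedReadOnly: ReadonlySet<string> = new Set<string>(),
): T[] {
  return tools.filter(
    (tool) => trustedReadOnly.has(tool.name) || tool.annotations?.readOnlyHint === true,
  );
}
```

Tools with no annotation at all are dropped rather than assumed safe, which keeps the failure mode on the side of a missing capability instead of an unintended write.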
Wiring a hosted, schema-aware MCP endpoint into the loop
Most of the friction above (process lifecycle, transport retries, tool curation) gets simpler when the server is hosted and schema-aware rather than a local script you maintain. Sanity ships Context MCP as a hosted HTTP endpoint, which the docs call the fastest way in, and it is the default integration path: point your MCP client at the URL, authenticate, and the agent gets tools that already understand your content model.
Schema-aware matters here. A generic data MCP server hands the model raw rows and hopes the descriptions are enough. A schema-aware endpoint knows your document types, their fields, and their relationships, so the tools it exposes (query, fetch a typed document, list the available types) carry that structure into the model's view. The agent does not have to guess the shape of your data.
The second path, for teams that want full query control, is a thin custom tool that runs a typed GROQ query directly. GROQ is Sanity's query language; it lets you filter, join, and project in one expression, which is how you collapse those chained round trips into a single richer call. You wrap that query in your framework's tool format and skip MCP entirely for the queries you want to own end to end. Most teams start with the MCP endpoint and reach for custom GROQ tools only where they need a hand-tuned query.
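A custom tool of that kind might look like the following sketch. The schema is assumed for illustration (`post` documents with `slug`, `title`, `publishedAt`, and an `author` reference), "related" is taken to mean other posts by the same author, and `FetchGroq` is a hypothetical function type standing in for whatever runs the query.

```typescript
// Hypothetical signature: anything that runs a GROQ query with parameters.
type FetchGroq = (query: string, params: Record<string, unknown>) => Promise<unknown>;

// One query returns the post, its author, and the three most recent
// other posts by that author. Field names are assumptions.
const POST_WITH_CONTEXT = `
*[_type == "post" && slug.current == $slug][0]{
  _id,
  title,
  "author": author->{ name },
  "related": *[_type == "post" && author._ref == ^.author._ref && _id != ^._id]
    | order(publishedAt desc)[0...3]{ _id, title }
}`;

// Wrap the query in a generic tool shape; adapt the shape to your framework.
function makePostContextTool(fetchGroq: FetchGroq) {
  return {
    description: "Fetch a post with its author and the three most recent posts by the same author.",
    parameters: {
      type: "object",
      properties: { slug: { type: "string" } },
      required: ["slug"],
    },
    execute: async (args: unknown): Promise<unknown> => {
      // Validate model-supplied arguments before they reach the query.
      const slug = (args as { slug?: unknown } | null)?.slug;
      if (typeof slug !== "string" || slug.length === 0) {
        throw new Error("slug must be a non-empty string");
      }
      return fetchGroq(POST_WITH_CONTEXT, { slug });
    },
  };
}
```

With `@sanity/client`, `fetchGroq` could be as small as `(query, params) => sanityClient.fetch(query, params)`. Passing the slug as a parameter rather than interpolating it into the query string keeps model-supplied text out of the query itself.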
Attach the hosted Context MCP endpoint to an agent loop
Point the MCP client at the hosted Context endpoint and register its schema-aware, read-only tools through your bridge layer.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Hosted, read-only Context MCP endpoint.
const transport = new StreamableHTTPClientTransport(
  new URL("https://mcp.sanity.io/mcp"),
  { requestInit: { headers: { Authorization: `Bearer ${process.env.SANITY_MCP_TOKEN}` } } }
);
const client = new Client({ name: "content-agent", version: "1.0.0" });
await client.connect(transport);

// Schema-aware tools, scoped read-only. Namespace and register as in the
// earlier example, then hand the registry to your agent's tool-calling loop.
const { tools } = await client.listTools();

Structured retrieval first, embeddings only when failures demand them
A common mistake when bridging a data source to an agent is to reach straight for vector search: embed everything, expose a semantic-search tool, and call it retrieval. In practice most agent queries have a structural component that embeddings handle badly. 'The latest published pricing page', 'posts by this author from last quarter', 'in-stock variants of this product': those are filters on fields and state, and pure semantic similarity routinely returns the plausible-but-wrong document for them.
Sanity's own production data shows the heavy majority of agent calls through Context are structured: GROQ queries and schema lookups, with semantic search a small slice. Embeddings are opt-in and off by default, and most projects shipping on Context MCP never turn them on. So the right default for a bridged data tool is structured retrieval, predicates and projections, and you reach for semantic ranking only when the structured query genuinely cannot express what the agent needs.
When you do need both, GROQ can combine them in one query: structural filters as predicates, then a `score()` pipeline that blends keyword and semantic signals so the result is ranked by relevance without losing the hard constraints. For unstructured corpora (PDFs, support databases, websites) the routing is different: Sanity Context Knowledge Bases turns that messy material into ordered, retrievable documents, while structured content stays on GROQ retrieval. And for high-volume machine-generated data that needs no editorial governance, a dedicated vector database still has its place. Not everything belongs in one system.
Semantic search is one ingredient, not the whole meal
Hybrid retrieval in a single GROQ query: structural filter plus scoring
Structural predicates stay inside the filter; score() blends a BM25 keyword match with semantic similarity, ordered by _score.
*[_type == "article" && status == "published"]
  | score(
      boost(title match text::query($queryText), 3),
      text::semanticSimilarity($queryText)
    )
  | order(_score desc)[0...5]{
    _id, title, _score
  }

Governing what the agent retrieves: versioned, previewable content
There is a layer of an agent's behavior that is really content, not code: its instructions, the brand voice it writes in, the approved answers it falls back to, the knowledge it retrieves. Teams often hardcode these as strings in the repo, which means every wording change is a deploy, every reviewer is a developer, and there is no preview of what the agent will say before it says it.
This is where the editorial side of state belongs in a system built for it. Sanity is the Content Operating System for the AI era, the intelligent backend for companies building AI content operations at scale, and the relevant primitive here is that anything the agent reads can be modeled as content: versioned, edited by non-developers, reviewed, and staged. The Content Releases primitive lets you prepare a change to agent instructions or approved responses, preview the agent's behavior against it, and publish it as a unit, without a code deploy. This maps to Sanity's automate-everything pillar: the content that drives the agent moves through the same governed workflow as everything else, instead of living as untracked strings.
Keep the boundary clear, though. Ephemeral per-user chat history (the last ten turns of a conversation) is not editorial content and does not belong here; that is Redis or Upstash territory. Sanity holds the durable, governed material the agent reads: the things a human should be able to version, review, and roll back. The MCP endpoint then exposes exactly that governed content to the agent loop, read-only, so what the model retrieves at run time is the same content a human approved.