How to Use Sanity Knowledge Bases as Agent Memory

Most agent "memory" is a bucket of chat transcripts and a vector index that nobody governs. A customer asks about a refund policy that changed last quarter, the agent retrieves the stale version that still lives in the embedding store, and it confidently states the wrong terms. Nobody notices until the support ticket escalates, because the memory layer has no owner, no review step, and no relationship to the content the company actually publishes. The failure isn't the model. It's that the memory was a side pipeline divorced from the source of truth.

Sanity Context is the AI Content Operating System for agent memory: an intelligent backend that makes your structured content, not a copy of it, the thing your agents remember. Built on Sanity's Content Operating System for the AI era, it turns datasets, documentation, PDFs, and support databases into Knowledge Bases that agents query through one governed retrieval path.

This guide reframes agent memory as a content problem, not an infrastructure problem. We will cover what belongs in long-term versus working memory, how Knowledge Bases keep that memory fresh, how editors govern it in the Studio, and where Sanity Context fits against vector databases and content backends with AI bolted on.

Why agent memory rots, and why it's a content problem

Teams reach for a vector database the moment an agent needs to recall anything, and that instinct is where the rot starts. The standard pattern is a separate ingestion pipeline: scrape the docs, chunk them, embed them, push the vectors to a store, and schedule a re-sync. Every one of those steps is a place for memory to drift out of step with reality. The product page changes, but the embedding job runs nightly, so for up to twenty-four hours the agent remembers a price that no longer exists. A legal clause gets revised, but the chunk boundaries split it awkwardly, so the agent retrieves half a sentence and fills in the rest from its training data. Nobody owns the pipeline, so when retrieval quality degrades there is no editor to call.
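To put numbers on that lag, here is a minimal TypeScript sketch, assuming a re-embed job that runs on a fixed interval and picks up every edit made up to the moment it starts. The function name and the nightly schedule in the usage note are hypothetical; the takeaway is that worst-case staleness approaches the full sync interval.

```typescript
// Hours an edit stays invisible to the agent when a batch job re-embeds
// content on a fixed interval. Times are hours since an arbitrary epoch.
// Assumption: a run includes every edit made at or before its start time.
function hoursUntilVisible(editAt: number, firstRunAt: number, intervalHours: number): number {
  if (intervalHours <= 0) throw new Error("interval must be positive");
  // Rounding the elapsed intervals up gives the index of the first run
  // that starts at or after the edit (never earlier than the first run).
  const runsElapsed = Math.max(0, Math.ceil((editAt - firstRunAt) / intervalHours));
  const nextRun = firstRunAt + runsElapsed * intervalHours;
  return nextRun - editAt;
}
```

With a nightly run at hour 2, an edit at 02:15 (`hoursUntilVisible(2.25, 2, 24)`) stays invisible for 23.75 hours, while an edit at 01:30 waits only half an hour. Which case you hit depends on when the editor happened to publish.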

The deeper issue is that this architecture treats memory as infrastructure when it is really content. What an agent should remember is the same material your humans rely on: the current policy, the spec for the product you ship today, the support macro your team approved last week. When memory lives in a copy, you maintain two versions of the truth and reconcile them forever. When memory lives in the content itself, freshness is a property of publishing, not a cron job.

This is the reframe the rest of the guide builds on. Long-term agent memory should be your governed content store, queried directly, with embeddings that move when the content moves. Sanity Context is built on exactly that premise: the Content Lake is both where editors work and where agents read, so there is no second copy to drift.

Long-term memory versus working memory: drawing the line

Not everything an agent touches deserves to be remembered the same way, and conflating the two kinds of memory is a common cause of bad answers. Working memory is the scratchpad of a single conversation: what the user just asked, the tool results from the last three turns, the half-finished plan the agent is executing. It is ephemeral by design, and trying to persist all of it pollutes long-term recall with one customer's transient context. Long-term memory is the durable knowledge the agent should carry across every conversation: product facts, policies, documentation, resolved support patterns. This is institutional knowledge, and it has the same lifecycle as published content: it gets drafted, reviewed, approved, superseded, and retired.

The practical rule is that long-term memory belongs in a governed store with an owner, and working memory belongs in the session. When you map long-term memory onto your content backend, you inherit everything that backend already does well: versioning, references between documents, structured fields that let you retrieve a policy by region rather than by fuzzy text match.
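A minimal sketch of that rule, assuming a hypothetical `LongTermStore` interface that stands in for a governed content backend: working memory is a bounded, per-session buffer discarded with the session, while long-term memory is read-only from the agent's side. None of these names come from a Sanity SDK.

```typescript
// Hypothetical read-only view of a governed content store. Changes reach it
// through editorial review elsewhere, so the agent gets no write method.
interface LongTermStore {
  lookup(topic: string): string | undefined;
}

// Per-session scratchpad: keeps only the most recent turns and is dropped
// when the session ends. Nothing here is persisted to long-term memory.
class WorkingMemory {
  private turns: string[] = [];
  private readonly maxTurns: number;

  constructor(maxTurns: number) {
    this.maxTurns = maxTurns;
  }

  remember(turn: string): void {
    this.turns.push(turn);
    // Evict the oldest turns once the window is full.
    while (this.turns.length > this.maxTurns) this.turns.shift();
  }

  recent(): readonly string[] {
    return this.turns;
  }
}

// Context for one model call: a durable fact from the store (if any),
// followed by the transient conversation window.
function buildContext(store: LongTermStore, session: WorkingMemory, topic: string): string[] {
  const fact = store.lookup(topic);
  return fact === undefined ? [...session.recent()] : [fact, ...session.recent()];
}
```

The design choice is the asymmetry: the session buffer is freely writable but short-lived, and the long-term store is durable but only changes through the same review path as any other content.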

With Sanity Context, that durable layer is a Knowledge Base built from datasets, websites, PDFs, and support databases, turned into agent-readable documents that share one retrieval path. Because the documents are modeled, not just chunked, the agent can ask for the support article tagged for a specific product line and a specific region, and get a precise document rather than the statistically nearest paragraph. Working memory stays in the agent runtime; long-term memory stays in content where it can be governed.
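As an example, a field-based lookup can be expressed as a parameterized GROQ query. This TypeScript sketch only builds the query string and its parameters; the document type `supportArticle` and the fields `productLine` and `region` are hypothetical schema names, so substitute your own model.

```typescript
// A GROQ query plus its parameters. Values are referenced as $params in the
// query text and supplied separately, so user input is never spliced in.
interface FieldQuery {
  query: string;
  params: Record<string, string>;
}

// Exact, structured retrieval: filter on modeled fields rather than
// asking for the nearest chunk of text. Type and field names are hypothetical.
function supportArticleQuery(productLine: string, region: string): FieldQuery {
  const query =
    '*[_type == "supportArticle" && productLine == $productLine && region == $region]' +
    "{_id, title, body}";
  return { query, params: { productLine, region } };
}
```

A pair like this could be passed to `client.fetch(query, params)` from `@sanity/client`; the filter either matches a document's fields or it doesn't, which is what makes the result precise rather than approximate.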


How Knowledge Bases keep memory fresh without a vector pipeline

The single biggest operational tax in a homegrown memory stack is keeping the embeddings honest. You write content in one system, then run a separate job to extract, chunk, embed, and upsert it into a vector store. That pipeline is fragile, it lags reality, and it is the thing that pages your on-call engineer at 2 a.m. when a re-index fails silently and retrieval quality quietly tanks.

Knowledge Bases collapse that pipeline. Because embeddings in Sanity Context are dataset embeddings, tied directly to the content rather than to a downstream copy, an edit to a document propagates to what the agent can retrieve within minutes. There is no separate vector pipeline to build, monitor, or reconcile. When an editor corrects a policy in the Studio and publishes, the corrected text is what the agent remembers on the next query, not the version from last night's batch job.

This matters most for the content that changes often and matters most: pricing, availability, policy language, the support answers that shift as a product evolves. A memory layer that updates in minutes instead of overnight is the difference between an agent that reflects your business and one that argues with it. It also removes an entire class of incident, the stale-embedding bug, where the content is correct everywhere a human looks but wrong in the one place the agent reads. The freshness is a consequence of architecture: memory and content are the same object, so they cannot disagree.
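The difference between reading a synced copy and reading the content itself can be modeled in a few lines. This is a conceptual TypeScript sketch, not Sanity code: `ContentStore` stands in for the place editors publish, and `SnapshotIndex` stands in for a side-pipeline vector copy that only changes when `sync()` runs.

```typescript
// The system editors publish to.
class ContentStore {
  private docs = new Map<string, string>();

  publish(id: string, text: string): void {
    this.docs.set(id, text);
  }

  read(id: string): string | undefined {
    return this.docs.get(id);
  }
}

// A downstream copy that reflects the store only as of its last sync.
class SnapshotIndex {
  private copy = new Map<string, string>();
  private readonly source: ContentStore;

  constructor(source: ContentStore) {
    this.source = source;
  }

  sync(ids: string[]): void {
    for (const id of ids) {
      const text = this.source.read(id);
      if (text !== undefined) this.copy.set(id, text);
    }
  }

  read(id: string): string | undefined {
    return this.copy.get(id);
  }
}
```

Between a `publish` and the next `sync`, the two `read` calls return different answers for the same ID. An agent that reads the store directly has no such window, because there is only one object to read.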

Retrieving from memory: hybrid search inside the content store

Recall quality is where most agent-memory setups quietly fail. Pure vector similarity is good at fuzzy, conceptual matches and bad at exact ones: it will happily return a semantically related paragraph while missing the document that contains the exact SKU or error code the user typed. Pure keyword search has the opposite failure. Production agent memory needs both, blended and tuned, and most stacks achieve that by bolting a keyword engine onto a vector database and writing glue code to merge two ranked lists.
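To make that cost concrete, here is the kind of merge logic an assembled stack ends up owning. The sketch uses reciprocal rank fusion (RRF), one common way to combine a keyword ranking with a vector ranking; it is a swapped-in illustration, not anything Sanity Context requires, and `k = 60` is the constant customarily used with RRF rather than a tuned value.

```typescript
// Merge ranked lists of document IDs with reciprocal rank fusion.
// Every list adds 1 / (k + rank) for each document it contains,
// where rank is 1-based. Documents missing from a list get nothing from it.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first; the sort is stable, so ties keep first-seen order.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

With keyword results `["sku-123", "a", "b"]` and vector results `["a", "c", "sku-123"]`, the fused order is `a, sku-123, c, b`: the document holding the exact SKU still loses to one that placed moderately in both lists, and changing that means writing yet more weighting code.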

Sanity Context does the blend natively inside the content store. In a single GROQ query you combine `text::semanticSimilarity()` for conceptual recall with a BM25-style `match()` for exact terms, then shape the final ranking with `score()` and `boost()` so that, for example, an exact product-code hit outranks a loose thematic match. There is no second system to keep in sync, no two indexes that can disagree, and no merge logic to maintain in application code.
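A sketch of such a query, built as a string in TypeScript so the search term can be passed as a parameter. It assumes, as this section describes, that `text::semanticSimilarity()` accepts the search text and can be used as a term inside `score()`; in GROQ, `match` is written as an infix operator. The `article` type, the `sku` and `body` fields (assumed to be plain strings), and the boost weight of 3 are hypothetical, so check the GROQ reference for your API version before relying on the exact behavior.

```typescript
// A hybrid GROQ query: exact-term relevance via `match`, conceptual recall
// via text::semanticSimilarity(), ranked together by score().
// Type and field names are hypothetical; the boost weight is illustrative.
interface HybridQuery {
  query: string;
  params: { term: string };
}

function hybridMemoryQuery(term: string, limit = 10): HybridQuery {
  if (!Number.isInteger(limit) || limit < 1) throw new Error("limit must be a positive integer");
  const query = [
    '*[_type == "article"]',
    "| score(",
    "    boost(sku match $term, 3),", // an exact product-code hit should outrank loose matches
    "    body match $term,", // BM25-style keyword relevance
    "    text::semanticSimilarity($term)", // conceptual similarity (assumed signature)
    "  )",
    `| order(_score desc)[0...${limit}]`,
    "{_id, title, _score}",
  ].join("\n");
  return { query, params: { term } };
}
```

Passing `term` as a parameter rather than interpolating it means a pasted error string with quotes or brackets cannot break the query text.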

The consequence for memory is precision. When a customer pastes an exact error string, `match()` surfaces the document that contains it; when they describe a problem in their own words, `text::semanticSimilarity()` finds the conceptually right answer; and `boost()` lets you encode editorial judgment about which signal should win in a tie. Because all of this runs against the Content Lake, the same governed store that holds the content, the agent's memory and the company's published truth are queried through one path rather than reconciled across two.

Governing what the agent remembers in the Studio

An ungoverned memory layer is a liability waiting to happen. If anyone can push a document into the embedding store and the agent will start citing it, you have no review step between a draft idea and a customer-facing answer. The vector-database model makes this worse, because the people who maintain the index are usually engineers, not the editors and subject-matter experts who actually know whether a policy statement is correct and approved.

Treating long-term memory as content puts it back under editorial control. In the Studio, the same people who own the website own what the agent remembers. They draft changes, route them through review, and see the content in context before it goes live. Content Releases let a team stage a batch of memory changes (a new product launch's worth of facts, for example) and ship them together at a known time, just as they would stage the website, rather than dribbling untracked edits into a search index.
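Conceptually, a staged batch behaves like a single change to what the agent can see. The sketch below is a hypothetical in-memory model, not the Content Releases API: edits collected in a `StagedRelease` stay invisible to readers until `publish()` applies all of them together.

```typescript
// What agents read: the currently published documents, keyed by ID.
type Published = Map<string, string>;

// A hypothetical staged batch of edits. Readers of the published map see
// none of these edits until publish() applies every one of them.
class StagedRelease {
  private readonly edits = new Map<string, string>();
  private published = false;

  stage(id: string, text: string): void {
    if (this.published) throw new Error("release already published");
    this.edits.set(id, text);
  }

  publish(target: Published): void {
    if (this.published) throw new Error("release already published");
    this.edits.forEach((text, id) => target.set(id, text));
    this.published = true;
  }
}
```

Freezing the release after publication mirrors the editorial intent: a shipped batch is a record of what went live together, and later changes belong in a new release.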

This is also where agent instructions themselves belong: not hardcoded in a deployment script, but staged and reviewed like any other content change, so a behavioral tweak gets the same scrutiny as a policy update. For regulated buyers, the governance story is concrete. Sanity is SOC 2 Type II compliant, supports GDPR obligations, offers regional hosting and data residency options, and publishes its sub-processor list, so the system that holds your agent's memory meets the controls your security team already asks for. Memory becomes auditable: who changed what the agent knows, when, and with whose approval.

Connecting agents to memory: the MCP endpoint

A memory layer is only useful if your agents can actually reach it without each team reinventing the connection. The common failure here is integration sprawl: every agent framework gets its own bespoke client, its own auth handling, its own assumptions about how to query the store, and the result is a dozen brittle integrations that all have to be updated when the schema changes.

Production agents connect to Sanity Context through the Sanity Context MCP endpoint, a single governed interface shaped to the product rather than a raw database connection. Because it speaks the Model Context Protocol, the agents and assistants that already support MCP can query Knowledge Bases through the same retrieval path described earlier, hybrid GROQ against the Content Lake, without each team building its own retrieval logic. The endpoint is the contract: agents ask for content, the endpoint applies the governed query path, and the same freshness and precision guarantees hold no matter which framework made the call.
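MCP messages are JSON-RPC 2.0, and invoking a tool is a `tools/call` request whose params carry the tool `name` and its `arguments`. The sketch below only builds those messages and would be sent after the usual MCP initialization handshake; transport, authentication, and the endpoint URL are left out. The tool name `search_knowledge_base` and its `query` argument are hypothetical placeholders: the tools the Sanity Context MCP endpoint actually exposes should be read from its `tools/list` response.

```typescript
// Shape of a JSON-RPC 2.0 request, which is what MCP clients send.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Asks the server which tools it exposes; real tool names come from here.
function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Invokes one tool by name with its arguments.
function callToolRequest(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// "search_knowledge_base" is a hypothetical tool name for illustration only.
const request = callToolRequest(2, "search_knowledge_base", { query: "refund policy EU" });
console.log(JSON.stringify(request));
```

Because every framework that speaks MCP produces these same messages, the retrieval logic lives behind the endpoint instead of being reimplemented in each agent.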

When you need the agent to write back, not just read, Agent Actions provide schema-aware APIs for LLM-driven workflows like generate, transform, and translate, so an agent can propose a content change that still flows through the same modeled, reviewable structure rather than mutating a raw blob. The point of consolidating on one endpoint is operational: there is one place to govern access, one retrieval path to reason about, and one schema that every agent shares, instead of a sprawl of integrations that drift apart. Memory stops being a per-team science project and becomes shared infrastructure.
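The idea of a schema-aware write path can be sketched without the real Agent Actions API: an agent's proposed change is checked against the field types of a hypothetical schema and, if valid, becomes a draft for review rather than a published edit. The `drafts.` ID prefix follows Sanity's convention for draft documents; everything else here, including `FieldSchema` and `proposeChange`, is illustrative.

```typescript
type FieldType = "string" | "number" | "boolean";
type FieldSchema = Record<string, FieldType>;

// A draft document: system fields plus arbitrary modeled fields.
interface DraftDoc {
  _id: string;
  _type: string;
  [field: string]: unknown;
}

// Validates an agent's proposed field values against the schema and returns
// either a draft awaiting editorial review or the list of problems found.
function proposeChange(
  docId: string,
  docType: string,
  schema: FieldSchema,
  proposal: Record<string, unknown>,
): { draft?: DraftDoc; errors: string[] } {
  const errors: string[] = [];
  for (const key of Object.keys(proposal)) {
    const expected = schema[key];
    if (expected === undefined) errors.push(`unknown field: ${key}`);
    else if (typeof proposal[key] !== expected) errors.push(`${key} should be a ${expected}`);
  }
  if (errors.length > 0) return { errors };
  // System fields are set last so a proposal cannot overwrite them.
  return { draft: { ...proposal, _id: `drafts.${docId}`, _type: docType }, errors };
}
```

Rejecting unknown or mistyped fields up front keeps an agent's output inside the content model, and routing valid changes to a draft keeps a human between the proposal and what other agents recall.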

Putting it together: an agent-memory architecture that doesn't drift

Pulling the threads together gives you a memory architecture with a single defining property: there is no second copy of the truth to keep in sync. Long-term memory is your governed content, modeled as documents in the Content Lake. Freshness is a property of publishing, because dataset embeddings move when the content moves, not on a batch schedule. Retrieval is hybrid and native, blending `text::semanticSimilarity()` and `match()` in one GROQ query so both fuzzy and exact recall work without a second index. Governance lives in the Studio, where editors review and stage memory changes through Content Releases the same way they stage a site. And access is consolidated behind the Sanity Context MCP endpoint, so every agent reads through one path.

Contrast that with the assembled alternative. A vector database plus a content backend plus a keyword engine plus an ingestion pipeline plus glue code is five systems, each with its own freshness lag, its own owner or lack of one, and its own way to disagree with the others. Every seam is a place memory can drift from reality, and drift is exactly the failure mode that makes agents hallucinate against content that is correct everywhere a human looks.

This is what it means to call Sanity Context an intelligent backend for companies building AI content operations at scale. It is not a headless CMS with a search add-on and it is not a vector store you maintain on the side. It is the Content Operating System that legacy stacks force you to assemble, delivered as one governed foundation where memory, content, and the agents that read them finally share the same source of truth.

Agent memory approaches compared

Feature	Sanity	Pinecone	Contentful	pgvector / Neon
Hybrid retrieval (semantic + keyword)	Native: text::semanticSimilarity() + match() blended and ranked with score() and boost() in one GROQ query against the Content Lake.	Sparse-dense vectors support hybrid, but keyword and metadata logic is assembled in application code, not one query.	No native vector search; teams pair the App Framework with an external search service and merge results themselves.	pgvector gives similarity in SQL; full-text via tsvector is separate, so hybrid ranking is hand-built and tuned by you.
Keeping memory fresh	Dataset embeddings are tied to content, so an edit propagates to retrieval within minutes; no separate vector pipeline to run.	Requires a separate ingestion and re-embed pipeline you build and monitor; freshness depends on your sync cadence.	Content is fresh in the CMS, but any vector copy lives in an external store you re-sync on your own schedule.	You write the extract, chunk, embed, and upsert pipeline; stale embeddings are an incident class you own.
Editorial governance of memory	Editors review and stage memory in the Studio; Content Releases ship batches of changes together, the same way they stage a site.	Vector index is engineer-maintained infrastructure; no editorial review layer between a change and what the agent recalls.	Strong editorial workflows for content, but governance does not extend to the external vector copy agents query.	Database table with no built-in review; governance is whatever you build on top of Postgres yourself.
Structured retrieval by field	Documents are modeled, so agents fetch by region, product line, or status rather than the statistically nearest chunk.	Metadata filters narrow vector results, but structure is flat key-value, not a referenced content model.	Rich content modeling, but retrieval for agents runs through the bolted-on search layer, not the model directly.	Relational schema supports structured filters; combining them with vector ranking is custom SQL you maintain.
Agent connection	Production agents connect through the Sanity Context MCP endpoint; one governed retrieval path shared across frameworks.	REST and SDK clients; each agent framework integrates its own retrieval logic and auth handling.	Delivery APIs plus a separate search API; agents wire up both and reconcile the responses.	Direct database connection; every agent team builds and secures its own query layer.
Compliance posture	SOC 2 Type II, GDPR support, regional hosting and data residency options, and a published sub-processor list.	SOC 2 Type II and GDPR; compliance covers the vector store, with content and governance handled elsewhere in your stack.	SOC 2 Type II and GDPR for the CMS; the external search and vector layer carry their own separate posture.	Inherits the posture of your Postgres host and cloud account; compliance scope is whatever you assemble and attest to.