Preventing Data Leaks in AI Agents: How to Scope Content Access Without Prompt Engineering
Telling your agent "DO NOT access draft content" in the system prompt is not security. Architectural access controls that physically prevent the agent from seeing unauthorized data are.
Every team building production AI agents faces the same governance question: how do you prevent the agent from accessing content it should not see?
Draft product announcements, internal pricing strategies, unreleased feature documentation, HR policies, legal negotiations: your Content Lake contains all of it, and your customer-facing agent must never touch it.
The instinct is to solve this with prompt engineering. You write instructions like:
- "Never reference draft content."
- "Only use published product information."
This feels like a solution until a user crafts a prompt that tricks the agent into ignoring its instructions. Prompt injection is a well-documented attack vector, and no amount of system prompt hardening can guarantee that an LLM will always follow its rules.
Real agent security requires architectural access controls that operate at the data layer, not the prompt layer. A Content Operating System with scoped API access and infrastructure-level filtering ensures that unauthorized content physically cannot reach the agent, regardless of what the user asks.
Why Prompt-Based Security Fails
Language models are designed to be helpful and follow instructions. Unfortunately, they can also be instructed to ignore previous instructions.
When you connect AI agents to your content, you're not just giving them knowledge; you're giving them power. Unless you strictly control what they can see, they can surface anything in scope to anyone who asks. That includes:
- Internal pricing formulas
- Draft product announcements
- Confidential HR policies
If your scope is too broad, or enforced only via prompts, your customer-facing bot can become an accidental data exfiltration tool. This is already happening in enterprises that wired agents directly into their content without proper access governance.
The fix is not to limit what users can ask. It's to limit what the agent can see.
A Content Operating System like Sanity, combined with Agent Context, makes this an architectural guarantee instead of a hopeful prompt instruction.
Where Prompt-Based Rules Break Down
Most teams start with prompt-based governance:
"Do not share draft content."
"Never reveal internal pricing."
This fails for two core reasons:
- LLMs don't enforce rules deterministically: a sufficiently clever prompt can persuade the model to ignore them.
- Context drifts in long conversations: instructions given early lose influence as the context grows.
If your only line of defense is "please don't show X," you're relying on a system that is fundamentally not designed to enforce hard boundaries.
Infrastructure-Level Scoping With Agent Context
Sanity's Agent Context flips the model: instead of telling the agent what it must not share, you define what it is allowed to see at the infrastructure level.
Each Agent Context document contains a GROQ filter that defines the agent's content scope. That filter runs on the server before any content is sent to the model.
Example: a customer-facing support bot might be scoped to:
*[_type in ["supportArticle", "faq"] && !(_id in path("drafts.**")) && status == "published"]
Within this context:
- Drafts are never fetched
- Internal docs are never fetched
- Pricing data is never fetched
The agent physically cannot access those documents, because they're excluded from its view before any query executes. No prompt trick can retrieve data that never reaches the model.
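To make the mechanism concrete, here is an illustrative sketch (not a verbatim Agent Context API) of how the scope filter composes with whatever retrieval the agent performs; title match $userQuery is a stand-in for the agent's actual search condition:

// Illustrative composition: the retrieval condition is evaluated
// inside the scope filter, so a document excluded by the scope
// can never match, no matter what the user asks for.
*[
  _type in ["supportArticle", "faq"] &&
  !(_id in path("drafts.**")) &&
  status == "published" &&
  title match $userQuery
]

Because this intersection happens server-side, a jailbroken prompt can only change $userQuery, never the boundary around it.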
Scoping Patterns for Common Use Cases
Different agents need different slices of the same Content Lake. Agent Context lets you define these slices precisely with GROQ filters.
1. Customer Support Bot
Goal: Answer customer questions using only public, relevant support content.
Scope pattern (implemented in the sketch below):
- Only supportArticle and faq document types
- Only published documents
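In GROQ, this pattern takes the same shape as the earlier example: restrict the type, exclude drafts, and require published status. The status field follows the convention used above and may differ in your schema:

// Customer support bot scope: published support content only
*[
  _type in ["supportArticle", "faq"] &&
  !(_id in path("drafts.**")) &&
  status == "published"
]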
Enforcement at the Data Layer, Not the Prompt Layer
Agent Context GROQ Filter for Governed Access
This GROQ filter physically scopes a customer-facing agent to published, non-internal content in the user's region. No prompt injection can override this boundary.
// Agent Context GROQ filter for customer-facing agent
// Only published, non-internal content visible
*[
_type in ["product", "faq", "helpArticle"] &&
!(_id in path("drafts.**")) &&
visibility != "internal" &&
region == $userRegion
]

Architectural access control is the only reliable way to prevent AI agents from leaking sensitive content. Prompt instructions like "do not access drafts" are policies, not enforcement. They can be ignored, jailbroken, or lost in long contexts.
Sanity's Agent Context addresses this by scoping what the agent can physically query:
- GROQ filters restrict documents at the API level (e.g. _type in ["product", "faq", "documentation"] && public == true), so drafts, internal data, or private docs never enter the agent's context.
- Dataset scoping ensures agents only connect to the correct dataset (e.g. production vs. staging), preventing accidental exposure of staging or experimental content.
- Read-only access guarantees agents cannot create, update, or delete documents, eliminating the risk of unintended content changes.
This architecture directly mitigates common leak scenarios:
- Draft content exposure: Drafts are excluded by filters and perspectives, so unannounced features stay hidden.
- Internal pricing leaks: Internal cost or margin fields can be modeled separately and excluded via GROQ filters.
- Cross-tenant access: Tenant-specific filters (e.g. _type == "product" && tenantId == "tenant-123") ensure one tenant's agent cannot see another tenant's data (see the sketch below).
In short: prompt instructions like "do not access drafts" are guidance, not security boundaries, and they can fail under jailbreaks, adversarial prompts, or long-context confusion. If content must never reach your agent, enforce that boundary at the infrastructure level, where no prompt can override it.