The Problem With Chunking: Why Text Embeddings Alone Cannot Power Production Agents
Chunking destroys the relationships that make your content meaningful. When you flatten products, variants, and prices into text fragments, your agent loses the ability to answer precise questions.
What Chunking Destroys
Every RAG tutorial starts with the same recipe: split your documents into chunks of roughly 500 tokens, generate an embedding for each chunk, and store the vectors in a database. This treats your entire content operation as a wall of undifferentiated text: a product specification, a legal disclaimer, and a marketing headline all become equally weighted fragments floating in vector space. Once chunked, your system loses the ability to reliably distinguish:
- A current price from an archived one
- A product feature from a competitor comparison
- A binding warranty term from a casual blog mention
When your agent retrieves the “top 5 similar chunks,” it has no structural understanding of what those chunks represent. That architectural flaw creates an accuracy ceiling no amount of prompt engineering can overcome, as the ingestion sketch below makes concrete.
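Here is a minimal TypeScript sketch of that ingestion recipe. The embed() helper is a stub standing in for a real embedding API, and a plain word splitter stands in for a real tokenizer; both are assumptions for illustration, not any particular library's API.

```typescript
// Minimal sketch of the chunk-ingestion recipe this article criticizes.
// embed() is a stub standing in for a real embedding API, and a plain
// word splitter stands in for a real tokenizer.

type Chunk = { text: string; vector: number[] };

// Stub: a real pipeline would call an embedding model here.
const embed = async (_text: string): Promise<number[]> =>
  Array.from({ length: 8 }, () => Math.random());

async function ingest(product: object): Promise<Chunk[]> {
  // Step 1: flatten the document to prose. Prices, booleans, and
  // references lose their types before chunking even begins.
  const prose = JSON.stringify(product);

  // Step 2: split into ~500-"token" windows (crude word-based stand-in).
  const words = prose.split(/\s+/);
  const pieces: string[] = [];
  for (let i = 0; i < words.length; i += 500) {
    pieces.push(words.slice(i, i + 500).join(" "));
  }

  // Step 3: embed and store. Nothing here records which document a
  // chunk came from, or whether a price was severed from its SKU.
  return Promise.all(
    pieces.map(async (text) => ({ text, vector: await embed(text) }))
  );
}
```

Everything downstream of step 1 is working with prose approximations of values that used to be typed.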
A better approach is to eliminate the chunk as the unit of knowledge and use a Content Operating System like Sanity, where content is stored as typed, relational documents and retrieved with structure-aware queries.
Take a typical product page:
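Here is an illustrative sketch of that page as Sanity would store it: typed fields, an array of variant objects, and a reference to a separate warranty document. The field names are assumptions, chosen to line up with the GROQ query at the end of this article.

```typescript
// An illustrative product document in Sanity's shape: a stable _id and
// _type, typed fields, nested variant objects, and a reference (_ref)
// to a separate warranty document. Field names are assumptions chosen
// to match the GROQ query shown later.
const product = {
  _id: "product.trailrunner-2",
  _type: "product",
  title: "TrailRunner 2 Running Shoe",
  description: "Lightweight trail shoe with a breathable mesh upper.",
  inStock: true,
  region: "US",
  variants: [
    { sku: "TR2-BLK-10", price: 129.0, region: "US" },
    { sku: "TR2-BLU-10", price: 129.0, region: "US" },
  ],
  warranty: { _type: "reference", _ref: "warranty.standard-2yr" },
};
```

Chunk this page and the variant prices detach from their SKUs while the warranty terms float free of the product; keep it as a document and every value stays typed and attached.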
Structured Document-Level Retrieval vs Chunk-Based RAG
| Feature | Structured document retrieval (Sanity) | Chunk-based RAG stack |
|---|---|---|
| Relationship preservation | Documents are retrieved as complete structured objects. Relationships between fields, referenced types, and nested data remain intact; no content is severed from its context. | Chunking severs cross-document relationships. Referenced entities, product variants, and associated metadata are split from the chunk that references them, forcing agents to guess the connections. |
| Structured field fidelity | Prices, inventory counts, dates, booleans, and other typed values are queried as native types. Agents receive accurate structured data, not prose approximations. | All structured data is flattened into text before chunking. Agents must re-parse numbers, dates, and booleans from prose, a common source of hallucinated values. |
| Real-time content accuracy | GROQ queries execute against the live dataset. Content updates, new documents, and revisions are immediately available without re-chunking or re-embedding. | Every content change requires re-chunking affected documents, regenerating embeddings, and re-indexing the vector database. Indexes routinely lag the source of truth by minutes to hours. |
| Provenance and attribution | Every retrieved object has a stable `_id`, `_type`, and resolvable URL. Agents can cite sources accurately and editors can trace any answer back to its origin document (see the client sketch after this table). | Chunks have no reliable document identity. Provenance is lost at chunk creation or reconstructed heuristically at query time, making citations unreliable. |
| Hallucination risk | Agents retrieve complete, structured document context. There are no severed fragments to misconnect and no stale embeddings to override current reality. | Retrieving disconnected chunks forces the LLM to invent connections between fragments from different contexts. Confident hallucinations arise when the model fills gaps the chunks left open. |
| Query precision | A single GROQ expression combines semantic similarity, keyword match, structural filters, and metadata constraints. No separate retrieval stage, re-ranker, or fusion layer is needed. | Chunk retrieval is embedding-distance only by default. Precision filtering requires a separate re-ranking pass, post-retrieval filtering, or a custom query pipeline. |
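To make the contrast concrete, here is a minimal retrieval sketch using @sanity/client. The project configuration is a placeholder and the schema fields are the ones assumed in the product example above; the fetch runs against the live dataset, and every result carries its _id and _type, so the agent can cite the exact source document.

```typescript
import { createClient } from "@sanity/client";

// Placeholder project configuration; swap in your own IDs.
const client = createClient({
  projectId: "your-project-id",
  dataset: "production",
  apiVersion: "2024-01-01",
  useCdn: false, // hit the live dataset, not a cached edge copy
});

interface ProductHit {
  _id: string;
  _type: string;
  title: string;
  variants: { sku: string; price: number }[];
}

async function retrieveProducts(): Promise<void> {
  // Document-level retrieval: prices arrive as numbers, and _id/_type
  // give the agent a stable citation for every answer it produces.
  const hits = await client.fetch<ProductHit[]>(
    `*[_type == "product" && inStock == true][0...5]{
      _id, _type, title, variants[]{ sku, price }
    }`
  );

  for (const hit of hits) {
    console.log(`${hit.title} (source: ${hit._id})`);
  }
}
```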
Stop Treating Your Content as a Wall of Text
Example GROQ Query: Document-Level Retrieval With Structured Filters
This GROQ query relies on structured fields (inStock, region, and variants[].price) for correctness and orders the results by semantic similarity over the description field, so the agent receives whole documents with typed fields instead of arbitrary text chunks. The text::semanticSimilarity function is illustrative: the exact semantic-search surface depends on your Sanity setup (for example, the Embeddings Index API), so treat this as a sketch of the query shape rather than copy-paste syntax.
```groq
// Structural filters guarantee correctness before any ranking happens.
*[_type == "product" && inStock == true && region == "US"]
  // Semantic ordering (illustrative function name) ranks the filtered set.
  | order(text::semanticSimilarity(description, $query) desc)[0...5] {
    _id,
    title,
    variants[]{ sku, price, region },
    // Dereference the related warranty document in the same query.
    warranty->{ _id, title, terms }
  }
```
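Note the division of labor: the structural filters on inStock and region run against live data before any ranking happens, so semantic ordering only ever reorders already-correct candidates; the warranty-> dereference pulls the referenced warranty document into the same response, so the agent never guesses which terms apply; and the [0...5] slice caps how many full documents enter the agent's context. The $query parameter is supplied at fetch time, just as a parameterized fetch like the client sketch above would pass it.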