The C³ Context Cost Curve

A practical framework for context engineering in LLM systems

Most teams building with large language models share the same instinct: If the model makes mistakes, add more context.

Early on, this works. Accuracy improves. Hallucinations drop. Demos get better.

Then something strange happens. Latency spikes. Outputs become inconsistent. Debugging turns into guesswork. And despite all the extra context, the model performs worse.

This post introduces a practical framework for understanding why this happens, and provides concrete guidelines for context engineering based on conversations with practitioners building real-world LLM systems.

Context is not free

In most AI systems, context is treated as an unpriced resource. System prompts, conversation history, retrieved documents, user memory, and tool outputs are all stuffed into a single context window with the assumption that the model will sort it out.

In reality, context has real costs:

  • Token cost (paid on every request)
  • Latency (especially tail latency)
  • Failure surface area (hallucinations, contradictions, priority inversion)

The most reliable AI systems do not maximize context. They optimize it.

What counts as context?

Context is everything the model sees before it produces an output, including:

  • System instructions
  • Output schemas
  • Retrieved documents
  • User memory
  • Conversation history
  • Tool outputs
  • Agent scratchpads

The most common mistake teams make is treating all of this as interchangeable text. It is not.

Introducing C³: Cold, Warm, and Hot Context

The C³ framework classifies context based on two properties:

  1. How often it changes
  2. How authoritative it should be

Cold Context

Stable, rarely changing, highest authority

Examples include:

  • System prompts
  • Policies
  • Output schemas
  • Hard constraints

Cold context defines the rules of the system. If it is wrong, the failure can be catastrophic.

Best Practices:

  • Keep it minimal and surgical
  • Prefer schemas and validators over prose (sketched at the end of this section)
  • Avoid long explanations and too many examples

Anti-pattern:

Pasting an entire policy document into the system prompt.
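
To make "schemas and validators over prose" concrete, here is a minimal sketch assuming Pydantic v2; the TicketTriage model and its fields are hypothetical, stand-ins for whatever output contract your task actually needs:

    # The schema itself is cold context: compact, stable, and enforced by a
    # validator instead of paragraphs of prose instructions.
    from pydantic import BaseModel, Field, ValidationError


    class TicketTriage(BaseModel):
        # Hypothetical fields for a support-ticket triage task.
        category: str = Field(pattern="^(billing|bug|feature_request)$")
        severity: int = Field(ge=1, le=4)
        summary: str = Field(max_length=280)


    def parse_model_output(raw_json: str) -> TicketTriage | None:
        """Validate the model's JSON output against the cold-context schema."""
        try:
            return TicketTriage.model_validate_json(raw_json)
        except ValidationError:
            # Reject and retry (or escalate) instead of trusting malformed output.
            return None

A dozen lines of schema do enforcement work that pages of prose cannot.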

Warm Context

Slowly changing, moderately authoritative

Examples include:

  • User preferences
  • Long-term memory
  • Project or workspace metadata
  • Summarized conversation history

Warm context provides continuity, but it comes with risks:

  • It grows without bounds
  • It drifts over time
  • It is often stale or partially incorrect

Best Practices:

  • Retrieve it dynamically
  • Summarize aggressively
  • Label it as advisory rather than binding
  • Track freshness and confidence (illustrated below)

Anti-pattern:

Appending full conversation history indefinitely.
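
One way to make "track freshness and confidence" concrete is to store warm context as records with metadata and filter at retrieval time. A minimal sketch; the WarmFact record, thresholds, and advisory label are illustrative, not prescriptive:

    from dataclasses import dataclass
    from datetime import datetime, timedelta, timezone


    @dataclass
    class WarmFact:
        text: str             # e.g. a summarized preference or project detail
        updated_at: datetime  # timezone-aware timestamp of last confirmation
        confidence: float     # 0.0-1.0, however your pipeline estimates it


    def select_warm_context(facts: list[WarmFact],
                            max_age: timedelta = timedelta(days=30),
                            min_confidence: float = 0.6,
                            limit: int = 5) -> list[str]:
        """Keep only fresh, high-confidence facts, newest first, capped in count."""
        now = datetime.now(timezone.utc)
        eligible = [f for f in facts
                    if now - f.updated_at <= max_age and f.confidence >= min_confidence]
        eligible.sort(key=lambda f: f.updated_at, reverse=True)
        # Label the block as advisory so it never outranks cold instructions.
        return [f"[advisory memory] {f.text}" for f in eligible[:limit]]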

Hot Context

Ephemeral, volatile, lowest authority

Examples include:

  • Current user input
  • Tool outputs
  • Agent scratchpads
  • Intermediate reasoning artifacts

Hot context changes on every request and has the highest entropy. It is also the most likely to be wrong.

Best Practices:

  • Enforce hard token caps (sketched below)
  • Use structured tool outputs
  • Automatically expire data
  • Never blindly trust tool results

Anti-pattern:

Dumping every tool response back into the prompt.
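
A sketch of what a hard cap on tool output can look like. The approx_tokens helper is a rough character-count stand-in for a real tokenizer, and the budget is an arbitrary illustrative number:

    import json


    def approx_tokens(text: str) -> int:
        # Rough heuristic (~4 characters per token); swap in your provider's
        # real token counter for production budgets.
        return max(1, len(text) // 4)


    def cap_tool_output(tool_name: str, payload: dict, max_tokens: int = 500) -> str:
        """Serialize a tool result as structured JSON and enforce a hard token cap."""
        result = json.dumps(payload, ensure_ascii=False)
        if approx_tokens(result) > max_tokens:
            # Truncate explicitly instead of passing an unbounded blob downstream.
            result = result[: max_tokens * 4] + " ...[truncated]"
        return json.dumps({"tool": tool_name, "result": result})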

The Context Cost Curve

The key insight behind C³ is that context value grows sub-linearly, while context cost grows linearly or worse.

Early context additions dramatically improve accuracy. Past a certain point, returns diminish. Beyond that point, accuracy often declines.

This happens for several reasons:

  • Attention dilution
  • Conflicting signals
  • Implicit priority inversion
  • Larger hallucination surface area

Different context types reach this inflection point at different context sizes:

  • Cold context plateaus early
  • Warm context plateaus later
  • Hot context plateaus very early, often followed by a sharp decline

This relationship is the Context Cost Curve.
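
The shape is easy to see with a toy model: if value grows roughly logarithmically with tokens while cost grows linearly, net value rises, flattens, and then falls. The constants below are made up purely for illustration; real curves should come from your own evals:

    import math


    def net_value(tokens: int,
                  value_scale: float = 10.0,
                  cost_per_token: float = 0.002) -> float:
        # Toy model: diminishing-returns value minus linear cost.
        return value_scale * math.log1p(tokens) - cost_per_token * tokens


    if __name__ == "__main__":
        for t in (100, 1_000, 5_000, 20_000, 100_000):
            print(f"{t:>7} tokens -> net value {net_value(t):7.1f}")
        # Net value peaks and then declines as cost outpaces the log term.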

The C³ Context Stack

Not all context deserves equal weight. Reliable systems assemble context deliberately, from the highest authority to the lowest:

  1. Task instructions (cold)
  2. Output schemas and constraints (cold)
  3. Retrieved facts (warm)
  4. User or project memory (warm)
  5. Recent conversation (hot)
  6. Tool outputs and scratchpads (hot)

As you move down the stack:

  • Authority decreases
  • Volatility increases

A simple rule follows: lower layers should never override higher layers. Most hallucinations are violations of this rule.
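
One way to enforce that rule is to assemble the prompt top-down, spending the token budget in authority order so that when space runs out, it is the hot layers that get dropped. A minimal sketch; the layer names, tiers, and budget are illustrative:

    from dataclasses import dataclass


    @dataclass
    class ContextLayer:
        name: str  # e.g. "task_instructions", "retrieved_facts", "tool_outputs"
        tier: str  # "cold" | "warm" | "hot"
        text: str


    def approx_tokens(text: str) -> int:
        return max(1, len(text) // 4)  # rough stand-in for a real tokenizer


    def assemble_prompt(layers: list[ContextLayer], budget: int = 8_000) -> str:
        """Fill the window from highest authority down; never evict a higher layer."""
        parts, used = [], 0
        for layer in layers:  # caller passes layers ordered cold -> warm -> hot
            cost = approx_tokens(layer.text)
            if used + cost > budget:
                continue  # drop a lower-authority layer rather than displace a higher one
            parts.append(f"### {layer.name} ({layer.tier})\n{layer.text}")
            used += cost
        return "\n\n".join(parts)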

Operationalizing C³

Teams that take context seriously do a few simple things consistently:

  • Tag context blocks as cold, warm, or hot
  • Log context composition on every request (a sketch follows this list)
  • Track accuracy relative to context size
  • Enforce explicit token budgets
  • Continuously summarize or decay warm context
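
The first two items can be as simple as tagging each block and emitting one log record per request, so accuracy can later be plotted against context size. A sketch with hypothetical field names:

    import json
    import time
    from collections import Counter


    def approx_tokens(text: str) -> int:
        return max(1, len(text) // 4)  # rough stand-in for a real tokenizer


    def log_context_composition(request_id: str,
                                blocks: list[tuple[str, str]]) -> dict:
        """blocks is a list of (tier, text) pairs, tier in {"cold", "warm", "hot"}."""
        tokens_by_tier: Counter = Counter()
        for tier, text in blocks:
            tokens_by_tier[tier] += approx_tokens(text)
        record = {
            "request_id": request_id,
            "ts": time.time(),
            "tokens_by_tier": dict(tokens_by_tier),
            "total_tokens": sum(tokens_by_tier.values()),
        }
        # Emit to whatever logging or metrics pipeline you already use.
        print(json.dumps(record))
        return record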

In practice, most reliability issues in LLM systems do not come from model choice or prompt wording. They come from treating context as an unlimited buffer rather than a constrained design surface.

Once teams become explicit about what information deserves to be present, how long it should live, and how much authority it carries, much of the complexity disappears. Context stops being a source of surprises and becomes something you can reason about, measure, and improve.

If you are building an AI product and finding that reliability, latency, or evaluation quality degrades as you scale, I would welcome the chance to compare notes. I spend a lot of time working with teams navigating these exact problems and am always interested in learning from what others are seeing in the wild. Please reach out!