May 27, 2026

Context Window Costs Are Your Biggest Hidden Agent Bill

Most teams optimize the part of their bill they can see: output tokens, model choice. The bigger cost is usually what they are sending on every call. Here is how to find it.

The bill you are not looking at

Output tokens are easy to reason about. You see what the model said. You can tell whether it was too long. Output token optimization is the first thing most teams do.

Input tokens are harder to see. You know roughly what you are sending, but it is easy to underestimate how much that actually costs when you account for everything that gets loaded into context on each call.

Most agent deployments we have seen have input token costs that are 3 to 5 times higher than output token costs. Not because input is more expensive per token (it is usually 5x cheaper), but because agents send far more input tokens than they produce output tokens.

What is actually in your context on each call

For a typical OpenClaw agent call, here is what gets loaded:

  • System prompt and persona: 2,000 to 5,000 tokens
  • Tool definitions: 500 to 2,000 tokens depending on how many tools are registered
  • Conversation history: 0 to 50,000 tokens depending on session length
  • Memory and context files: 1,000 to 10,000 tokens
  • Task-specific context (documents, search results, code): varies widely

A minimal heartbeat call on Claude Sonnet 4.5 might be 6,000 to 8,000 input tokens. A deep work session might be 40,000. The difference is mostly history and loaded context, not what the user typed.

The compounding problem in long sessions

Session context grows with every turn. If you are not truncating or summarizing, input tokens increase with every message. Turn 1 might cost $0.02. Turn 20 might cost $0.40 for the same quality of response, because you are feeding back 20 turns of history each time.

On a 30-turn session on Claude Sonnet at $3/MTok input, uncompressed history can add $8 to $15 to the session cost compared to using a 5-turn sliding window. That is a 3 to 5x cost difference from one architectural choice.

Three things that cut context costs without quality loss

1. Sliding window instead of full history. Keep the last 5 to 10 turns in full, summarize everything older into a compact digest. A good summary covers what the agent knows and what decisions were made, in 300 tokens instead of 3,000. Quality impact: minimal for most tasks.

2. Tool pruning. Most agents register all their tools on every call. If you have 20 tools but a given call only uses 3, you are paying to process 17 tool definitions that are irrelevant. Dynamic tool registration (load only the tools relevant to the current task phase) typically cuts tool-definition overhead by 50 to 70%.

3. Prompt caching. If your system prompt and tool definitions are stable across calls, enable prompt caching. Anthropic charges $0.30/MTok for cached input reads vs $3.00/MTok for uncached. That is a 10x cost difference on the tokens you are sending on every single call. This is the highest-leverage single change most teams can make.

How to find what you are actually sending

The fastest way to find your context cost problems is token-level logging. You need to see: total input tokens per call, how many of those are system prompt vs history vs task context, and which calls are the most expensive.

Most teams do not have this. They know their monthly bill, not which calls drove it.

The Clawback dashboard shows input token breakdown per agent call. In most cases, running it for the first time surfaces 2 or 3 context-loading patterns that explain 50 to 70% of the bill.

See your actual numbers

The calculator runs in your browser. No account needed.