March 14, 2026
5 ways to cut your AI agent costs by 50%
Five concrete changes that cut AI agent API costs by 50% or more. Model routing, context trimming, heartbeat tuning, caching, and batching. With real numbers.
Most AI agent setups are running at 2x to 5x their minimum viable cost. Not because the tasks are expensive, but because of configuration defaults that nobody ever changed.
Here are five changes that each cut costs meaningfully, and together often cut the total bill in half.
1. Route tasks to cheaper models
This is the highest-leverage change you can make. Most agents use one model for everything. That is almost always wrong.
The three-tier stack:
- Haiku ($0.80/MTok input): Classification, monitoring, routing, yes/no decisions. If the output is short and the decision boundary is clear, Haiku handles it.
- Sonnet ($3/MTok input): Email drafts, code review, summaries, data work. The workhorse. 5x cheaper than Opus.
- Opus ($15/MTok input): Complex reasoning, architecture decisions, nuanced generation. Use it sparingly.
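As a sketch, routing can be as simple as a lookup keyed on task type. The task categories and the Opus model ID below are illustrative assumptions, not a real config; the Haiku and Sonnet IDs match the config fields shown later:

```python
# Illustrative task-to-model router: cheapest adequate tier per task type.
# Task categories and the Opus model ID are assumptions, not a real config.
TIERS = {
    "classify":  "anthropic/claude-haiku-3-5",   # short output, clear boundary
    "monitor":   "anthropic/claude-haiku-3-5",
    "draft":     "anthropic/claude-sonnet-4-6",  # the workhorse tier
    "summarize": "anthropic/claude-sonnet-4-6",
    "architect": "anthropic/claude-opus-4",      # complex reasoning only
}

def route(task_type: str) -> str:
    """Unknown task types default to the mid tier, not the expensive one."""
    return TIERS.get(task_type, "anthropic/claude-sonnet-4-6")
```

Defaulting unknown tasks to Sonnet rather than Opus keeps surprises cheap.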
The math on heartbeats alone: switching from Opus to Haiku for 48 daily heartbeats saves $143/month per channel. That is one config field.
"heartbeat": {
  "model": "anthropic/claude-haiku-3-5"
}
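The $143 figure checks out if each heartbeat loads roughly 7,000 input tokens of context. That token count is an assumption for the sketch; your workspace size will vary:

```python
# Back-of-envelope check on the Opus-to-Haiku heartbeat savings.
# 7,000 input tokens per heartbeat is an assumed context size.
TOKENS_PER_HEARTBEAT = 7_000
HEARTBEATS_PER_DAY = 48
DAYS = 30

def monthly_cost(price_per_mtok: float) -> float:
    tokens = TOKENS_PER_HEARTBEAT * HEARTBEATS_PER_DAY * DAYS
    return tokens / 1_000_000 * price_per_mtok

opus = monthly_cost(15.00)   # roughly $151/month
haiku = monthly_cost(0.80)   # roughly $8/month
print(f"savings: ${opus - haiku:.0f}/month")  # ≈ $143
```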
Sub-agents from Opus to Sonnet: 5x cheaper with no meaningful quality drop on most tasks. Another config field.
"routing": {
  "subagent": "anthropic/claude-sonnet-4-6"
}
Potential savings: 40-70% on heartbeat and sub-agent costs.
2. Trim context ruthlessly
Every token in your workspace files gets loaded on every turn. MEMORY.md, AGENTS.md, SOUL.md, TOOLS.md. They compound across every message, every heartbeat, every sub-agent.
If your workspace context is 9,600 tokens and you cut it to 5,000 tokens, every single API call gets 48% cheaper on the input side. That includes heartbeats, which compounds the savings further.
What to cut:
- Anything in MEMORY.md that is historical rather than operational. The agent does not need to know what happened last month to function today.
- Verbose instructions in AGENTS.md. Bullet points over paragraphs. Remove anything the agent consistently ignores.
- Redundant persona info in SOUL.md. Three sentences about tone, not three paragraphs.
- Stale skill descriptions for skills you rarely use.
Check file sizes first:
wc -c ~/.openclaw/workspace/*.md
Anything over 3KB is a candidate for trimming. Potential savings: 20-50% on all context-loading costs.
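The per-call effect of trimming is easy to model. This sketch uses the 9,600-to-5,000-token trim from above; the monthly call volume is an assumption you should replace with your own:

```python
# Input-cost savings from trimming workspace context.
# CALLS_PER_MONTH is an assumed volume; substitute your own.
OLD_CONTEXT = 9_600   # tokens loaded on every turn
NEW_CONTEXT = 5_000
CALLS_PER_MONTH = 3_000
SONNET_INPUT = 3.00   # $/MTok

reduction = (OLD_CONTEXT - NEW_CONTEXT) / OLD_CONTEXT
saved_tokens = (OLD_CONTEXT - NEW_CONTEXT) * CALLS_PER_MONTH
print(f"reduction: {reduction:.0%}")  # 48%
print(f"saved: ${saved_tokens / 1_000_000 * SONNET_INPUT:.2f}/month")
```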
3. Reduce heartbeat frequency
Every 15 minutes costs 4x more than every hour. For most personal and small-team setups, hourly heartbeats are plenty.
The heartbeat checks for scheduled tasks and notifications. Unless you are expecting time-critical alerts, the 15-minute interval is a cost you do not need.
Monthly cost comparison (single channel, Haiku heartbeat model):
- Every 15 min: $16/month
- Every 30 min: $8/month
- Every hour: $4/month
Going from every 15 minutes to every hour saves $12/month per channel on Haiku. That gap is much larger if you are still on Opus for heartbeats ($302/month vs $76/month).
"heartbeat": {
  "interval": 60
}
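The comparison above follows from call volume alone. Assuming the same illustrative ~7,000 input tokens per heartbeat, one function reproduces both the Haiku tiers and the Opus figures:

```python
# Monthly heartbeat cost as a function of interval and model price.
# Assumes ~7,000 input tokens per heartbeat (illustrative).
def heartbeat_cost(interval_min: int, price_per_mtok: float,
                   tokens: int = 7_000, days: int = 30) -> float:
    calls = (24 * 60 // interval_min) * days
    return calls * tokens / 1_000_000 * price_per_mtok

for interval in (15, 30, 60):
    print(f"every {interval} min: "
          f"${heartbeat_cost(interval, 0.80):.0f}/mo (Haiku), "
          f"${heartbeat_cost(interval, 15.00):.0f}/mo (Opus)")
```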
Potential savings: 50-75% on heartbeat frequency costs.
4. Cache repeated expensive calls
If your agent processes the same data repeatedly, caching the results eliminates redundant API calls. This applies to:
- Recurring summarization tasks on content that has not changed (daily reports, weekly digests)
- Classification runs on identical or near-identical inputs
- Reference lookups that do not change (documentation, FAQs, company info)
Implementation varies by framework. In LangChain, use SQLiteCache or InMemoryCache. In n8n, add a cache node before expensive LLM nodes. In OpenClaw, store results in workspace memory files and check before re-running.
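A framework-agnostic version is a dozen lines: hash the prompt, return the stored result on a hit. The `call_llm` parameter here is a stand-in for whatever client function you actually use:

```python
import hashlib

# In-memory cache keyed on a hash of the prompt. For persistence,
# swap the dict for a SQLite table or a workspace file.
_cache: dict[str, str] = {}

def cached_call(prompt: str, call_llm) -> str:
    """Return the cached result for an identical prompt instead of re-calling."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)
    return _cache[key]
```

Exact-match hashing only helps with identical inputs; near-duplicates need normalization (strip timestamps, whitespace) before hashing.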
The exact savings depend on your repetition rate. For workflows with high repetition, caching can reduce LLM calls by 30-80%.
5. Batch low-priority tasks
Some tasks do not need to run on every trigger. Email triage that runs on every incoming message could run every 30 minutes on a batch of messages. Log analysis that runs every minute could run every 15 minutes on accumulated logs.
Batching reduces:
- Context loading overhead (loaded once per batch instead of once per item)
- Total API calls
- Per-item cost by amortizing the fixed context cost over multiple items
Example: processing 20 emails individually vs as a batch.
Individual (20 API calls, 4,000 tokens context each):
- Input: 20 * (4,000 + 500) / 1,000,000 * $3 = $0.27
Batch (1 API call, 4,000 tokens context + 10,000 tokens of 20 emails):
- Input: 1 * (4,000 + 10,000) / 1,000,000 * $3 = $0.042
6x cheaper per email processed. Batching does not work for latency-sensitive tasks, but for anything that can tolerate a delay, it is a significant lever.
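The per-email arithmetic above, as a sketch (token counts are the ones assumed in the example):

```python
# Cost comparison: 20 emails processed individually vs as one batch.
SONNET_INPUT = 3.00   # $/MTok
CONTEXT = 4_000       # workspace context tokens loaded per call
EMAIL_TOKENS = 500    # tokens per email
N_EMAILS = 20

individual = N_EMAILS * (CONTEXT + EMAIL_TOKENS) / 1_000_000 * SONNET_INPUT
batched = (CONTEXT + N_EMAILS * EMAIL_TOKENS) / 1_000_000 * SONNET_INPUT
print(f"individual: ${individual:.3f}, batched: ${batched:.3f}, "
      f"{individual / batched:.0f}x cheaper")
```

The fixed 4,000-token context is paid once instead of twenty times; that amortization is where nearly all of the saving comes from.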
Putting it together
A typical agent running all five optimizations:
- Model routing: -50% on heartbeat and sub-agent costs
- Context trimming: -30% on all per-turn costs
- Heartbeat frequency: -50% on heartbeat volume
- Caching: -20% on repeated processing costs
- Batching: -40% on high-volume, latency-tolerant tasks
These stack multiplicatively rather than additively. Start with model routing. It takes 2 minutes and the savings show up immediately. Then trim context. Then adjust frequency.
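"Stack multiplicatively" means each cut applies to whatever cost remains after the previous one. On the heartbeat line, for example, with an assumed $16/month baseline and the percentages listed above:

```python
# Multiplicative stacking: each cut applies to the cost remaining
# after the previous one. Baseline and percentages are illustrative.
baseline = 16.00            # $/month heartbeat cost
cuts = [0.50, 0.30, 0.50]   # model routing, context trim, frequency

cost = baseline
for cut in cuts:
    cost *= (1 - cut)
print(f"${cost:.2f}/month")  # $2.80, not baseline * (1 - 1.30)
```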
Use the Clawback calculator to model changes before making them. Or paste your current config into the Config Analyzer to get specific recommendations with dollar amounts.
The 50% target is conservative. Users who implement all five changes often see 70-80% reductions. The math is not complicated, and the changes take an afternoon.
See your actual numbers
The calculator runs in your browser. No account needed.