March 9, 2025

Model Routing: The Easiest Way to Cut Your Agent Bill in Half

Different tasks need different models. Heartbeats on Haiku, triage on Sonnet, complex reasoning on Opus. Walk through a real config and see before/after costs.

Your agent is running the same model for everything. That is the problem.

Heartbeat check at 3am? Opus. Quick calendar lookup? Opus. Sub-agent that converts a CSV? Opus. The model does not matter for most of what your agent does. The price tag does.

Model routing fixes this. One config setting. One change. Often 50% off your bill.

The model stack

You have three tiers to work with:

Claude Haiku 3.5: $0.80/MTok input, $4/MTok output. Fast. Cheap. Handles classification, monitoring, routing, simple lookups. If the task is "is this urgent? yes or no," Haiku answers it for fractions of a cent.

Claude Sonnet 4: $3/MTok input, $15/MTok output. The workhorse. Solid reasoning. Handles email triage, code review, research summaries, content drafts. 5x cheaper than Opus with comparable results on most tasks.

Claude Opus 4: $15/MTok input, $75/MTok output. The heavy gun. Use it for complex reasoning, architectural decisions, nuanced writing. 19x more expensive than Haiku. Earn every token.

The routing principle: use the cheapest model that produces acceptable quality. Not the cheapest model period. The cheapest model that actually works for the task.

What each task actually needs

Heartbeats: Haiku. Every time. A heartbeat checks your task queue, maybe reads HEARTBEAT.md, replies HEARTBEAT_OK or sends a notification. That is a classification task. It requires reading comprehension and a yes/no decision. Haiku does this fine.

Cost difference: Heartbeats run 24-48 times per day per channel. On Opus, that is $151/month per channel. On Haiku, it is $8/month. Same output. 19x price difference.

Email triage: Haiku or Sonnet depending on what you need. If you just want priority scores and bucket labels, Haiku handles it. If you want draft reply suggestions and nuanced context understanding, use Sonnet.

Sub-agents for code: Sonnet. Code review, PR summaries, bug analysis, documentation. Sonnet handles 95% of this at 5x less than Opus. Save Opus for architecture decisions and complex debugging sessions where you actually need the extra reasoning.

Complex reasoning: Opus. Strategic analysis, novel problem solving, tasks where the quality gap actually shows. These are your main conversations, not your sub-agents and monitors.

A real before/after

Here is a common config. One developer. Telegram. 30-min heartbeats. Sub-agents for code tasks.

Before (naive config):

{
  "defaultModel": "anthropic/claude-opus-4-6",
  "heartbeat": {
    "model": "anthropic/claude-opus-4-6",
    "interval": 30
  },
  "routing": {
    "subagent": "anthropic/claude-opus-4-6"
  },
  "channels": [{ "type": "telegram" }]
}

Monthly cost breakdown (50 msgs/day, 3 sub-agents/day, 48 heartbeats/day):

  • Main conversations: $67/month
  • Context loading: $216/month
  • Heartbeats: $151/month
  • Sub-agents: $38/month
  • Total: $472/month

After (routed config):

{
  "defaultModel": "anthropic/claude-opus-4-6",
  "heartbeat": {
    "model": "anthropic/claude-haiku-3-5",
    "interval": 60
  },
  "routing": {
    "subagent": "anthropic/claude-sonnet-4-6"
  },
  "channels": [{ "type": "telegram" }]
}

Changes made: heartbeat to Haiku, heartbeat interval from 30min to 60min, sub-agents to Sonnet.

Monthly cost breakdown:

  • Main conversations: $67/month (unchanged, still Opus)
  • Context loading: $216/month (same Opus default)
  • Heartbeats: $4/month (Haiku, half the frequency)
  • Sub-agents: $7.50/month (Sonnet)
  • Total: $294/month

That is $178/month saved. 38% reduction. Zero change to main conversation quality. Opus is still handling everything you actually care about.

The 2-minute implementation

Open your openclaw.json. Add or update these fields:

"heartbeat": {
  "model": "anthropic/claude-haiku-3-5",
  "interval": 60
},
"routing": {
  "subagent": "anthropic/claude-sonnet-4-6"
}

That is it. Two fields. Sub-10-minute change. The routing engine in OpenClaw reads these settings and uses the specified model for each task type.

If you want more granular control, the routing object accepts additional keys. You can route email tasks, classification tasks, and monitoring tasks independently:

"routing": {
  "subagent": "anthropic/claude-sonnet-4-6",
  "monitoring": "anthropic/claude-haiku-3-5",
  "classification": "anthropic/claude-haiku-3-5"
}

When routing is not enough

Routing fixes the model problem. It does not fix the context problem. If your workspace files are 15,000 tokens, every Haiku heartbeat still loads 15,000 tokens of overhead. You are paying less per token, but you are still paying for every token.

The full optimization stack:

  1. Route heartbeats and sub-agents to cheaper models (saves 50-70% on those costs)
  2. Trim workspace files (saves 30-60% on context loading across all tasks)
  3. Reduce heartbeat frequency if you do not need real-time (saves proportionally)

Each of these compounds. Routing plus trimming plus frequency reduction can cut a $400/month config to $60-80/month without losing any meaningful capability.

Check out the routing recommendations engine to get model suggestions based on your specific workloads. Or paste your current config into the Config Analyzer for exact dollar amounts on each recommendation.

See your actual numbers

The calculator runs in your browser. No account needed.