April 24, 2026

Claude 3 Haiku is gone. Here is what that actually costs you.

Claude 3 Haiku was the workhorse of cheap agent pipelines. It was deprecated in April 2026. If you haven't migrated, you're probably paying more than you think.

Claude 3 Haiku was one of the best cost-performance bargains in the LLM market. At $0.25/$1.25 per million tokens, it was the go-to model for high-frequency, low-stakes agent tasks: summaries, classifications, routing decisions, cheap checks that ran hundreds of times a day.

It was deprecated in April 2026. If you are still routing tasks to it, you are either hitting errors or being silently migrated to a pricier model.

What happened to the price

Claude 3 Haiku's replacement in Anthropic's catalog is Claude Haiku 4.5, priced at $0.80/$4.00 per million tokens. That is a 3.2x increase on both input and output.

For most workloads that were running on Haiku, that is not a disaster. Haiku was cheap enough that even 3x is not ruinous. But if you have pipelines that were doing tens of millions of tokens per day on Haiku and you have not noticed the cost change, check your bill.
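To put the change in concrete terms, here is a quick sketch of the bill delta. The daily token volumes are made up for illustration; the per-million-token prices are the ones quoted above.

```python
# Hypothetical daily volume for a Haiku-heavy pipeline (made-up numbers).
INPUT_MTOK_PER_DAY = 20   # 20M input tokens/day
OUTPUT_MTOK_PER_DAY = 5   # 5M output tokens/day

# Prices in $ per million tokens (input, output).
HAIKU_3 = (0.25, 1.25)
HAIKU_4_5 = (0.80, 4.00)

def daily_cost(prices):
    in_price, out_price = prices
    return INPUT_MTOK_PER_DAY * in_price + OUTPUT_MTOK_PER_DAY * out_price

old = daily_cost(HAIKU_3)      # $11.25/day
new = daily_cost(HAIKU_4_5)    # $36.00/day
print(f"monthly delta: ${(new - old) * 30:,.2f}")  # ~$742.50/month
```

At this (hypothetical) volume, the same pipeline goes from roughly $340/month to roughly $1,080/month: the 3.2x ratio, visible on the bill.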

The other option in Anthropic's catalog for budget tasks is Claude Sonnet 4.5 at $3/$15 per million tokens. It is a better model than Haiku by a significant margin. It is also 12x more expensive on input.

The migration decision

The right answer depends on what you were using Haiku for.

If you were using it for cheap routing and classification, Haiku 4.5 is probably the right move. The quality improvement over Haiku 3 is meaningful for these tasks, and the cost increase is manageable if the tasks are truly lightweight.

If you were using it for anything requiring sustained reasoning, this is actually a good forcing function to evaluate whether those tasks need to run as often as they do. Model deprecations are a good moment to audit task frequency.

If you were using it purely to keep costs low, consider whether open-source alternatives or other providers make more sense for your pipeline. Groq, Together, and Fireworks all offer smaller models at Haiku-comparable prices.

The broader pattern in LLM pricing

Claude 3 Haiku's deprecation is part of a consistent pattern across the LLM market in 2026: providers are retiring their cheapest models as they move up-market. The sub-$0.50/MTok tier is hollowing out as first-party models push toward $1+.

The companies filling that gap are third-party inference providers running open-source models. If your workload is cost-sensitive and latency is manageable, Llama 3, Mistral, and Qwen variants running on Groq or Together are worth evaluating.

The decision is not as simple as "use the cheapest model." Quality thresholds matter. If a cheaper model makes more mistakes on your task, the cost of those mistakes (retries, fallbacks, human review) can exceed the savings on tokens. But if your task is robust to occasional errors and you are running it millions of times, model selection is a real cost lever.
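That tradeoff can be made explicit with a rough break-even model. The error rates and review costs below are placeholders, not measurements; plug in your own.

```python
def effective_cost_per_task(token_cost, error_rate, cost_per_error):
    """Token cost plus the expected cost of handling a mistake
    (retry, fallback, or human review)."""
    return token_cost + error_rate * cost_per_error

# Placeholder numbers: a cheap model at $0.0005/task that errs 3% of
# the time vs. a pricier model at $0.004/task that errs 0.5% of the
# time, with each error costing $0.10 to catch and fix.
cheap = effective_cost_per_task(0.0005, 0.03, 0.10)    # 0.0035
pricey = effective_cost_per_task(0.004, 0.005, 0.10)   # 0.0045
# Here the cheap model still wins. Push cost_per_error past $0.14
# and the ranking flips: mistake handling dominates token price.
```

The lever is `cost_per_error`: for tasks where an error is nearly free (a bad summary nobody reads), cheap models win almost always; for tasks where an error triggers human review, the break-even moves fast.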

What the current market looks like

For context, the current competitive pricing for production agent workloads:

  • High-end reasoning: GPT-5 at $10/$30, Claude Opus 4.6 at $5/$25
  • Mid-tier: GPT-4.1 at $2/$8, Claude Sonnet 4.5 at $3/$15
  • Budget: Claude Haiku 4.5 at $0.80/$4, Gemini 2.0 Flash at roughly $0.10/$0.40
  • Open source inference: Llama 3.3 70B on Groq at around $0.60/$0.80

The gap between the high-end and budget tiers is 50-100x. For most agent pipelines, routing intelligently across those tiers is a bigger cost lever than any other optimization you can make.
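A back-of-the-envelope way to see the routing lever, using input prices from the tier list above (output prices omitted for brevity; the traffic split is hypothetical):

```python
# Input prices in $/MTok from the tier list above.
PRICE_IN = {"gpt-5": 10.0, "sonnet-4.5": 3.0, "haiku-4.5": 0.80}

def blended_cost(mtok_total, split):
    """Cost of mtok_total input MTok routed across tiers.
    split maps model -> fraction of traffic (fractions sum to 1)."""
    assert abs(sum(split.values()) - 1.0) < 1e-9
    return sum(mtok_total * frac * PRICE_IN[m] for m, frac in split.items())

all_high_end = blended_cost(100, {"gpt-5": 1.0})  # $1,000
routed = blended_cost(
    100, {"gpt-5": 0.1, "sonnet-4.5": 0.2, "haiku-4.5": 0.7}
)  # 100 * (0.1*10 + 0.2*3 + 0.7*0.8) = $216
```

Sending only the hardest 10% of traffic to the top tier cuts this (hypothetical) bill by roughly 4.6x, which is why the routing decision dwarfs prompt-level optimizations.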

The Clawback calculator can help you model what different routing strategies would cost for your specific usage pattern. Input your token volumes and current model mix and it shows you where the savings are.

See your actual numbers

The calculator runs in your browser. No account needed.