Trend Dynamics
Updated daily
Definition
Inference cost collapse is the sustained, multi-year reduction in the marginal cost of producing a token of equivalent or higher quality, driven by smaller distilled models, better serving stacks, and accelerator competition.
Why It Matters
When the unit cost of intelligence falls 10x in 12 months, every product priced per seat is exposed to a per-task competitor that can use the savings to give the work away.
Signals Feeding This Trend
- Per-million-token API price changes (42%, pricing)
- New inference accelerator launches (21%, hardware)
- Distillation paper releases (18%, research)
- Open-weight model parity benchmarks (19%, benchmark)
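The page does not say how the four percentages are combined. One plausible reading is that they are weights in a composite score, and the sketch below illustrates that reading only. The name `composite_score`, the `[0, 1]` reading scale, and the example readings are all assumptions made for illustration, not part of the source.

```python
# Weights copied from the signal list; treating them as composite-score
# weights is an assumption, not something the source states.
SIGNAL_WEIGHTS = {
    "pricing": 0.42,
    "hardware": 0.21,
    "research": 0.18,
    "benchmark": 0.19,
}


def composite_score(
    readings: dict[str, float],
    weights: dict[str, float] = SIGNAL_WEIGHTS,
) -> float:
    """Weighted average of per-category readings, each assumed to lie in [0, 1].

    Categories missing from `readings` count as 0.0.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * readings.get(name, 0.0) for name in weights)
    return weighted_sum / total_weight


# Hypothetical readings, invented purely to exercise the function.
example_readings = {"pricing": 0.9, "hardware": 0.5, "research": 0.4, "benchmark": 0.7}
print(f"composite score: {composite_score(example_readings):.3f}")
```

Dividing by the total weight keeps the score on the same scale as the readings even if the weights are later edited and no longer sum to exactly 100%.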
Companies Involved
- OpenAI
- Anthropic
- Together AI
- Fireworks
- Groq
- Cerebras
- DeepSeek
Timeline
- 2023-03
GPT-3.5 priced at $2.00 / 1M input tokens.
- 2024-05
GPT-4o priced at $5.00 / 1M input tokens: frontier quality at roughly one-sixth of GPT-4's $30.00 / 1M launch price.
- 2025-02
GPT-4-class quality available at <$0.20 / 1M tokens via open weights on Groq/Cerebras.
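To make the pace of the last two entries concrete, the following sketch computes the fold reduction between them and the annualized rate it implies. It assumes the two entries are comparable in quality (both are described as GPT-4-class or frontier), treats "<$0.20" as exactly $0.20 so the result is a lower bound, and ignores the 2023-03 entry because GPT-3.5 is a different quality tier. The function names are illustrative.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def fold_reduction(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def annualized_fold(old_price: float, new_price: float, months: int) -> float:
    """Fold reduction scaled to a 12-month period, assuming a constant
    exponential rate of decline over the interval."""
    return fold_reduction(old_price, new_price) ** (12 / months)


# Timeline entries: (month, USD per 1M tokens).
gpt4o = (date(2024, 5, 1), 5.00)
open_weights = (date(2025, 2, 1), 0.20)  # source says "<$0.20"; upper bound used

elapsed = months_between(gpt4o[0], open_weights[0])
print(f"months elapsed:  {elapsed}")
print(f"fold reduction:  {fold_reduction(gpt4o[1], open_weights[1]):.1f}x")
print(f"annualized fold: {annualized_fold(gpt4o[1], open_weights[1], elapsed):.1f}x")
```

Under these assumptions the drop is 25x in 9 months, which annualizes to roughly 73x per year, well above the 10x-per-year figure used in the motivation.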
Predictions
- 12 months (high confidence)
Frontier-class quality drops below $0.05 / 1M input tokens on at least one provider.
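A rough way to sanity-check the prediction is to ask how long a given rate of decline would take to move from the most recent timeline price to the predicted one. The sketch below assumes a constant exponential decline, a starting point of $0.20 / 1M tokens (the 2025-02 upper bound), and the 10x-per-year rate quoted under "Why It Matters". `months_to_reach` is a hypothetical helper, not something the source defines.

```python
import math


def months_to_reach(current_price: float, target_price: float, fold_per_year: float) -> float:
    """Months until current_price falls to target_price, assuming the price
    drops by a constant factor of fold_per_year every 12 months."""
    if fold_per_year <= 1:
        raise ValueError("fold_per_year must be greater than 1 for prices to fall")
    if target_price >= current_price:
        return 0.0  # already at or below the target
    return 12 * math.log(current_price / target_price) / math.log(fold_per_year)


# Assumed inputs: $0.20 today, $0.05 target, 10x decline per year.
print(f"{months_to_reach(0.20, 0.05, 10):.1f} months")
```

Going from $0.20 to $0.05 is only a 4x drop, so at 10x per year the threshold would be crossed in a little over 7 months. Any sustained rate of 4x per year or faster reaches it within the 12-month window.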