Mainstreaminfrastructure

Inference Cost Collapse

Token prices falling 80–95% per year while quality keeps rising.

First observed 2023-05-012,310 signals

Trend Dynamics

Updated daily
Velocity
+64
Maturity
58
Signals
2,310
All-time observed

Definition

Inference cost collapse is the sustained, multi-year reduction in the marginal cost of producing a token of equivalent or higher quality, driven by smaller distilled models, better serving stacks, and accelerator competition.

Why It Matters

When the unit cost of intelligence falls 10x in 12 months, every product priced per-seat is exposed to a per-task competitor that uses the savings to give the work away.

Signals Feeding This Trend

  • Per-million-token API price changes42%
    pricing
  • New inference accelerator launches21%
    hardware
  • Distillation paper releases18%
    research
  • Open-weight model parity benchmarks19%
    benchmark

Companies Involved

  • OpenAI
  • Anthropic
  • Google
  • Together AI
  • Fireworks
  • Groq
  • Cerebras
  • DeepSeek

Timeline

  1. 2023-03

    GPT-3.5 priced at $2.00 / 1M input tokens.

  2. 2024-05

    GPT-4o priced at $5.00 / 1M input tokens — frontier quality at 1/10 prior cost.

  3. 2025-02

    GPT-4-class quality available at <$0.20 / 1M tokens via open weights on Groq/Cerebras.

Predictions

  • 12 monthshigh confidence

    Frontier-class quality drops below $0.05 / 1M input tokens on at least one provider.

Related Trends

Track every signal feeding Inference Cost Collapse

Steek surfaces individual signals the moment they enter the index.

Explore the Signal Index