Emerging · hardware

Edge Inference

Capable models running fully on phones, laptops, and embedded devices.

First observed 2024-06-10 · 521 signals

Trend Dynamics (updated daily)

  • Velocity: +58
  • Maturity: 24
  • Signals: 521 (all-time observed)

Definition

Edge inference is the execution of LLM and multimodal model inference entirely on a user-owned device, without round-trips to a hyperscaler datacenter.

Why It Matters

Edge inference changes the privacy, latency, and unit-economics story for any AI feature that today requires a server — and reopens hardware cycles for Apple, Qualcomm, and AMD.

Signals Feeding This Trend

  • On-device model size/quality benchmarks: 36% (benchmark)
  • NPU TOPS announcements: 28% (hardware)
  • OS-level AI runtime releases (Apple Intelligence, Copilot+): 36% (release)

Companies Involved

  • Apple
  • Qualcomm
  • AMD
  • Microsoft
  • Mistral
  • Meta

Timeline

  1. 2024-06

    Apple Intelligence announced — on-device 3B model.

  2. 2024-Q3

    Copilot+ PCs launch with 40+ TOPS NPU baseline.

Predictions

  • 12 months (high confidence)

    A 7B-class model runs at >30 tokens/sec on a mainstream consumer phone.
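The >30 tokens/sec figure can be sanity-checked with a standard back-of-envelope: autoregressive decode is usually memory-bandwidth-bound, so tokens/sec is roughly the usable memory bandwidth divided by the bytes streamed per token (approximately the quantized weight size). A minimal sketch, using illustrative assumed numbers for a 7B model and phone-class memory bandwidth (not measurements):

```python
# Back-of-envelope decode throughput for on-device LLM inference.
# Assumption: single-token decode is memory-bandwidth-bound, so each
# generated token requires streaming roughly all model weights once.

def decode_tokens_per_sec(params_billion: float,
                          bits_per_weight: float,
                          bandwidth_gb_s: float,
                          efficiency: float = 0.7) -> float:
    """Estimated tokens/sec = usable bandwidth / bytes read per token."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    usable_bandwidth = bandwidth_gb_s * 1e9 * efficiency
    return usable_bandwidth / weight_bytes

# Illustrative assumptions: 7B parameters at 4-bit quantization
# (about 3.5 GB of weights), 60 GB/s peak phone-class bandwidth,
# 70% of peak actually achievable.
est = decode_tokens_per_sec(7, 4, 60)
print(f"{est:.0f} tokens/sec")  # prints "12 tokens/sec"
```

Under these assumptions the estimate lands near 12 tokens/sec, so reaching >30 on a mainstream phone would take some combination of higher bandwidth, sub-4-bit quantization, or techniques such as speculative decoding that amortize weight reads across tokens.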
