Every trend ranking on the internet starts with a question the producer hopes nobody asks: where did the numbers come from? This report answers that question for Steek. It is the full description of how a raw, public observation about AI becomes a tracked trend with a velocity score on the live index.
Why signals, not articles
A news article is one observer's compression of an event. A signal is the event itself, captured at the source. Articles introduce two failure modes that signals do not:
- Compression bias. Editorial framing chooses which events even appear in the record.
- Latency. Articles arrive hours to days after the source event, by which point downstream signals (pricing, hiring, regulation) may already be shifting.
A signal-first index is harder to build but produces a record that can be re-aggregated for any question without re-doing the editorial work.
The four-stage pipeline
Every observation passes through the same four stages.
Stage 1: Ingest
Steek ingests more than 20 structured sources: frontier-lab release feeds, arXiv, GitHub, regulator publications (NIST, EU Commission, US AISI, UK AISI), pricing pages, earnings transcripts, and curated technical sources. Each source is owned by a typed adapter that knows how to parse that source's format and reject duplicates.
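As a rough illustration of what a typed adapter with duplicate rejection could look like, here is a minimal sketch. Every name in it (`RawEvent`, `SourceAdapter`, `jsonFeedAdapter`, `ingest`) and the JSON payload format are hypothetical, and deduplicating by source URL is an assumption of this sketch rather than Steek's documented rule.

```typescript
// Hypothetical shapes for illustration; none of these names come from Steek.
interface RawEvent {
  sourceUrl: string;
  title: string;
  observedAt: string; // ISO 8601 timestamp
}

interface SourceAdapter {
  sourceName: string;
  // Returns a parsed event, or null when the payload is malformed.
  parse(payload: string): RawEvent | null;
}

// Example adapter for a made-up JSON feed with `url`, `title`, and `ts` fields.
const jsonFeedAdapter: SourceAdapter = {
  sourceName: "example-json-feed",
  parse(payload: string): RawEvent | null {
    try {
      const data = JSON.parse(payload);
      if (
        typeof data.url !== "string" ||
        typeof data.title !== "string" ||
        typeof data.ts !== "string"
      ) {
        return null;
      }
      return { sourceUrl: data.url, title: data.title, observedAt: data.ts };
    } catch (_err) {
      return null; // not valid JSON
    }
  },
};

// Runs payloads through an adapter, dropping malformed and duplicate events.
// Assumption: two events with the same source URL are duplicates.
function ingest(adapter: SourceAdapter, payloads: string[]): RawEvent[] {
  const seen = new Set<string>();
  const events: RawEvent[] = [];
  for (const payload of payloads) {
    const event = adapter.parse(payload);
    if (event === null || seen.has(event.sourceUrl)) continue;
    seen.add(event.sourceUrl);
    events.push(event);
  }
  return events;
}
```

Keeping the parsing logic behind one interface means a new source only needs a new `parse` implementation, while the duplicate check stays in one place.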
Stage 2: Normalize
Every event is reduced to a typed signal record:
{
  type: "release" | "benchmark" | "pricing" | ..., // one of 14
  entities: ["OpenAI", "GPT-5"],
  topic: "reasoning-models",
  weight: 0.7, // base + source modifier
  observed_at: "2025-04-12T14:32Z",
  source_url: "https://...",
  source_authority: 0.9
}
The full type vocabulary is documented in The AI Signals Taxonomy.
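The record above could be expressed as a TypeScript interface with a small validator, sketched below. The field names follow the record; the `validateSignal` helper and its specific checks (a positive weight, authority within [0, 1], an http(s) URL) are assumptions made for illustration, and `type` is left as a plain string because only three of the 14 type names appear in this report.

```typescript
interface Signal {
  type: string; // one of the 14 taxonomy types, e.g. "release"
  entities: string[];
  topic: string;
  weight: number;
  observed_at: string; // ISO 8601 timestamp
  source_url: string;
  source_authority: number;
}

// Returns a list of problems; an empty list means the record passed.
// The individual rules are illustrative assumptions, not Steek's schema.
function validateSignal(s: Signal): string[] {
  const problems: string[] = [];
  if (s.type.length === 0) problems.push("type is empty");
  if (s.entities.length === 0) problems.push("entities is empty");
  if (Number.isNaN(Date.parse(s.observed_at))) {
    problems.push("observed_at is not a parseable timestamp");
  }
  if (!/^https?:\/\//.test(s.source_url)) {
    problems.push("source_url is not an http(s) URL");
  }
  if (s.source_authority < 0 || s.source_authority > 1) {
    problems.push("source_authority is outside [0, 1]");
  }
  if (!(s.weight > 0)) problems.push("weight is not positive");
  return problems;
}
```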
Stage 3: Cluster
Signals are grouped into trend candidates by three co-occurrence dimensions: shared entities, shared topic, and a 14-day rolling window. A candidate is just a label and a list of signals — it has not yet earned the right to appear on the index.
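One way to combine the three dimensions is a greedy pass over time-ordered signals, sketched below. This is an assumption about how the dimensions interact, not Steek's actual algorithm: a signal joins a candidate only if the topic matches, at least one entity is shared, and it arrives within 14 days of the candidate's latest signal. The names `ClusterSignal`, `Candidate`, and `clusterSignals` are hypothetical.

```typescript
interface ClusterSignal {
  entities: string[];
  topic: string;
  observedAt: string; // ISO 8601 timestamp
}

interface Candidate {
  label: string; // here simply the shared topic
  signals: ClusterSignal[];
}

const WINDOW_MS = 14 * 24 * 60 * 60 * 1000; // 14-day rolling window

function clusterSignals(signals: ClusterSignal[]): Candidate[] {
  // Process oldest first so each candidate's last signal is its most recent.
  const sorted = signals
    .slice()
    .sort((a, b) => Date.parse(a.observedAt) - Date.parse(b.observedAt));
  const candidates: Candidate[] = [];
  for (const signal of sorted) {
    const t = Date.parse(signal.observedAt);
    const match = candidates.find((c) => {
      const last = c.signals[c.signals.length - 1];
      const sharesEntity = c.signals.some((s) =>
        s.entities.some((e) => signal.entities.indexOf(e) !== -1)
      );
      const inWindow = t - Date.parse(last.observedAt) <= WINDOW_MS;
      return c.label === signal.topic && sharesEntity && inWindow;
    });
    if (match) {
      match.signals.push(signal);
    } else {
      candidates.push({ label: signal.topic, signals: [signal] });
    }
  }
  return candidates;
}
```

Because the window is measured from the candidate's latest signal, a steady stream of related signals keeps one candidate alive, while a long gap starts a new one.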
Stage 4: Promote
A candidate is promoted to a tracked trend when it satisfies all of:
- Signal density above the floor (≥ 8 typed signals in 14 days).
- Entity diversity (signals from ≥ 2 independent entities).
- Positive velocity in two consecutive 7-day windows.
- At least one capability or commercial signal (i.e. not pure speculation).
The full math is on the velocity-model page.
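The four promotion conditions can be read as a single predicate, sketched below. The thresholds come from the list above; the input shape (`CandidateStats`), the `category` field, and the category labels "capability" and "commercial" as string values are assumptions for illustration, since the mapping from signal types to categories is defined elsewhere.

```typescript
interface CandidateSignal {
  entities: string[];
  category: string; // hypothetical grouping, e.g. "capability" or "commercial"
}

interface CandidateStats {
  signalsLast14Days: CandidateSignal[];
  weeklyVelocities: number[]; // one value per 7-day window, oldest first
}

function shouldPromote(c: CandidateStats): boolean {
  // 1. Signal density: at least 8 typed signals in 14 days.
  const dense = c.signalsLast14Days.length >= 8;

  // 2. Entity diversity: signals from at least 2 distinct entities.
  const entities = new Set<string>();
  c.signalsLast14Days.forEach((s) => s.entities.forEach((e) => entities.add(e)));
  const diverse = entities.size >= 2;

  // 3. Positive velocity in the two most recent 7-day windows.
  const lastTwo = c.weeklyVelocities.slice(-2);
  const risingTwice = lastTwo.length === 2 && lastTwo.every((v) => v > 0);

  // 4. At least one capability or commercial signal.
  const grounded = c.signalsLast14Days.some(
    (s) => s.category === "capability" || s.category === "commercial"
  );

  return dense && diverse && risingTwice && grounded;
}
```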
How weights are set
Each signal type has a base weight chosen to reflect how predictive that type has historically been of trend persistence. A pricing-page change, for example, is weighted more heavily than a single research paper, because pricing changes are sticky and require organizational alignment. A regulatory document outranks both, because it constrains the options of the entire industry.
Weights are then adjusted multiplicatively by:
- Source authority: a primary-source URL outranks a secondary report.
- Recency decay: signals lose weight smoothly as they age, with a 30-day half-life.
- Entity diversity: three signals from one entity are worth less than one signal from each of three entities.
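The multiplicative adjustment can be sketched as follows. The 30-day half-life is from the list above and gives a recency factor of 0.5^(age / 30). The diversity discount is not specified in this report, so the 1/k factor for the k-th signal from the same entity is purely an assumption chosen to reproduce the stated ordering; `WeightedInput`, `recencyFactor`, and `totalWeight` are hypothetical names.

```typescript
const HALF_LIFE_DAYS = 30;

// Exponential decay: a signal keeps half its weight for every 30 days of age.
function recencyFactor(ageDays: number): number {
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

interface WeightedInput {
  entity: string;
  baseWeight: number; // base weight of the signal type
  sourceAuthority: number; // e.g. 0.9 for a primary source
  ageDays: number;
}

// Sums adjusted weights for a set of signals.
// Assumption: the k-th signal from the same entity is scaled by 1/k.
function totalWeight(signals: WeightedInput[]): number {
  const seenPerEntity: { [entity: string]: number } = {};
  let total = 0;
  for (const s of signals) {
    const rank = (seenPerEntity[s.entity] || 0) + 1;
    seenPerEntity[s.entity] = rank;
    const diversityFactor = 1 / rank;
    total +=
      s.baseWeight * s.sourceAuthority * recencyFactor(s.ageDays) * diversityFactor;
  }
  return total;
}
```

Under this assumed discount, three fresh signals from one entity contribute 1 + 1/2 + 1/3 of a base weight, while one signal from each of three entities contributes the full three.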
How a trend dies
Steek does not delete trends; it retires them. A trend is retired when its 30-day velocity stays negative for three consecutive windows and no signals arrive from new entities. The combination matters: a slowdown is normal, but a slowdown without diversification means the underlying activity has stopped attracting new players.
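The retirement rule can be sketched as a check over the most recent windows. The `TrendWindow` shape and the `shouldRetire` name are hypothetical, and this sketch assumes the "no new entities" condition is evaluated over the same three windows as the velocity condition.

```typescript
interface TrendWindow {
  velocity: number; // 30-day velocity measured for this window
  newEntityCount: number; // entities seen for the first time in this window
}

// True when the three most recent windows all show negative velocity
// and none of them brought in a previously unseen entity.
function shouldRetire(windows: TrendWindow[]): boolean {
  const lastThree = windows.slice(-3);
  if (lastThree.length < 3) return false; // not enough history to decide
  return lastThree.every((w) => w.velocity < 0 && w.newEntityCount === 0);
}
```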
A retired trend remains accessible on Steek with a declining badge so that anyone trying to re-investigate the historical record can do so without losing context.
What this report is not
The Steek methodology is deliberately narrow. It does not attempt to predict outcomes, value securities, or quantify second-order economic effects. It is a system for turning observable events into a queryable record of what is actually happening — a much smaller claim than the prediction work that downstream analysts do on top of it.
The downstream layer — briefings, predictions, decisions — lives in the landscape report and the per-trend pages.
Auditing a claim
Every claim Steek makes is auditable. Pick any trend on the index, open it, and you will find the underlying signals listed with timestamps and source URLs. If the URLs do not justify the claim, the claim should not exist. That is the entire reason the system is built signal-first.
Reading further
For the per-type breakdown of the 14 signal types, see the taxonomy. For the precise velocity formula, see the velocity model. For an example of the full pipeline applied to a calendar year, see The 2025 AI Trends Landscape.