| Type | Examples |
|---|---|
| Organization | NVIDIA, Anthropic, OpenAI |
| Person | Sam Altman, Dario Amodei, Jensen Huang |
| Model | GPT-4o, Claude 3.5 Sonnet, Llama 3 70B |
| Product | ChatGPT, Gemini, Claude Code |
| Hardware | H100, A100, TPU v5e |
| Dataset | The Pile, CommonCrawl, MMLU |
| Benchmark | GPQA, HumanEval, MATH |
| Framework | LangChain, vLLM, PyTorch |
| Software | GitHub Copilot, Cursor, Replit |
| Distribution Channel | HuggingFace Hub, AWS Bedrock, Azure OpenAI |
| Regulation/Policy | EU AI Act, Executive Order 14110 |
| Publication | Research papers, reports |
| Location | Countries, regions, cities |
| Event | Conferences, product launches |
| Concept | Named technical concepts |
Entity extraction and disambiguation
Gildea uses a two-pass entity extraction system to ensure accurate identification.

Pass 1: Google Natural Language API
Signals are processed through Google Cloud NL to extract entity mentions with Knowledge Graph linking, type classification across 15 types, and salience scoring.

Pass 2: Domain-specific disambiguation
A curated rule set disambiguates AI-specific entities that general NLP models struggle with:
- Model families: “Claude 3.5 Sonnet”, “GPT-4o”, “Llama 3 70B” resolve to specific model entries, not generic company mentions
- Hardware: “H100”, “A100”, “TPU v5e” resolve to specific chips
- Contextual disambiguation: “Claude” + context mentioning “sonnet” resolves to the model, not a person
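
The contextual rule in the last bullet can be sketched as a small lookup of mention plus context cue. This is only an illustration of the idea: the `Rule` class, the `disambiguate` function, and the entity ID shown are hypothetical, and the actual curated rule set is not described in this document.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """One hypothetical rule: a mention plus a context cue maps to an entity."""
    mention: str       # surface form, compared case-insensitively
    context_cue: str   # regex that must match somewhere in the surrounding text
    entity_id: str     # illustrative ID only, not a confirmed Gildea identifier
    entity_type: str   # one of the 15 entity types


# Illustrative rule mirroring the "Claude" + "sonnet" example above.
RULES = [
    Rule("Claude", r"\bsonnet\b", "mdl:/anthropic-claude-3-5-sonnet", "Model"),
]


def disambiguate(mention: str, context: str, default_type: str) -> Tuple[Optional[str], str]:
    """Return (entity_id, entity_type); fall back to the first-pass type if no rule fires."""
    for rule in RULES:
        same_mention = rule.mention.lower() == mention.lower()
        if same_mention and re.search(rule.context_cue, context, re.IGNORECASE):
            return rule.entity_id, rule.entity_type
    return None, default_type
```

With this sketch, a mention of "Claude" in a sentence that also mentions "Sonnet" resolves to the model entry, while the same mention without that cue keeps whatever type the first pass assigned.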
Canonical identity resolution
Every entity gets a stable canonical ID through a priority chain:
- Google Knowledge Graph MID: highest confidence, links to Google’s knowledge base
- Curated domain ID: for AI-specific entities (e.g., org:/nvidia, mdl:/openai-gpt-4o)
- Name-based fallback: for entities without external links
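
A minimal sketch of how such a priority chain might be evaluated is shown below. The function names, the `name:/` prefix for the fallback, and the placeholder MID are assumptions made for illustration; only the ordering of the three sources comes from the list above.

```python
import re
from typing import Optional


def slugify(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def canonical_id(name: str,
                 kg_mid: Optional[str] = None,
                 curated_id: Optional[str] = None) -> str:
    """Pick the first available identifier in priority order."""
    if kg_mid:            # 1. Google Knowledge Graph MID
        return kg_mid
    if curated_id:        # 2. curated domain ID such as org:/nvidia
        return curated_id
    # 3. name-based fallback; the "name:/" prefix is a hypothetical convention
    return f"name:/{slugify(name)}"


# "/m/0example" is a placeholder, not a real Knowledge Graph MID.
print(canonical_id("NVIDIA", kg_mid="/m/0example", curated_id="org:/nvidia"))
print(canonical_id("NVIDIA", curated_id="org:/nvidia"))
print(canonical_id("Some New Lab"))
```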
Noise filtering
Generic concepts (“AI”, “market”, “industry”) are excluded, and corporate suffixes (“Inc”, “LLC”, “Corp”) are normalized so “NVIDIA Corporation” and “NVIDIA” resolve to the same entity.

Entity profiles
Each entity has a rich profile including:
- Signal count: total signals mentioning this entity
- Trend stats: Theil-Sen slope, Mann-Kendall significance, share of voice, volatility
- Content type mix: breakdown of expert analysis vs. event signals
- Theme distributions: which value chain segments and market forces this entity appears in
- Related entities: co-occurrence relationships
Trend analytics
Every entity includes a trend object with analytics across four dimensions:
| Field | Dimension | Description |
|---|---|---|
| share_of_voice | Scale | Entity’s share of total corpus this week |
| source_diversity | Scale | Count of distinct source domains |
| recency_score | Scale | Exponential decay from last mention (1.0 = today) |
| theil_sen_slope | Directionality | Robust trend slope (resistant to outliers) |
| streak | Directionality | Consecutive weeks of growth |
| mk_tau | Significance | Mann-Kendall tau (-1 to +1) |
| mk_p_value | Significance | Statistical significance of trend (< 0.1 = significant) |
| coefficient_of_variation | Volatility | How variable is coverage (stddev / mean) |
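
For readers unfamiliar with these statistics, the sketch below computes textbook versions of three of them from a list of weekly mention counts. It is not Gildea's implementation: the function names are hypothetical, the Mann-Kendall p-value uses the standard normal approximation without a tie correction, and the coefficient of variation uses the population standard deviation. The windowing and any tie handling in the real pipeline are not stated in this document.

```python
import math
from itertools import combinations
from statistics import mean, median, pstdev
from typing import Sequence, Tuple


def theil_sen_slope(y: Sequence[float]) -> float:
    """Median of all pairwise slopes; a single outlier week barely moves it."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    return median((y[j] - y[i]) / (j - i) for i, j in combinations(range(len(y)), 2))


def mann_kendall(y: Sequence[float]) -> Tuple[float, float]:
    """Return (tau, two-sided p-value) via the normal approximation, no tie correction."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    # S counts increasing pairs minus decreasing pairs.
    s = sum((y[j] > y[i]) - (y[j] < y[i]) for i, j in combinations(range(n), 2))
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)   # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return tau, p_value


def coefficient_of_variation(y: Sequence[float]) -> float:
    """Population standard deviation divided by the mean."""
    m = mean(y)
    return pstdev(y) / m if m else float("nan")


weekly_mentions = [1, 2, 3, 4, 5]   # made-up counts for illustration
tau, p = mann_kendall(weekly_mentions)
print(theil_sen_slope(weekly_mentions), tau, p, coefficient_of_variation(weekly_mentions))
```

A steadily increasing series like the one above yields a positive slope, a tau of +1, and a p-value under the 0.1 threshold from the table.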
Interpretation fields
Each entity includes interpretation fields derived from the raw trend stats. These are discrete labels that agents can act on without statistical expertise.
| Field | Values | What it tells you |
|---|---|---|
| scale | High, Medium, Low | How prominent is this entity relative to the corpus |
| direction | Rising, Stable, Declining, New | Which way is coverage trending |
| confidence | Significant, Insignificant | Is the trend statistically reliable |
| stability | Volatile, Steady | How consistent is coverage week to week |
| priority | High, Medium, Low, Negligible | Combined assessment across all dimensions |
| priority_reasoning | Free text | Human-readable explanation of the priority assignment |
Entities with insufficient data for trend analysis have scale only: direction, confidence, stability, and priority will be null. The priority_reasoning field will contain “Insufficient data for trend analysis.”
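
Because the trend labels can be null, client code should check for missing values before acting on them. The sketch below shows one way to do that. It assumes the interpretation fields arrive as top-level keys of a parsed JSON object alongside a `name` key; that response shape and the `trend_summary` helper are assumptions, not documented behavior.

```python
from typing import Any, Dict

TREND_FIELDS = ("direction", "confidence", "stability", "priority")


def trend_summary(entity: Dict[str, Any]) -> str:
    """Build a one-line summary that tolerates null trend labels."""
    name = entity.get("name", "unknown")
    scale = entity.get("scale")
    # Any null label means trend analysis was not available for this entity.
    if any(entity.get(field) is None for field in TREND_FIELDS):
        reason = entity.get("priority_reasoning") or "no reasoning provided"
        return f"{name}: scale={scale}; trend unavailable ({reason})"
    labels = ", ".join(f"{field}={entity[field]}" for field in TREND_FIELDS)
    return f"{name}: scale={scale}, {labels}"


# Hypothetical payload for an entity without enough history.
sparse = {
    "name": "ExampleCo",
    "scale": "Low",
    "direction": None,
    "confidence": None,
    "stability": None,
    "priority": None,
    "priority_reasoning": "Insufficient data for trend analysis.",
}
print(trend_summary(sparse))
```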
Composable filtering
The list entities endpoint supports filtering by any combination of interpretation fields. This replaces the old trending modes with a more powerful, composable approach.
| Filter | What it finds | Replaces |
|---|---|---|
| direction=Rising + confidence=Significant | Reliably trending up | Old rising mode |
| direction=New + sort=first_seen | Recently tracked entities | Old emerging mode |
| stability=Volatile + sort=trend | Most variable coverage | Old volatile mode |
Filters can be combined freely. For example, ?direction=Rising&scale=High&sort=trend finds high-prominence entities with upward trends.
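
A small helper can assemble such query strings and reject values that are not listed in the interpretation fields table. The `build_entity_query` function below is a hypothetical client-side sketch. It only builds the query string, since the endpoint's base URL and authentication are not given here, and it leaves the `sort` value unvalidated because only `trend` and `first_seen` appear in this document.

```python
from typing import Optional
from urllib.parse import urlencode

# Allowed values taken from the interpretation fields table.
ALLOWED = {
    "scale": {"High", "Medium", "Low"},
    "direction": {"Rising", "Stable", "Declining", "New"},
    "confidence": {"Significant", "Insignificant"},
    "stability": {"Volatile", "Steady"},
    "priority": {"High", "Medium", "Low", "Negligible"},
}


def build_entity_query(sort: Optional[str] = None, **filters: str) -> str:
    """Validate the filters and return a query string such as '?direction=Rising'."""
    for field, value in filters.items():
        if field not in ALLOWED:
            raise ValueError(f"unknown filter: {field}")
        if value not in ALLOWED[field]:
            raise ValueError(f"invalid value for {field}: {value}")
    params = dict(filters)
    if sort is not None:
        params["sort"] = sort   # passed through unvalidated
    return "?" + urlencode(params)


print(build_entity_query(direction="Rising", scale="High", sort="trend"))
# prints: ?direction=Rising&scale=High&sort=trend
```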
Co-occurrences
The entity detail endpoint includes related_entities: the entities that most frequently appear alongside it in signals. These co-occurrences reveal industry relationships, competitive dynamics, and supply chain connections.
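
To make the idea of co-occurrence concrete, the sketch below counts how often pairs of entities appear in the same signal and ranks the partners of one entity. It is an illustration under simple assumptions (each signal is a list of entity IDs, every shared signal counts once) and says nothing about how Gildea actually weights or thresholds these relationships. The function names and the `org:/example-lab` ID are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrence_counts(signals: Iterable[Iterable[str]]) -> Counter:
    """Count, for each unordered entity pair, the number of signals mentioning both."""
    counts: Counter = Counter()
    for entities in signals:
        # set() ignores repeated mentions; sorted() gives each pair one canonical order.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def related_entities(entity_id: str, counts: Counter, top_n: int = 5) -> List[Tuple[str, int]]:
    """Return the entities that most often share a signal with entity_id."""
    related: Counter = Counter()
    for (a, b), n in counts.items():
        if a == entity_id:
            related[b] += n
        elif b == entity_id:
            related[a] += n
    return related.most_common(top_n)


# Made-up signals for illustration.
signals = [
    ["org:/nvidia", "mdl:/openai-gpt-4o"],
    ["org:/nvidia", "mdl:/openai-gpt-4o", "org:/example-lab"],
    ["org:/example-lab"],
]
print(related_entities("org:/nvidia", cooccurrence_counts(signals)))
```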