Entities are named things extracted from signals and resolved to canonical identities. Gildea classifies entities into 8 types across the AI ecosystem:
| Type | Examples |
|---|---|
| organization | NVIDIA, Anthropic, OpenAI |
| person | Sam Altman, Dario Amodei, Jensen Huang |
| model | GPT-5, Claude Opus 4.7, Llama 4 |
| hardware | H100, A100, TPU v5e |
| location | Countries, regions, cities |
| event | Conferences, product launches |
| regulation_policy | EU AI Act, Executive Order 14110 |
| other | Everything else: products, software, frameworks, datasets, benchmarks, publications, and named technical concepts (e.g., ChatGPT, LangChain, MMLU) |
Use type_primary as the authoritative type. The other bucket is a deliberate catch-all: entities that aren’t one of the seven specific types live here rather than in a long tail of sparsely-populated categories. To find a specific product or framework, search by name rather than filtering on type.
This means you can track not just which companies are trending, but which models, hardware, and regulations are gaining attention.
Entity extraction and disambiguation
Gildea uses a two-pass entity extraction system to ensure accurate identification:
Pass 1 — Google Natural Language API
Signals are processed through Google Cloud NL to extract entity mentions with Knowledge Graph linking, type classification, and salience scoring.
Pass 2 — Domain-specific disambiguation
A curated rule set disambiguates AI-specific entities that general NLP models struggle with:
- Model families: “Claude Opus 4.7”, “GPT-5”, “Llama 4” resolve to specific model entries, not generic company mentions
- Hardware: “H100”, “A100”, “TPU v5e” resolve to specific chips
- Contextual disambiguation: “Claude” + context mentioning “sonnet” resolves to the model, not a person
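As a rough illustration of the contextual rule above, the sketch below overrides a default type when a cue word appears near the mention. The rule table, the cue words, and the `disambiguate` helper are hypothetical and are not Gildea's actual rule set.

```python
import re

# Hypothetical rule table: surface form -> (context cue words, resolved type).
# The cues are illustrative only; the real curated rule set is not published here.
CONTEXT_RULES = {
    "claude": ({"sonnet", "opus"}, "model"),
}


def disambiguate(surface_form: str, context: str, default_type: str) -> str:
    """Return the mention's type, overriding the default when a cue word appears."""
    rule = CONTEXT_RULES.get(surface_form.casefold())
    if rule is None:
        return default_type
    cues, resolved_type = rule
    # Tokenize the surrounding context into lowercase words.
    tokens = set(re.findall(r"\w+", context.casefold()))
    return resolved_type if tokens & cues else default_type


print(disambiguate("Claude", "Claude Sonnet tops the leaderboard", "person"))
print(disambiguate("Claude", "Claude attended the dinner", "person"))
```

The first call resolves to `model` because "sonnet" appears in the context; the second keeps the default type.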
Entity identifiers
Every entity has a stable, opaque public ID of the form gld:/<hex> (e.g., gld:/a1b2c3d4e5f6). This is the only entity identifier the API exposes — it appears in every entity_id field, and you pass it back to GET /v1/entities/{name_or_id} or any entity filter.
Treat the ID as opaque: it carries no parseable structure and you should not infer an entity’s type from it. Use type_primary as the authoritative entity type. IDs are stable across time — an entity keeps the same gld:/… ID even if its internal classification is later corrected.
Behind the scenes, “Meta”, “Meta Platforms”, and “Facebook” all resolve to the same entity (and therefore the same gld:/… ID).
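Because the ID contains `:` and `/`, a client may want to percent-encode it before placing it in the path of GET /v1/entities/{name_or_id}. The sketch below assumes a placeholder host (`api.example.com`) and assumes the server accepts the encoded form; neither is confirmed by this page.

```python
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder host, not the real API base URL


def entity_url(name_or_id: str) -> str:
    """Build a detail URL for an entity name or gld:/ ID."""
    # safe="" also encodes ":" and "/" so the ID stays a single path segment.
    return f"{BASE_URL}/v1/entities/{quote(name_or_id, safe='')}"


print(entity_url("gld:/a1b2c3d4e5f6"))
# https://api.example.com/v1/entities/gld%3A%2Fa1b2c3d4e5f6
```

The helper never inspects the ID's contents, which matches the guidance to treat it as opaque.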
Noise filtering
Generic concepts (“AI”, “market”, “industry”) are excluded, and corporate suffixes (“Inc”, “LLC”, “Corp”) are normalized, so that “NVIDIA Corporation” and “NVIDIA” resolve to the same entity.
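A minimal sketch of this kind of filtering, assuming a small hard-coded stop list and suffix pattern. The `normalize_name` helper and both lists are illustrative; the production lists are presumably longer.

```python
import re
from typing import Optional

# Examples taken from the text above; the real exclusion list is not published here.
GENERIC_CONCEPTS = {"ai", "market", "industry"}

# Illustrative suffix pattern. "Corporation" is included so the NVIDIA example works.
SUFFIX_RE = re.compile(r"[,\s]+(inc|llc|corp|corporation)\.?$", re.IGNORECASE)


def normalize_name(name: str) -> Optional[str]:
    """Strip a trailing corporate suffix; return None for generic concepts."""
    stripped = SUFFIX_RE.sub("", name.strip())
    if stripped.casefold() in GENERIC_CONCEPTS:
        return None
    return stripped


print(normalize_name("NVIDIA Corporation"))  # NVIDIA
print(normalize_name("industry"))            # None
```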
Entity profiles
Each entity has a rich profile including:
- Signal count — total signals mentioning this entity
- Trend stats — Theil-Sen slope, Mann-Kendall significance, share of voice, volatility
- Content type mix — breakdown of expert analysis vs. event signals
- Theme distributions — which value chain segments and market forces this entity appears in
- Related entities — co-occurrence relationships
Trend analytics
Every entity includes a trend object with analytics across four dimensions:
| Field | Dimension | Description |
|---|---|---|
| share_of_voice | Scale | Entity’s share of total corpus over the trailing 4 weeks |
| source_diversity | Scale | Count of distinct source domains |
| recency_score | Scale | Exponential decay from last mention (1.0 = today) |
| theil_sen_slope | Directionality | Robust trend slope (resistant to outliers) |
| streak | Directionality | Consecutive weeks of growth |
| mk_tau | Significance | Mann-Kendall tau (-1 to +1) |
| mk_p_value | Significance | Statistical significance of trend (< 0.1 = significant) |
| coefficient_of_variation | Volatility | How variable is coverage (stddev / mean) |
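For readers unfamiliar with these statistics, the sketch below shows textbook versions of three of them computed over a list of weekly mention counts. It is not Gildea's implementation: the handling of ties, the use of population standard deviation, and the function names are all assumptions, and the p-value calculation is omitted.

```python
from itertools import combinations
from statistics import mean, median, pstdev


def theil_sen_slope(counts):
    """Median of all pairwise slopes; a single outlier week barely moves it."""
    points = list(enumerate(counts))  # (week index, count)
    slopes = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in combinations(points, 2)]
    return median(slopes)


def mann_kendall_tau(counts):
    """Kendall tau-a between time order and value, in [-1, +1] (no tie correction)."""
    def sign(v):
        return (v > 0) - (v < 0)

    pairs = list(combinations(counts, 2))  # (earlier, later) in time order
    s = sum(sign(later - earlier) for earlier, later in pairs)
    return s / len(pairs)


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean (mean must be non-zero)."""
    return pstdev(counts) / mean(counts)


weekly = [1, 2, 3, 4, 100]  # made-up counts with one outlier week
print(theil_sen_slope(weekly))   # 1.0
print(mann_kendall_tau(weekly))  # 1.0
```

The outlier week leaves the Theil-Sen slope at 1.0, which is the robustness property the table refers to.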
Interpretation fields
Each entity includes interpretation fields derived from the raw trend stats. These are discrete labels that agents can act on without statistical expertise.
| Field | Values | What it tells you |
|---|---|---|
| scale | Large, Medium, Small | How prominent is this entity relative to the corpus |
| direction | Rising, Stable, Declining, New | Which way is coverage trending |
| confidence | Significant, Insignificant | Is the trend statistically reliable |
| stability | Volatile, Steady | How consistent is coverage week to week |
| notability | High, Medium, Low, Negligible | How much this entity warrants attention right now (foreground vs. background) |
| notability_reasoning | Free text | Human-readable explanation of the notability assignment |
All trend statistics (theil_sen_slope, mk_p_value, coefficient_of_variation) are computed over a rolling 12-week window — the current week plus 11 prior weeks. Entities with fewer than 8 mentions in that window return scale only — direction, confidence, stability, and notability will be null. The notability_reasoning field will contain “Insufficient data for trend analysis.”
An entity whose first_seen date is within the last 30 days is classified as direction: "New" regardless of its underlying slope — this overrides the Rising/Stable/Declining classification until enough history accumulates. confidence is forced to Insignificant and stability is null for new entities. As a result, filtering ?direction=Rising will not surface genuine breakouts younger than 30 days; use ?direction=New to find recent arrivals.
{
  "entity_id": "gld:/a1b2c3d4e5f6",
  "display_name": "NVIDIA",
  "scale": "Large",
  "direction": "Rising",
  "confidence": "Significant",
  "stability": "Steady",
  "notability": "High",
  "notability_reasoning": "Large-scale entity with confirmed upward trend and consistent coverage; notable upward shift reliably gaining share.",
  "trend": { "..." : "..." }
}
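The two override rules described above (fewer than 8 mentions, and first seen within 30 days) can be modeled roughly as follows. This is a reader's mental model, not the service's code: the `apply_overrides` helper is hypothetical, the inclusive 30-day boundary is an assumption, and so is the choice to check the insufficient-data rule first.

```python
from datetime import date

MIN_MENTIONS = 8      # threshold stated in the text
NEW_WINDOW_DAYS = 30  # window stated in the text; boundary handling is assumed


def apply_overrides(labels: dict, mentions_in_window: int,
                    first_seen: date, today: date) -> dict:
    """Apply the insufficient-data and New overrides to precomputed labels."""
    out = dict(labels)
    if mentions_in_window < MIN_MENTIONS:
        # Only scale survives; every other interpretation field is null.
        for field in ("direction", "confidence", "stability", "notability"):
            out[field] = None
        out["notability_reasoning"] = "Insufficient data for trend analysis."
        return out
    if (today - first_seen).days <= NEW_WINDOW_DAYS:
        # New overrides Rising/Stable/Declining regardless of slope.
        out["direction"] = "New"
        out["confidence"] = "Insignificant"
        out["stability"] = None
    return out


labels = {"scale": "Small", "direction": "Rising", "confidence": "Significant",
          "stability": "Steady", "notability": "Medium", "notability_reasoning": "..."}
print(apply_overrides(labels, 20, date(2024, 5, 20), date(2024, 6, 1))["direction"])  # New
```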
Entity association granularity
Entity ↔ content associations are stored at the article level in entity_article_mentions (one row per (article, canonical_entity) pair, with the matched surface form). There is no persistent per-text-unit entity table.
Where unit-level associations appear in responses, they are computed at request time by matching the article’s surface forms against each unit’s text using word-boundary regex (NFC normalization, casefold, possessive stripping). This applies to:
- GET /v1/signals/{id}: each unit in the decomposition carries an entities array listing the public entity IDs (gld:/…) whose surface forms appear in that unit’s text.
- GET /v1/search?entity=<id>: the filter pre-scopes the candidate pool to articles mentioning the entity, then the cross-encoder rerank runs, then a unit-level post-filter drops candidates whose text doesn’t literally contain a surface form of the requested entity. An article-level pre-filter alone (without the unit-level post-filter) would surface units from the same article that happen to discuss other entities.
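The request-time matching described above can be approximated with the sketch below. It follows the three normalization steps named in the text (NFC, casefold, possessive stripping) but the exact regex, the possessive rule, and the `unit_mentions` helper are assumptions rather than the service's code.

```python
import re
import unicodedata
from typing import Iterable


def _normalize(text: str) -> str:
    """NFC-normalize, casefold, and strip 's possessives (straight or curly apostrophe)."""
    text = unicodedata.normalize("NFC", text).casefold()
    return re.sub(r"['’]s\b", "", text)


def unit_mentions(unit_text: str, surface_forms: Iterable[str]) -> bool:
    """True if any surface form appears in the unit text on word boundaries."""
    haystack = _normalize(unit_text)
    for form in surface_forms:
        # Lookarounds act as word boundaries even if the form starts or ends
        # with a non-word character.
        pattern = r"(?<!\w)" + re.escape(_normalize(form)) + r"(?!\w)"
        if re.search(pattern, haystack):
            return True
    return False


print(unit_mentions("NVIDIA’s H100 ships next quarter", ["NVIDIA", "NVIDIA Corporation"]))  # True
print(unit_mentions("The company shipped a 200K-context model", ["Anthropic"]))              # False
```

The second call shows why coreference is a miss under this approach: the unit never contains a literal surface form.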
Implications for consumers:
- High precision when filtering by entity: results literally name the entity, not just sit in an article that does.
- Coreference is not resolved. A unit saying “the company shipped a 200K-context model” will not match ?entity=gld:/a1b2c3d4e5f6 even when context makes the referent obvious. The fix path is broader surface-form coverage, not pronominal resolution.
- Aggregate statistics (mv_entity_profile, mv_entity_co_occurrence) currently count at article granularity. If you need unit-level entity counts for trend analytics, that’s a future change tracked in TODO.
- diagnostics.entity_filter_dropped in search responses reports how many candidates the unit-level post-filter removed, which is useful for debugging when result counts are tighter than expected.
Composable filtering
The list entities endpoint supports filtering by any combination of interpretation fields. This replaces the old trending modes with a more powerful, composable approach.
| Filter | Finds | Replaces |
|---|---|---|
| direction=Rising + confidence=Significant | Reliably trending up | Old rising mode |
| direction=New + sort=first_seen | Recently tracked entities | Old emerging mode |
| stability=Volatile + sort=trend | Most variable coverage | Old volatile mode |
Filters can be combined freely. For example, ?direction=Rising&scale=Large&sort=trend finds high-prominence entities with upward trends.
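A small client-side helper can validate filter values before building the query string. The sketch below assumes the list endpoint lives at `/v1/entities` (this page only shows the detail path) and that every interpretation field is accepted as a query parameter; `entities_query` is a hypothetical name.

```python
from urllib.parse import urlencode

# Allowed values copied from the interpretation fields table.
ALLOWED = {
    "scale": {"Large", "Medium", "Small"},
    "direction": {"Rising", "Stable", "Declining", "New"},
    "confidence": {"Significant", "Insignificant"},
    "stability": {"Volatile", "Steady"},
    "notability": {"High", "Medium", "Low", "Negligible"},
}


def entities_query(sort=None, **filters) -> str:
    """Build a relative URL for the list endpoint (path is an assumption)."""
    for field, value in filters.items():
        if field not in ALLOWED:
            raise ValueError(f"unknown filter: {field}")
        if value not in ALLOWED[field]:
            raise ValueError(f"invalid value for {field}: {value}")
    params = dict(filters)
    if sort is not None:
        params["sort"] = sort
    return "/v1/entities?" + urlencode(params)


print(entities_query(direction="Rising", scale="Large", sort="trend"))
# /v1/entities?direction=Rising&scale=Large&sort=trend
```

Rejecting unknown values locally catches typos such as `direction=rising` before a request is sent.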
Co-occurrences
The entity detail endpoint includes related_entities — entities that frequently appear together in signals. This reveals industry relationships, competitive dynamics, and supply chain connections.
{
  "related_entities": [
    {
      "entity_id": "gld:/b2c3d4e5f6a7",
      "name": "TSMC",
      "type": "organization",
      "co_occurrence_count": 28
    }
  ]
}
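Once the detail response is parsed, ranking co-occurring entities is a simple sort. The sketch below assumes the response shape shown above; the `top_related` helper is hypothetical.

```python
def top_related(detail: dict, n: int = 5, entity_type=None) -> list:
    """Return up to n related entities, strongest co-occurrence first."""
    related = detail.get("related_entities", [])
    if entity_type is not None:
        # Optionally keep only one entity type, e.g. "organization".
        related = [r for r in related if r["type"] == entity_type]
    return sorted(related, key=lambda r: r["co_occurrence_count"], reverse=True)[:n]


detail = {
    "related_entities": [
        {"entity_id": "gld:/b2c3d4e5f6a7", "name": "TSMC",
         "type": "organization", "co_occurrence_count": 28},
    ]
}
print([r["name"] for r in top_related(detail, entity_type="organization")])  # ['TSMC']
```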