Entities are named things extracted from signals and resolved to canonical identities. Gildea classifies entities into 8 types across the AI ecosystem:

| Type | Examples |
|---|---|
| organization | NVIDIA, Anthropic, OpenAI |
| person | Sam Altman, Dario Amodei, Jensen Huang |
| model | GPT-5, Claude Opus 4.7, Llama 4 |
| hardware | H100, A100, TPU v5e |
| location | Countries, regions, cities |
| event | Conferences, product launches |
| regulation_policy | EU AI Act, Executive Order 14110 |
| other | Everything else: products, software, frameworks, datasets, benchmarks, publications, and named technical concepts (e.g., ChatGPT, LangChain, MMLU) |

Use type_primary as the authoritative type. The other bucket is a deliberate catch-all: entities that aren't one of the seven specific types live there rather than in a long tail of sparsely populated categories. To find a specific product or framework, search by name rather than filtering on type. Because models, hardware, and regulations each have a dedicated type, you can track not just which companies are trending but which models, chips, and regulations are gaining attention.

Entity extraction and disambiguation

Gildea uses a two-pass entity extraction system to ensure accurate identification:

Pass 1 — Google Natural Language API

Signals are processed through Google Cloud NL to extract entity mentions with Knowledge Graph linking, type classification, and salience scoring.

Pass 2 — Domain-specific disambiguation

A curated rule set disambiguates AI-specific entities that general NLP models struggle with:
  • Model families: “Claude Opus 4.7”, “GPT-5”, “Llama 4” resolve to specific model entries, not generic company mentions
  • Hardware: “H100”, “A100”, “TPU v5e” resolve to specific chips
  • Contextual disambiguation: “Claude” + context mentioning “sonnet” resolves to the model, not a person
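A toy illustration of what a curated second-pass rule set could look like. The alias table, the tier keywords, and the function name are hypothetical; Gildea's actual rules are not published here. The idea is that exact model and chip names win over the first-pass guess, and a bare "Claude" is treated as the model only when model-tier context is present.

```python
import re

# HYPOTHETICAL curated aliases: exact surface forms that should resolve to a
# specific model or chip entry instead of a company or generic mention.
ALIASES = {
    "gpt-5": ("model", "GPT-5"),
    "claude opus 4.7": ("model", "Claude Opus 4.7"),
    "llama 4": ("model", "Llama 4"),
    "h100": ("hardware", "H100"),
    "tpu v5e": ("hardware", "TPU v5e"),
}

# HYPOTHETICAL contextual rule: "Claude" near a model-tier word is the model.
CLAUDE_MODEL_CONTEXT = re.compile(r"\b(sonnet|opus|haiku)\b", re.IGNORECASE)


def disambiguate(mention: str, context: str, nlp_type: str) -> tuple[str, str]:
    """Return (type, name) for a mention; nlp_type is the first-pass guess."""
    key = " ".join(mention.casefold().split())  # normalize case and whitespace
    if key in ALIASES:
        return ALIASES[key]
    if key == "claude" and CLAUDE_MODEL_CONTEXT.search(context):
        return ("model", "Claude")
    # No rule fired: keep the general-purpose NLP classification.
    return (nlp_type, mention)
```

A real rule set would be far larger and would resolve to canonical entity records rather than (type, name) pairs.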

Entity identifiers

Every entity has a stable, opaque public ID of the form gld:/<hex> (e.g., gld:/a1b2c3d4e5f6). This is the only entity identifier the API exposes — it appears in every entity_id field, and you pass it back to GET /v1/entities/{name_or_id} or any entity filter. Treat the ID as opaque: it carries no parseable structure and you should not infer an entity’s type from it. Use type_primary as the authoritative entity type. IDs are stable across time — an entity keeps the same gld:/… ID even if its internal classification is later corrected. Behind the scenes, “Meta”, “Meta Platforms”, and “Facebook” all resolve to the same entity (and therefore the same gld:/… ID).
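Because a gld:/… ID contains both ":" and "/", a client that puts it into the GET /v1/entities/{name_or_id} path probably needs to percent-encode it so the slash is not read as a path separator. The sketch below assumes that; API_BASE and entity_url are placeholders, so check the API reference for how the server expects IDs to be encoded. The ID is passed through untouched otherwise, in keeping with treating it as opaque.

```python
from urllib.parse import quote

API_BASE = "https://api.example.com"  # HYPOTHETICAL base URL; substitute the real one


def entity_url(name_or_id: str) -> str:
    """Build a detail URL for a display name or an opaque gld:/... ID."""
    # safe="" makes quote() encode ":" and "/" too, so an ID such as
    # "gld:/a1b2c3d4e5f6" stays a single path segment.
    return f"{API_BASE}/v1/entities/{quote(name_or_id, safe='')}"
```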

Noise filtering

Generic concepts (“AI”, “market”, “industry”) are excluded, and corporate suffixes (“Inc”, “LLC”, “Corp”) are normalized so that “NVIDIA Corporation” and “NVIDIA” resolve to the same entity.
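The suffix and noise rules can be pictured as a string-level pass like the one below. This is only an illustration: the generic-concept set and the suffix pattern are assumptions built from the examples above, and the real resolver also relies on Knowledge Graph linking and canonical aliases (which is how “Facebook” and “Meta” merge, something no suffix rule could do).

```python
import re
from typing import Optional

# Examples taken from the text; the production exclusion list is presumably longer.
GENERIC_CONCEPTS = {"ai", "market", "industry"}

# Trailing corporate suffix, optionally preceded by a comma and followed by a period.
SUFFIX = re.compile(r",?\s+(inc|llc|corp|corporation)\.?$", re.IGNORECASE)


def normalize_name(name: str) -> Optional[str]:
    """Return a normalized matching key, or None if the mention is generic noise."""
    cleaned = SUFFIX.sub("", name.strip())
    if cleaned.casefold() in GENERIC_CONCEPTS:
        return None
    return cleaned.casefold()
```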

Entity profiles

Each entity has a rich profile including:
  • Signal count — total signals mentioning this entity
  • Trend stats — Theil-Sen slope, Mann-Kendall significance, share of voice, volatility
  • Content type mix — breakdown of expert analysis vs. event signals
  • Theme distributions — which value chain segments and market forces this entity appears in
  • Related entities — co-occurrence relationships

Trend analytics

Every entity includes a trend object with analytics across four dimensions:

| Field | Dimension | Description |
|---|---|---|
| share_of_voice | Scale | Entity's share of the total corpus over the trailing 4 weeks |
| source_diversity | Scale | Count of distinct source domains |
| recency_score | Scale | Exponential decay from last mention (1.0 = today) |
| theil_sen_slope | Directionality | Robust trend slope (resistant to outliers) |
| streak | Directionality | Consecutive weeks of growth |
| mk_tau | Significance | Mann-Kendall tau (-1 to +1) |
| mk_p_value | Significance | p-value of the Mann-Kendall trend test (< 0.1 = significant) |
| coefficient_of_variation | Volatility | How variable coverage is (stddev / mean) |
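To make the directionality, significance, and volatility fields concrete, here is a minimal pure-Python sketch computing them from a series of weekly mention counts. It is not Gildea's implementation: it assumes x is the week index, uses Kendall's tau-a with the no-ties variance and a continuity correction for the Mann-Kendall p-value, and uses the population standard deviation for the coefficient of variation. The production pipeline may handle ties or the stddev differently.

```python
import math
import statistics
from itertools import combinations


def theil_sen_slope(counts):
    """Median of the pairwise slopes (y_j - y_i) / (j - i), with x = week index."""
    slopes = [(counts[j] - counts[i]) / (j - i)
              for i, j in combinations(range(len(counts)), 2)]
    return statistics.median(slopes)


def mann_kendall(counts):
    """Return (tau, two-sided p-value) using the no-ties normal approximation."""
    n = len(counts)
    # S = sum of sign(y_j - y_i) over all pairs i < j.
    s = sum((counts[j] > counts[i]) - (counts[j] < counts[i])
            for i, j in combinations(range(n), 2))
    tau = s / (n * (n - 1) / 2)             # Kendall's tau-a (no tie correction)
    var_s = n * (n - 1) * (2 * n + 5) / 18  # Var(S) assuming no tied values
    if s == 0:
        z = 0.0
    else:
        z = (s - math.copysign(1, s)) / math.sqrt(var_s)  # continuity correction
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return tau, p_value


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean (NaN if the mean is 0)."""
    mean = statistics.mean(counts)
    return statistics.pstdev(counts) / mean if mean else float("nan")
```

Theil-Sen is used for the slope because one spike week moves only a few of the pairwise slopes and barely shifts their median, which is what "resistant to outliers" means in the table.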

Interpretation fields

Each entity includes interpretation fields derived from the raw trend stats. These are discrete labels that agents can act on without statistical expertise.

| Field | Values | What it tells you |
|---|---|---|
| scale | Large, Medium, Small | How prominent this entity is relative to the corpus |
| direction | Rising, Stable, Declining, New | Which way coverage is trending |
| confidence | Significant, Insignificant | Whether the trend is statistically reliable |
| stability | Volatile, Steady | How consistent coverage is week to week |
| notability | High, Medium, Low, Negligible | How much this entity warrants attention right now: foreground vs. background |
| notability_reasoning | Free text | Human-readable explanation of the notability assignment |

All trend statistics (theil_sen_slope, mk_p_value, coefficient_of_variation) are computed over a rolling 12-week window: the current week plus the 11 prior weeks.

Entities with fewer than 8 mentions in that window return scale only; direction, confidence, stability, and notability are null, and notability_reasoning contains “Insufficient data for trend analysis.”

An entity whose first_seen date falls within the last 30 days is classified as direction: "New" regardless of its underlying slope. This overrides the Rising/Stable/Declining classification until enough history accumulates. For new entities, confidence is forced to Insignificant and stability is null. As a result, filtering on ?direction=Rising will not surface genuine breakouts younger than 30 days; use ?direction=New to find recent arrivals.

```json
{
  "entity_id": "gld:/a1b2c3d4e5f6",
  "display_name": "NVIDIA",
  "scale": "Large",
  "direction": "Rising",
  "confidence": "Significant",
  "stability": "Steady",
  "notability": "High",
  "notability_reasoning": "Large-scale entity with confirmed upward trend and consistent coverage; notable upward shift reliably gaining share.",
  "trend": { "..." : "..." }
}
```
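Because of the null and New rules above, a consumer should branch on direction before reading confidence. The sketch below shows one way an agent might bucket entities; the bucket names and the function are hypothetical, and only the field names and values come from this page.

```python
from typing import Optional


def watchlist_bucket(entity: dict) -> str:
    """Map an entity's interpretation fields to a hypothetical watchlist bucket."""
    direction: Optional[str] = entity.get("direction")
    if direction is None:
        # Fewer than 8 mentions in the 12-week window: only scale is populated.
        return "insufficient_data"
    if direction == "New":
        # first_seen within 30 days; confidence is always Insignificant here,
        # so it says nothing about how strong the breakout is.
        return "new_arrival"
    if direction == "Rising" and entity.get("confidence") == "Significant":
        return "reliably_rising"
    return "background"
```

Checking for None first avoids treating an entity with too little data as "not rising", which would be a different (and unsupported) conclusion.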

Entity association granularity

Entity ↔ content associations are stored at the article level in entity_article_mentions (one row per (article, canonical_entity) pair, with the matched surface form). There is no persistent per-text-unit entity table. Where unit-level associations appear in responses, they are computed at request time by matching the article’s surface forms against each unit’s text using word-boundary regex (NFC normalization, casefold, possessive stripping). This applies to:
  • GET /v1/signals/{id} — each unit in the decomposition carries an entities array listing the public entity IDs (gld:/…) whose surface forms appear in that unit’s text.
  • GET /v1/search?entity=<id> — the filter first pre-scopes the candidate pool to articles that mention the entity, then the cross-encoder rerank runs, then a unit-level post-filter drops candidates whose text doesn't literally contain one of the requested entity's surface forms. The article-level pre-filter alone would surface units from the same article that happen to discuss other entities.
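The request-time matching (NFC normalization, casefolding, possessive stripping, word-boundary regex) can be sketched as follows. The function names, the shape of the surface-form dictionary, and the exact possessive rule are assumptions; the lookarounds (?<!\w) and (?!\w) stand in for \b so that surface forms ending in punctuation still match as whole tokens.

```python
import re
import unicodedata


def _norm(text: str) -> str:
    """NFC-normalize, casefold, and strip a possessive 's or ’s."""
    text = unicodedata.normalize("NFC", text).casefold()
    return re.sub(r"(\w)['’]s\b", r"\1", text)


def unit_entities(unit_text: str, surface_forms: dict[str, list[str]]) -> list[str]:
    """Return entity IDs whose surface forms appear in unit_text as whole words."""
    normalized = _norm(unit_text)
    hits = []
    for entity_id, forms in surface_forms.items():
        for form in forms:
            pattern = r"(?<!\w)" + re.escape(_norm(form)) + r"(?!\w)"
            if re.search(pattern, normalized):
                hits.append(entity_id)
                break  # one matching form is enough for this entity
    return hits
```

For example, with NVIDIA mapped to ["NVIDIA", "NVIDIA Corporation"], the unit “NVIDIA’s H100 supply depends on TSMC.” matches NVIDIA through possessive stripping, while a token like “Nvidiagate” does not match because the word boundary fails.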
Implications for consumers:
  • High precision when filtering by entity: results literally name the entity, not just sit in an article that does.
  • Coreference is not resolved. A unit saying “the company shipped a 200K-context model” will not match ?entity=gld:/a1b2c3d4e5f6 even when context makes the referent obvious. Improvements here will come from broader surface-form coverage, not from pronoun resolution.
  • Aggregate statistics (mv_entity_profile, mv_entity_co_occurrence) currently count at article granularity. Unit-level entity counts for trend analytics are not yet available; they are a planned future change.
  • diagnostics.entity_filter_dropped in search responses reports how many candidates the unit-level post-filter removed, useful for debugging when result counts are tighter than expected.
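The three search stages, and where diagnostics.entity_filter_dropped comes from, can be sketched like this. rerank is a hypothetical stand-in for the cross-encoder, and the unit and response shapes are illustrative rather than the actual schema. In the usage data, the second unit from the matching article is dropped by the post-filter because it never names the entity, which is the coreference limitation in action.

```python
import re
import unicodedata


def _norm(text):
    """NFC-normalize, casefold, and strip a possessive 's or ’s."""
    text = unicodedata.normalize("NFC", text).casefold()
    return re.sub(r"(\w)['’]s\b", r"\1", text)


def mentions(text, forms):
    """True if any surface form appears in text as a whole word."""
    t = _norm(text)
    return any(re.search(r"(?<!\w)" + re.escape(_norm(f)) + r"(?!\w)", t) for f in forms)


def search_with_entity(units, article_ids, forms, rerank):
    # 1. Article-level pre-filter: keep units from articles that mention the entity.
    pool = [u for u in units if u["article_id"] in article_ids]
    # 2. Rerank the scoped pool (HYPOTHETICAL stand-in for the cross-encoder).
    ranked = rerank(pool)
    # 3. Unit-level post-filter: drop units that don't literally name the entity.
    results = [u for u in ranked if mentions(u["text"], forms)]
    dropped = len(ranked) - len(results)
    return {"results": results, "diagnostics": {"entity_filter_dropped": dropped}}


units = [
    {"article_id": 1, "text": "Analysts discussed NVIDIA's data-center revenue.", "score": 0.9},
    {"article_id": 1, "text": "The company shipped a 200K-context model.", "score": 0.8},
    {"article_id": 2, "text": "TSMC expands capacity.", "score": 0.7},
]
by_score = lambda pool: sorted(pool, key=lambda u: u["score"], reverse=True)
out = search_with_entity(units, {1}, ["NVIDIA", "NVIDIA Corporation"], by_score)
```

Note that units removed by the article-level pre-filter (article 2 here) are not counted in entity_filter_dropped in this sketch; only the post-filter's removals are.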

Composable filtering

The list entities endpoint supports filtering by any combination of interpretation fields. This replaces the old trending modes with a more powerful, composable approach.

| Filter | What it finds | Replaces |
|---|---|---|
| direction=Rising + confidence=Significant | Reliably trending up | Old rising mode |
| direction=New + sort=first_seen | Recently tracked entities | Old emerging mode |
| stability=Volatile + sort=trend | Most variable coverage | Old volatile mode |

Filters can be combined freely. For example, ?direction=Rising&scale=Large&sort=trend finds high-prominence entities with upward trends.
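A minimal sketch of composing these filters on the client side. The base URL and the list-endpoint path are assumptions; the filter names and values are the ones documented above, and the presets reproduce the old modes from the table.

```python
from urllib.parse import urlencode

# HYPOTHETICAL base URL and list-endpoint path; check the API reference.
API_BASE = "https://api.example.com"
LIST_PATH = "/v1/entities"

# The old trending modes expressed as filter presets.
PRESETS = {
    "rising": {"direction": "Rising", "confidence": "Significant"},
    "emerging": {"direction": "New", "sort": "first_seen"},
    "volatile": {"stability": "Volatile", "sort": "trend"},
}


def list_entities_url(**filters: str) -> str:
    """Build a list-entities URL from any combination of filters."""
    return f"{API_BASE}{LIST_PATH}?{urlencode(filters)}"
```

For example, list_entities_url(**PRESETS["emerging"], scale="Small") narrows the old emerging mode to small entities. Passing the same key twice (say, a preset plus another direction) raises a TypeError, which is a cheap guard against contradictory filters.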

Co-occurrences

The entity detail endpoint includes related_entities — entities that frequently appear together in signals. This reveals industry relationships, competitive dynamics, and supply chain connections.

```json
{
  "related_entities": [
    {
      "entity_id": "gld:/b2c3d4e5f6a7",
      "name": "TSMC",
      "type": "organization",
      "co_occurrence_count": 28
    }
  ]
}
```
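Since co-occurrence statistics are currently counted at article granularity, a co_occurrence_count can be read as "number of articles mentioning both entities". The sketch below computes such counts from per-article entity sets and ranks an entity's neighbors; it mirrors the counting idea, not the SQL behind mv_entity_co_occurrence, and the third entity ID in the usage data is a hypothetical placeholder.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_counts(article_entities):
    """article_entities: iterable of entity-ID collections, one per article."""
    counts = Counter()
    for ids in article_entities:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(ids)), 2):
            counts[(a, b)] += 1
    return counts


def related(counts, entity_id, top=5):
    """Return [(other_id, count), ...] sorted by count, then ID."""
    rows = [(b if a == entity_id else a, n)
            for (a, b), n in counts.items() if entity_id in (a, b)]
    return sorted(rows, key=lambda r: (-r[1], r[0]))[:top]


NVIDIA, TSMC, OTHER = "gld:/a1b2c3d4e5f6", "gld:/b2c3d4e5f6a7", "gld:/c3d4e5f6a7b8"
articles = [{NVIDIA, TSMC}, {NVIDIA, TSMC, OTHER}, {NVIDIA, OTHER}, {TSMC}]
counts = co_occurrence_counts(articles)
```

Using sets per article means an entity mentioned ten times in one article still contributes one co-occurrence for that article, matching the article-level granularity described above.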