Someone — a competitor, an analyst, a vendor — makes a claim about the AI market. Before you act on it, you need to know: is this widely corroborated, or is it one person’s opinion? This recipe finds semantically similar verified text units across 500+ expert sources and produces a consensus assessment with source attribution.

Who this is for

  • Investors vetting claims from pitch decks or analyst reports before due diligence
  • Product leaders fact-checking competitive assertions before adjusting strategy
  • Consultants validating client assumptions with independent expert evidence

The pattern

  1. Find or identify a text unit you care about
  2. Use similar_to search to find semantically similar units
  3. Group by source to measure consensus breadth
  4. Synthesize into a confidence assessment

Step 1: Find cross-source consensus

from gildea_sdk import Gildea

client = Gildea()

# 1. Start with a claim you found interesting
# (Get the unit_id from a signal detail or search result)
source_unit_id = "clm_01JABCDEF987654321"  # replace with a real unit_id

# 2. Find similar verified text units across all sources
similar = client.search(similar_to=source_unit_id, limit=15)

# 3. Group by source to see consensus breadth
sources = {}
for hit in similar["data"]:
    domain = hit["citation"]["registrable_domain"]
    if domain not in sources:
        sources[domain] = []
    sources[domain].append({
        "text": hit["unit"]["text"],
        "score": hit["relevance_score"],
        "signal": hit["citation"]["signal_title"],
    })

print(f"Similar claims found across {len(sources)} distinct sources:\n")
for domain, claims in sorted(sources.items(), key=lambda x: -len(x[1])):
    print(f"  {domain} ({len(claims)} matching units):")
    for c in claims[:2]:
        print(f"    [{c['score']:.3f}] {c['text'][:100]}")
    print()

Starting from a text query instead of a unit ID

If you don’t have a specific unit_id, search by text first, then use similar_to on the best match:
# Find the claim you're interested in
results = client.search("NVIDIA H200 shipments increased significantly", limit=1)
unit_id = results["data"][0]["unit"]["unit_id"]

# Now find cross-source consensus for that claim
consensus = client.search(similar_to=unit_id, limit=15)

Filtering consensus by recency

Use recency_boost to favor recent sources (0 = no boost, 1 = max boost):
# Find recent consensus — weight toward newer signals
recent_consensus = client.search(
    similar_to=unit_id,
    recency_boost=0.8,
    limit=10,
)

Filtering by theme or entity

Scope the consensus search to a specific context:
# Only find consensus within Infrastructure-tagged signals
consensus = client.search(similar_to=unit_id, theme="Infrastructure", limit=10)

# Only find consensus in signals mentioning NVIDIA
consensus = client.search(similar_to=unit_id, entity="NVIDIA", limit=10)

Step 2: Synthesize into a consensus assessment

import json

# Build the assessment input
original_claim = "NVIDIA H200 shipments increased significantly in Q1 2026"

corroborating_claims = []
for domain, domain_claims in sources.items():
    for c in domain_claims:
        corroborating_claims.append({
            "text": c["text"],
            "source": domain,
            "signal_title": c["signal"],
            "similarity_score": c["score"],
        })

assessment_data = json.dumps({
    "original_claim": original_claim,
    "corroborating_claims": corroborating_claims,
    "distinct_source_count": len(sources),
    "total_matching_units": sum(len(claims) for claims in sources.values()),
}, indent=2)

SYSTEM_PROMPT = """You are a research analyst producing a consensus assessment for
a specific claim. You will receive the original claim and a set of semantically
similar verified text units from independent expert sources, found via Gildea's
AI market intelligence platform.

Rules:
- Your job is to assess CONSENSUS BREADTH, not truth. You're measuring how many
  independent experts say something similar, not whether it's objectively true.
- Classify consensus as:
  - STRONG CONSENSUS (4+ distinct sources with high similarity)
  - MODERATE CONSENSUS (2-3 distinct sources)
  - WEAK/ISOLATED (0-1 distinct sources)
- For each corroborating source, note whether it AGREES (says essentially the same
  thing), QUALIFIES (adds nuance or caveats), or PARTIALLY DISAGREES (makes a
  related but different claim).
- Flag any contradictions explicitly — if one source says the opposite, that's
  critical information.
- End with a CONFIDENCE recommendation: how much weight should a decision-maker
  put on this claim?
- Keep it under 300 words.

Output format (markdown):

## Consensus Assessment

**Claim:** "<the original claim>"

**Consensus:** <Strong | Moderate | Weak> ([N] sources, [M] matching units)

**Confidence recommendation:** <Act on it | Investigate further | Treat with skepticism>

### Corroborating Sources
<For each source: domain, what they say, and whether they Agree/Qualify/Partially Disagree>

### Contradicting Sources
<Any sources that say the opposite, or "None found">

### Caveats
<1-2 sentences on what's missing: time periods, specificity, potential bias in source mix>

### Bottom Line
<1 sentence: should the reader trust this claim enough to act on it?>
"""

USER_PROMPT = f"""Assess the consensus for this claim:

{assessment_data}
"""

# Pass to your LLM, or print for manual use
print("=== SYSTEM PROMPT ===")
print(SYSTEM_PROMPT)
print("=== USER PROMPT ===")
print(USER_PROMPT)
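Before spending an LLM call, you can sanity-check the consensus tier locally using the same thresholds the system prompt states (4+ distinct sources = strong, 2-3 = moderate, 0-1 = weak/isolated). A minimal sketch; `classify_consensus` is a hypothetical helper of our own, not part of the Gildea SDK:

```python
def classify_consensus(distinct_source_count: int) -> str:
    """Mirror the system prompt's tiers: 4+ strong, 2-3 moderate, 0-1 weak."""
    if distinct_source_count >= 4:
        return "STRONG CONSENSUS"
    if distinct_source_count >= 2:
        return "MODERATE CONSENSUS"
    return "WEAK/ISOLATED"

# Quick local check before synthesis
print(classify_consensus(len(sources) if "sources" in dir() else 5))
```

If the tier comes back WEAK/ISOLATED, you may prefer to skip the LLM synthesis entirely and go straight to primary-source investigation.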

Example output artifact

## Consensus Assessment

**Claim:** "NVIDIA H200 shipments increased significantly in Q1 2026"

**Consensus:** Strong (5 sources, 8 matching units)

**Confidence recommendation:** Act on it

### Corroborating Sources
- **semianalysis.com** — AGREES: "H200 shipments to hyperscalers exceeded internal
  targets by 15-20% in Q1" (similarity: 0.87)
- **theinformation.com** — AGREES: "NVIDIA delivered more H200 units in Q1 2026
  than all of H2 2025 combined" (similarity: 0.82)
- **bloomberg.com** — QUALIFIES: "H200 shipments grew, but concentrated in 3
  hyperscaler customers accounting for 70% of volume" (similarity: 0.78)
- **reuters.com** — AGREES: "NVIDIA's data center revenue growth driven by
  accelerating H200 adoption" (similarity: 0.74)
- **ft.com** — QUALIFIES: "While H200 volumes rose, average selling prices
  declined 8%, partially offsetting revenue gains" (similarity: 0.71)

### Contradicting Sources
None found.

### Caveats
No source provides exact unit counts — "significantly" is directionally supported
but not precisely quantified. Sources are predominantly US/Western-focused; China
shipment data is absent due to export controls. Bloomberg's concentration note
is important context: growth may be less broad-based than the headline implies.

### Bottom Line
This claim has strong multi-source backing. Act on the directional conclusion
(shipments up materially) but qualify any specific quantity claims with the
Bloomberg concentration caveat.

Interpreting results

| Signal | What it means | Action |
|---|---|---|
| 5+ sources, high similarity (>0.8) | Near-identical claims from independent analysts. Strong consensus. | Treat as reliable. Cite the breadth in your own analysis. |
| 2-3 sources, moderate similarity (0.6-0.8) | Related ideas, not exact repetition. Directionally supported. | Reliable enough for strategic reasoning. Note the qualifications. |
| 1 source only | Isolated claim. Could be a scoop or could be wrong. | Don't base a decision on this alone. Investigate primary sources. |
| 0 matching units | No expert is making this claim. | Either novel insight or unsupported assertion. Requires primary research. |
| High similarity but CONTRADICTING content | Experts are specifically debating this topic. | The disagreement itself is valuable intelligence. Document both sides. |
| All sources from same domain type | Consensus may reflect an echo chamber, not independent validation. | Weight lower. Look for sources from different domain types (news vs. analysis vs. research). |
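The first four rows of this guidance can be applied mechanically. A sketch, assuming the `sources` dict built in Step 1 (domain mapped to a list of entries with a "score" key); the thresholds come from the table above, and the function name is our own:

```python
def interpret(sources: dict) -> str:
    """Map distinct-source count and top similarity onto the table's guidance."""
    n = len(sources)
    scores = [c["score"] for claims in sources.values() for c in claims]
    top = max(scores, default=0.0)
    if n == 0:
        return "No expert is making this claim; requires primary research"
    if n == 1:
        return "Isolated claim; investigate primary sources"
    if n >= 5 and top > 0.8:
        return "Strong consensus; treat as reliable"
    return "Directionally supported; note the qualifications"

# Example with two corroborating domains
example = {
    "semianalysis.com": [{"score": 0.87}],
    "bloomberg.com": [{"score": 0.78}],
}
print(interpret(example))  # Directionally supported; note the qualifications
```

The last two rows (contradiction and echo-chamber detection) need the text and domain-type context, so they are best left to the LLM synthesis in Step 2.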

API calls

  • 1 search call (or 2 if starting from text query)
  • Total: 1-2 calls per consensus check