You need to get smart on a topic fast — for a client meeting, an investment memo, or a product decision. Reading 30 articles isn’t an option. This recipe uses semantic search to find the most relevant verified expert reasoning on any AI topic and produces a structured research dossier with findings, source attribution, and identified knowledge gaps.

Who this is for

  • Consultants doing rapid knowledge acquisition before a client engagement
  • Investors building an informed view on a technology area or market dynamic
  • Product leaders researching adjacent markets or competitive moves

The pattern

  1. Search with a natural language query
  2. Scope results with filters (theme, entity, date range, unit type)
  3. Expand from strong results with similar_to
  4. Pull full signal context for complete reasoning chains
  5. Boost recency for fast-moving topics
  6. Synthesize into a structured research dossier via LLM

Step 1: Search across verified intelligence

from gildea_sdk import Gildea

client = Gildea()

# Semantic search across all verified text units
results = client.search("open source model licensing implications", limit=10)

for hit in results["data"]:
    print(f"[{hit['relevance_score']:.4f}] {hit['unit']['text']}")
    print(f"  Type: {hit['unit']['unit_type']}")
    print(f"  Source: {hit['citation']['signal_title']} ({hit['citation']['registrable_domain']})")
    print()

Step 2: Scope results with filters

By theme

# Only search within Regulatory & Legal tagged signals
results = client.search(
    "AI model licensing requirements",
    theme="Regulatory & Legal",
    limit=10,
)

By entity

# Only search signals that mention a specific entity
results = client.search(
    "open source strategy",
    entity="Meta",
    limit=10,
)

By date range

# Only recent results (last 30 days)
results = client.search(
    "open source model licensing",
    published_after="2026-03-15",
    limit=10,
)

By text unit type

# Only claims (the most specific, verifiable assertions)
results = client.search(
    "open source model licensing",
    unit_type="analysis_claim",
    limit=10,
)

# Only thesis sentences (authors' central arguments)
results = client.search(
    "open source model licensing",
    unit_type="thesis_sentence",
    limit=10,
)

Combining filters

# Recent claims about Meta's open source strategy in Regulatory context
results = client.search(
    "open source licensing",
    entity="Meta",
    theme="Regulatory & Legal",
    unit_type="analysis_claim",
    published_after="2026-03-01",
    limit=10,
)

Step 3: Expand from a strong result

Found a claim that’s exactly what you’re researching? Use similar_to to find more like it:
# Get the unit_id of the best result
best_unit_id = results["data"][0]["unit"]["unit_id"]

# Find similar verified text units across all sources
expanded = client.search(similar_to=best_unit_id, limit=10)

print("Similar claims from other sources:")
for hit in expanded["data"]:
    print(f"  [{hit['relevance_score']:.3f}] {hit['unit']['text']}")
    print(f"    Source: {hit['citation']['registrable_domain']}")

Step 4: Get full signal context

When a search result looks important, pull the full decomposition to understand the complete reasoning chain:
# Get full signal with evidence
signal_id = results["data"][0]["citation"]["signal_id"]
signal = client.signals.get(signal_id, include="evidence")

print(f"Signal: {signal['title']}")
print(f"Thesis: {signal['decomposition']['thesis']['text']}")
print(f"\nArguments:")
for arg in signal["decomposition"].get("arguments", []):
    for s in arg["sentences"]:
        print(f"  - {s['unit']['text']}")
    for c in arg["claims"]:
        print(f"  * CLAIM: {c['unit']['text']}")

Step 5: Boost recent results for fast-moving topics

# recency_boost: 0 = no time weighting, 1 = maximum recency preference
results = client.search(
    "frontier model scaling laws",
    recency_boost=0.7,
    limit=10,
)

Step 6: Build the research dossier

Collect findings from multiple search angles and synthesize into a structured deliverable.
import json

# Run multiple targeted searches to build a comprehensive picture
topic = "open source AI model licensing"

# Angle 1: Expert theses (what are the big arguments?)
theses = client.search(topic, unit_type="thesis_sentence", limit=10)

# Angle 2: Specific claims (what are the verified facts?)
claims = client.search(topic, unit_type="analysis_claim", limit=10)

# Angle 3: Recent developments (what just happened?)
recent = client.search(topic, recency_boost=0.8, limit=10)

# Angle 4: Entity-scoped (what are key players doing?)
meta_view = client.search("open source licensing strategy", entity="Meta", limit=5)

# Compile all findings
research_data = {
    "topic": topic,
    "expert_theses": [
        {
            "text": h["unit"]["text"],
            "source": h["citation"]["registrable_domain"],
            "signal": h["citation"]["signal_title"],
        }
        for h in theses["data"]
    ],
    "verified_claims": [
        {
            "text": h["unit"]["text"],
            "source": h["citation"]["registrable_domain"],
            "signal": h["citation"]["signal_title"],
        }
        for h in claims["data"]
    ],
    "recent_developments": [
        {
            "text": h["unit"]["text"],
            "source": h["citation"]["registrable_domain"],
            "signal": h["citation"]["signal_title"],
            "date": h["citation"].get("published_at", "")[:10],
        }
        for h in recent["data"]
    ],
    "key_player_intelligence": [
        {
            "text": h["unit"]["text"],
            "source": h["citation"]["registrable_domain"],
            "entity_scope": "Meta",
        }
        for h in meta_view["data"]
    ],
    "total_sources": len(set(
        h["citation"]["registrable_domain"]
        for search_results in [theses, claims, recent, meta_view]
        for h in search_results["data"]
    )),
}

research_json = json.dumps(research_data, indent=2)

SYSTEM_PROMPT = """You are a senior research analyst producing a structured dossier
on an AI market topic. You will receive findings from multiple search angles across
Gildea's verified intelligence database: expert theses, verified claims, recent
developments, and entity-specific intelligence.

Rules:
- This is for someone who needs to get smart on a topic in 5 minutes — not an
  academic literature review.
- Structure the dossier around WHAT IS KNOWN (established facts), WHAT IS DEBATED
  (competing expert views), and WHAT IS UNKNOWN (gaps in coverage).
- Expert theses represent analytical arguments. Claims represent verified facts.
  Treat them differently — theses are opinions (attributed), claims are facts
  (corroborated).
- Always attribute: "(source: domain.com)" after each finding.
- Identify CONTRADICTIONS between sources explicitly. Disagreement among experts
  is one of the most valuable things you can surface.
- End with "Knowledge Gaps" — what questions remain unanswered by the available
  evidence? This tells the reader where to focus further research.
- Keep it under 600 words.

Output format (markdown):

## Research Dossier: [Topic]

**Sources reviewed:** [N] distinct expert sources
**Coverage period:** [date range of findings]

### Executive Summary
<3-4 sentences: what a decision-maker needs to know about this topic right now>

### What Is Known (Verified Facts)
<Bulleted list of verified claims with source attribution. Group by sub-topic
if more than 5 claims.>

### What Is Debated (Competing Expert Views)
<Summarize the 2-3 main analytical positions experts hold on this topic, with
attribution. Highlight contradictions.>

### Recent Developments
<Bulleted list of time-sensitive findings from the recency-boosted search>

### Key Player Activity
<What specific entities are doing, based on entity-scoped search results>

### Knowledge Gaps
<Bulleted list of 2-4 questions that the available evidence does NOT answer>

### Recommended Next Steps
<2-3 specific follow-up actions: additional searches, entities to monitor,
signals to read in full>
"""

USER_PROMPT = f"""Produce a research dossier from these multi-angle findings:

{research_json}
"""

# Pass SYSTEM_PROMPT and USER_PROMPT to your LLM of choice.
# Example with Anthropic SDK:
#
# import anthropic
# llm = anthropic.Anthropic()
# response = llm.messages.create(
#     model="claude-sonnet-4-20250514",
#     max_tokens=2048,
#     system=SYSTEM_PROMPT,
#     messages=[{"role": "user", "content": USER_PROMPT}],
# )
# dossier = response.content[0].text

print("=== SYSTEM PROMPT ===")
print(SYSTEM_PROMPT)
print("=== USER PROMPT ===")
print(USER_PROMPT)
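With four searches at ten results each, research_json can get long. Before sending it to the LLM, a simple guard (plain Python; the character budget and cutoff are arbitrary choices for this sketch) trims each findings list, relying on the fact that search results arrive ranked:

```python
import json

def trim_research_data(data, max_chars=20_000, keep_per_list=5):
    """If the serialized payload exceeds max_chars, keep only the top
    keep_per_list findings in each list field. Non-list fields (topic,
    total_sources) pass through unchanged.
    """
    if len(json.dumps(data)) <= max_chars:
        return data
    trimmed = dict(data)
    for key, value in data.items():
        if isinstance(value, list):
            trimmed[key] = value[:keep_per_list]
    return trimmed
```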

Example output artifact

## Research Dossier: Open Source AI Model Licensing

**Sources reviewed:** 14 distinct expert sources
**Coverage period:** March 2026 - April 2026

### Executive Summary
Open source AI model licensing is at an inflection point. Meta's Llama license
has become the de facto standard but faces increasing scrutiny over its
"open source in name only" restrictions. The EU AI Act's transparency requirements
are creating new compliance obligations that may favor truly open models. Enterprise
adoption is accelerating, but legal teams are flagging unresolved liability questions
around fine-tuned derivative models.

### What Is Known (Verified Facts)
- Meta's Llama 3 license restricts commercial use for companies with 700M+ monthly
  active users (source: theinformation.com)
- 68% of enterprise AI deployments use at least one open-weight model component
  (source: a16z.com)
- EU AI Act Article 53 requires open-source foundation model providers to publish
  training data summaries by August 2026 (source: euractiv.com)
- Hugging Face hosts 400,000+ model variants, up from 250,000 a year ago
  (source: wired.com)

### What Is Debated (Competing Expert Views)
**Position 1: Open source wins.** Several analysts argue open models will
commoditize the model layer, shifting value to applications and data
(source: stratechery.com, a16z.com).

**Position 2: Open source is a distribution strategy, not altruism.** Counter-view
holds that Meta's open-source push is primarily a competitive weapon against
Google and OpenAI's closed models (source: semianalysis.com, bloomberg.com).

**Position 3: Regulation will fragment the landscape.** EU compliance requirements
may create a two-tier system: fully open models for research, restricted-license
models for commercial deployment (source: ft.com).

### Recent Developments
- Meta reportedly renegotiating enterprise Llama license terms with 3 major
  cloud providers (April 2026, source: theinformation.com)
- Linux Foundation launching AI model license standardization initiative
  (March 2026, source: techcrunch.com)

### Key Player Activity
**Meta:** Actively expanding Llama's enterprise footprint while tightening license
terms for largest users. Internal debate reportedly ongoing about whether to
restrict fine-tuning rights (source: theinformation.com).

### Knowledge Gaps
- How will derivative model liability be allocated between the base model
  provider and the fine-tuner?
- What is the actual compliance cost for open-source model providers under
  EU AI Act Article 53?
- Are enterprises choosing open models for cost, control, or regulatory reasons?
- How are Chinese open-source models (Qwen, DeepSeek) affecting the licensing
  landscape?

### Recommended Next Steps
- Pull the full signal decomposition on the Meta license renegotiation story
  for specific claim-level detail
- Monitor the "Meta" entity for licensing-related signals over the next 2 weeks
- Run a separate search on "AI model liability" to fill the derivative model
  knowledge gap

Interpreting results

| Signal | What it means | Action |
| --- | --- | --- |
| 10+ results with high relevance (>0.7) | Well-covered topic. Plenty of expert analysis available. | You can build a confident view. Focus on contradictions, not just volume. |
| < 5 results | Undercovered topic. Few experts writing about this. | Either niche (valuable if you're early) or a poorly phrased query. Try reformulating. |
| All results of thesis_sentence type | Topic is debated at the analytical level but lacks specific verified facts. | Good for understanding the arguments, but verify specific claims independently. |
| All results of analysis_claim type | Rich factual base with specific, verifiable assertions. | Strong foundation for a data-driven argument. |
| Results clustered in 1-2 sources | One or two outlets dominate coverage. | Broaden your search. Use similar_to on the best result to find diverse sources. |
| Contradicting theses from different sources | Active expert disagreement. | The most valuable research finding. Document both sides for your stakeholder. |
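These heuristics are easy to automate. A sketch in plain Python over the result shape used throughout this recipe (the thresholds mirror the table; the flag names are our own):

```python
def interpret_results(results, high_relevance=0.7):
    """Return diagnostic flags for one search response."""
    hits = results["data"]
    flags = []
    strong = [h for h in hits if h["relevance_score"] > high_relevance]
    if len(strong) >= 10:
        flags.append("well-covered")
    if len(hits) < 5:
        flags.append("undercovered-or-rephrase")
    types = {h["unit"]["unit_type"] for h in hits}
    if types == {"thesis_sentence"}:
        flags.append("argument-heavy: verify specifics independently")
    elif types == {"analysis_claim"}:
        flags.append("fact-rich")
    domains = {h["citation"]["registrable_domain"] for h in hits}
    if hits and len(domains) <= 2:
        flags.append("source-clustered: broaden with similar_to")
    return flags
```

Running this on every search in a session surfaces weak queries before they contaminate the dossier.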

API calls

  • 1 call per search query
  • 1 call per similar_to expansion
  • 1 call per signal detail follow-up
  • Total for a research session: 5-15 calls