Gildea’s /v1/search endpoint runs a modern hybrid retrieval pipeline over every verified text unit in the corpus. Each query passes through three stages — dense retrieval, sparse retrieval, and cross-encoder reranking — combined via rank fusion.

Architecture overview

1. Query expansion to two vector spaces. Your query text is encoded in parallel into two different vector representations: a dense embedding (semantic meaning) and a sparse embedding (lexical match with learned expansion).

2. Parallel retrieval. Each embedding retrieves its top 50 candidates from the full corpus of verified text units, with any requested filters (entity, theme, date) pre-applied.

3. RRF fusion. The two ranked lists are merged using Reciprocal Rank Fusion, a rank-based blend that doesn't require score normalization.

4. Cross-encoder rerank. The top candidates are passed through a cross-encoder reranker for final precision. The cross-encoder scores every (query, unit) pair jointly, catching nuances that bi-encoder retrieval misses (negation, hedging, subtle topical mismatches).

5. Post-processing. Optional recency boost, article diversification (a cap on units per source article), and truncation to the requested limit.
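The post-processing step can be sketched in a few lines of pure Python. This is an illustrative sketch, not the production implementation; the `article_id` field name and best-first input ordering are assumptions.

```python
def diversify(units, max_per_article, limit):
    """Cap the number of units kept per source article, then truncate.

    `units` is assumed to be ordered best-first; `article_id` is a
    hypothetical field name used here for illustration.
    """
    per_article = {}
    kept = []
    for unit in units:
        count = per_article.get(unit["article_id"], 0)
        if count < max_per_article:
            per_article[unit["article_id"]] = count + 1
            kept.append(unit)
        if len(kept) == limit:  # truncate to the requested limit
            break
    return kept
```

Because the input is already ranked, a single forward pass preserves relevance order while enforcing the per-article cap.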

Stage 1: Dense retrieval (semantic)

Captures semantic similarity — paraphrases, conceptual matches, related ideas.
Model: BAAI/bge-base-en-v1.5
Dimensions: 768
Index: pgvector HNSW, cosine similarity
Cutoff: minimum cosine similarity 0.3
The same model is used for stored unit embeddings and live query embeddings — both vectors live in the same space.
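The scoring rule behind this stage can be sketched as follows. The real index is pgvector HNSW over 768-dimensional vectors; this sketch only illustrates the cosine scoring and the 0.3 cutoff, with toy vectors standing in for model embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dense_candidates(query_vec, unit_vecs, k=50, cutoff=0.3):
    """Rank units by cosine similarity, dropping anything below the cutoff."""
    scored = [(uid, cosine(query_vec, v)) for uid, v in unit_vecs.items()]
    scored = [(uid, s) for uid, s in scored if s >= cutoff]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The cutoff means weak semantic matches never enter fusion at all, rather than being buried low in the dense list.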

Stage 2: Sparse retrieval (learned lexical)

Captures exact term matching plus learned term expansion. SPLADE is a transformer-based sparse model: rather than a fixed TF-IDF vocabulary, it learns which vocabulary terms are relevant to a given piece of text, outputting weights over ~30k dimensions.
Model: SPLADE++ (prithivida/Splade_PP_en_v1)
Dimensions: ~30,000 (BERT vocab), sparse
Index: pgvector HNSW, inner-product similarity
SPLADE outperforms classical BM25 on modern retrieval benchmarks by learning which terms matter for a given query, including related terms the query didn’t literally contain.
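Since SPLADE vectors are mostly zeros, scoring reduces to an inner product over the few shared nonzero terms. A minimal sketch, representing each sparse vector as a `{term_id: weight}` dict (the on-disk pgvector representation differs; this shows only the similarity computation):

```python
def sparse_dot(query_vec, unit_vec):
    """Inner product of two sparse vectors stored as {term_id: weight} dicts."""
    # Iterate over the smaller dict so cost scales with the sparser side.
    if len(query_vec) > len(unit_vec):
        query_vec, unit_vec = unit_vec, query_vec
    return sum(w * unit_vec[t] for t, w in query_vec.items() if t in unit_vec)
```

Terms that appear in only one of the two vectors contribute nothing, which is why learned expansion matters: SPLADE adds related terms to both sides so they can overlap.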

Stage 3: RRF fusion

Dense and sparse paths each produce a ranked list. Reciprocal Rank Fusion combines them without requiring the two scoring systems to be commensurate:
score(unit) = 1/(k + rank_dense) + 1/(k + rank_sparse)
This is robust across queries: some queries benefit more from lexical match (exact entity names, tickers), others from semantic similarity (abstract questions). RRF weights both without tuning.
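The fusion formula above is a few lines of code. In this sketch `k=60` is the common default from the RRF literature; Gildea's actual constant is an assumption, and units missing from one list simply receive no contribution from it:

```python
def rrf(dense_ranked, sparse_ranked, k=60):
    """Reciprocal Rank Fusion over two ranked lists of unit ids.

    score(unit) = 1/(k + rank_dense) + 1/(k + rank_sparse)
    """
    scores = {}
    for ranked in (dense_ranked, sparse_ranked):
        for rank, uid in enumerate(ranked, start=1):
            scores[uid] = scores.get(uid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A unit that appears in both lists gets two reciprocal-rank terms, so agreement between the dense and sparse paths is rewarded even when neither ranks it first.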

Stage 4: Cross-encoder rerank

Bi-encoder retrieval (dense + sparse) optimizes for recall at low cost. A cross-encoder reranks top candidates for precision by scoring every (query, unit) pair jointly through a transformer — slower but substantially more accurate at the top of the list. Gildea uses Cohere Rerank (rerank-english-v3.0) on the top 150 candidates after RRF fusion. Cross-encoders are particularly valuable for Gildea’s corpus because verified claims are short and fact-dense, where a single token flip (“X did not beat Q3” vs “X beat Q3”) matters — exactly where bi-encoders struggle.
If the rerank service is unavailable, search gracefully falls back to RRF-fused dense + sparse ordering. Results are always returned; the reranker only improves precision when available.
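The fallback behavior can be sketched like this. `rerank_fn` stands in for the call to the rerank service; its signature here (query plus a list of texts, returning one score per text) is a hypothetical simplification, not the actual client API.

```python
def rerank_with_fallback(query, candidates, rerank_fn):
    """Reorder candidates by cross-encoder score; on failure, keep RRF order.

    `candidates` is assumed to arrive in RRF-fused order, so the
    fallback path can return it unchanged.
    """
    try:
        scores = rerank_fn(query, [c["text"] for c in candidates])
        order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        return [candidates[i] for i in order]
    except Exception:
        # Rerank service unavailable: results are still returned, just
        # without the precision boost.
        return candidates
```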

Response diagnostics

Every search result includes a diagnostics object with retrieval stage ranks, useful for understanding why a result ranked where it did:
"diagnostics": {
  "rrf_rank": 20,       // Rank after dense + sparse RRF fusion
  "rerank_rank": 1,     // Final rank after cross-encoder rerank
  "rerank_score": 0.87  // Cross-encoder relevance score (0-1, higher is more relevant)
}
A large gap between rrf_rank and rerank_rank means the cross-encoder substantially changed the ordering — typically a signal that the bi-encoder missed a nuance that the cross-encoder caught.
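One way a consumer might act on this signal is to flag large rank movements. The threshold of 10 here is illustrative, not a documented value:

```python
def rerank_moved(diagnostics, threshold=10):
    """True when the cross-encoder moved a result far from its RRF rank.

    `threshold` is an arbitrary example value, not part of the API.
    """
    return abs(diagnostics["rrf_rank"] - diagnostics["rerank_rank"]) >= threshold
```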

Filters are pre-applied, not post-filtered

When you pass entity, theme, published_after, or published_before, Gildea resolves the matching article set first and scopes both retrieval paths to that set. No recall is wasted on candidates that would be filtered out later.
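A client-side sketch of assembling a filtered request. The parameter names mirror the filters described above, but the exact wire format (query-string vs. JSON body) is an assumption:

```python
def build_search_params(query, entity=None, theme=None,
                        published_after=None, published_before=None, limit=20):
    """Assemble /v1/search parameters, sending only the filters that are set."""
    params = {"q": query, "limit": limit}
    optional = {
        "entity": entity,
        "theme": theme,
        "published_after": published_after,
        "published_before": published_before,
    }
    for key, value in optional.items():
        if value is not None:
            params[key] = value
    return params
```

Because filtering happens before retrieval, narrowing the article set with these parameters tends to improve result quality rather than just shrink it.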

What the pipeline is optimized for

  • Short, fact-dense retrieval units. Every text unit is sentence-level (~40 tokens). Queries return compact, citable atomic facts — not raw document chunks.
  • Precision at the top. Cross-encoder rerank is specifically designed to concentrate the most relevant results at the top of the list.
  • No hallucination surface. Every returned unit has passed Gildea’s verification pipeline. LLM consumers can treat unit text as trusted content without re-verifying against source material.