GET /v1/search retrieves the most relevant verified text units for a query. It is hybrid: it matches on meaning (paraphrases and related concepts) and on keywords (exact terms and close variants), then ranks the results for relevance. One query in, one ranked, source-attributed list out. Because every returned unit has already passed verification, the results are trusted content: an LLM can reason over unit.text without re-checking it against the source.
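Here is a minimal sketch of a single call that builds the request URL with Python's standard library. The host `https://api.example.com` is a placeholder, and authentication is left out because this page documents neither. The maximum `limit` of 100 comes from this page, but treating 1 as the minimum is an assumption.

```python
from urllib.parse import urlencode

# Placeholder host: the real base URL and auth scheme are not documented on this page.
BASE_URL = "https://api.example.com"

MAX_LIMIT = 100  # documented upper bound for `limit`


def build_search_url(q: str, limit: int = 10) -> str:
    """Return a GET /v1/search URL for one hybrid query."""
    if not q.strip():
        raise ValueError("q must be a non-empty query string")
    # Lower bound of 1 is an assumption; the maximum of 100 is documented.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    # urlencode percent-encodes the query, so spaces and punctuation are safe.
    return f"{BASE_URL}/v1/search?{urlencode({'q': q, 'limit': limit})}"


if __name__ == "__main__":
    print(build_search_url("NVIDIA data center revenue growth"))
    # https://api.example.com/v1/search?q=NVIDIA+data+center+revenue+growth&limit=10
```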

How filters scope results

theme, content_type, and the date filters (published_after / published_before) are applied before ranking. Ranking effort therefore goes only to candidates that can actually be returned, and none is spent on candidates that would be discarded afterwards. The entity filter is precise: it returns only units whose own text literally names the entity, matching its known surface forms. A unit about NVIDIA inside an article that also mentions Anthropic won’t surface under ?entity=<Anthropic-id>, because only the unit’s own text counts and an article-level mention does not. Coreference is not resolved: a unit that refers to an entity only by pronoun or description (“the company shipped a 200K-context model”) won’t match the entity filter, even when the referent is obvious from context.
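The toy model below makes the matching rule concrete. It is not the service's implementation: the `SURFACE_FORMS` table and the word-boundary regex are assumptions chosen for illustration. It shows that each unit is judged on its own text and that a pronoun-only or descriptive reference never matches.

```python
import re

# Hypothetical surface forms, for illustration only; the service maintains its own.
SURFACE_FORMS = {
    "nvidia": ["NVIDIA", "Nvidia", "NVDA"],
    "anthropic": ["Anthropic"],
}


def names_entity(unit_text: str, entity_key: str) -> bool:
    """True if this unit's own text literally contains a surface form of the entity.

    Only the unit text is inspected: the surrounding article is ignored, and
    pronouns or descriptions ("the company") are never resolved to an entity.
    """
    for form in SURFACE_FORMS[entity_key]:
        # \b word boundaries keep "NVDA" from matching inside a longer token.
        if re.search(rf"\b{re.escape(form)}\b", unit_text):
            return True
    return False


# Two units from one article that mentions both companies.
article_units = [
    "NVIDIA reported record data center revenue.",
    "The company shipped a 200K-context model.",
]

for text in article_units:
    print(text, "->", names_entity(text, "nvidia"), names_entity(text, "anthropic"))
```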

What it’s optimized for

  • Short, fact-dense units. Every unit is sentence-level, so you get compact, citable atomic facts, not raw document chunks.
  • Precision at the top. Results are ordered to surface the most relevant units first.
  • No hallucination surface. Every returned unit has passed verification, so consumers can treat unit.text as trusted without re-checking the source.
  • Predictable depth. Raising limit (up to 100) genuinely widens the candidate pool rather than truncating to the same small set.

When to use which mode

Goal | Approach
--- | ---
Find a specific factual answer | /v1/search?q=... with limit=10
Strategic synthesis (broad coverage) | /v1/search?q=...&limit=50; raise diversity_cap if one in-depth article should contribute more units
Find units similar to a known unit | /v1/search?similar_to=<unit_id>, for cross-source corroboration
Time-scoped retrieval | published_after / published_before for precise windows; recency_boost for a soft preference toward newer content
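Each row of the table maps onto a set of query parameters. The sketch below uses a placeholder host (`https://api.example.com`). The parameter names come from the table, but the values are illustrative assumptions: `diversity_cap=5`, ISO 8601 dates (`YYYY-MM-DD`) for the date filters, and `recency_boost=true` are guesses at formats this page does not specify, and `unit_123` is a hypothetical unit id.

```python
from urllib.parse import urlencode

# Placeholder host; the real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def search_url(**params) -> str:
    """Build a GET /v1/search URL, dropping any parameter left as None."""
    query = {k: v for k, v in params.items() if v is not None}
    return f"{BASE_URL}/v1/search?{urlencode(query)}"


# One preset per table row. Names are from the table; values are assumptions.
factual = search_url(q="When did the H100 ship?", limit=10)
synthesis = search_url(q="AI chip supply chain risks", limit=50, diversity_cap=5)
similar = search_url(similar_to="unit_123")  # hypothetical unit id
windowed = search_url(
    q="export controls on AI accelerators",
    published_after="2024-01-01",   # assumed ISO 8601 date format
    published_before="2024-06-30",
)
recent_first = search_url(q="export controls on AI accelerators", recency_boost="true")

for url in (factual, synthesis, similar, windowed, recent_first):
    print(url)
```

Hard date bounds exclude anything outside the window. recency_boost, as described above, only tilts the ranking, so older but highly relevant units can still appear.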

Reading a result

Each result is a verified unit plus its citation. Three fields do three different jobs:
  • unit.text is the verified claim. Cite it and reason over it directly; it has already passed verification, so there is no need to re-check it against the source.
  • evidence.snippets are short (~100-character) locators into the source, not the full passage. They let an agent confirm that a unit is grounded; the full passage lives at citation.url.
  • citation.url is the audit path: follow it to the original source to re-verify.
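The sketch below pulls those three fields out of one result. The JSON shape is inferred from the field names above and is an assumption: `unit.text` is taken to be a string, `evidence.snippets` a list of strings, and `citation.url` a string. The `Cited` type and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cited:
    text: str        # unit.text: the verified claim to reason over
    locator: str     # first evidence snippet: a short pointer, not the full passage
    source_url: str  # citation.url: the audit path back to the original


def read_result(result: dict) -> Cited:
    """Extract the three fields from one search result.

    Assumed shape: {"unit": {"text": ...}, "evidence": {"snippets": [...]},
    "citation": {"url": ...}}. Check the GET /v1/search reference for the real schema.
    """
    snippets = result.get("evidence", {}).get("snippets") or []
    return Cited(
        text=result["unit"]["text"],
        locator=snippets[0] if snippets else "",
        source_url=result["citation"]["url"],
    )


def format_citation(c: Cited) -> str:
    # Quote the verified claim directly; the URL lets a reader re-verify it.
    return f'"{c.text}" ({c.source_url})'
```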
Results are ranked by relevance_score (higher = more relevant), with no fixed floor: the list always fills to limit, so the lower-ranked tail of a narrow query can taper into weaker matches. Let the score, not limit, decide how many results to use.

See GET /v1/search for the full parameter list and response shape.

/v1/search is a retrieval primitive: one query in, one ranked list out. Richer workflows come from composing several calls, most naturally orchestrated by an agent. The Recipes walk through those patterns.
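Because there is no floor, one way to let the score decide is a relative cutoff that keeps only results scoring at least a fixed fraction of the best result's score. This is a heuristic of our own, not documented API behavior. The 0.5 ratio is an illustrative assumption, the code assumes non-negative scores, and the scores in the example are made up.

```python
def trim_tail(results: list[dict], ratio: float = 0.5) -> list[dict]:
    """Drop the weak tail: keep results scoring >= ratio * the top relevance_score.

    Assumes non-negative relevance_score values; the default ratio of 0.5 is
    an illustrative choice, not a documented threshold. Input order is preserved.
    """
    if not results:
        return []
    top = max(r["relevance_score"] for r in results)
    cutoff = top * ratio
    return [r for r in results if r["relevance_score"] >= cutoff]


# Made-up scores from a narrow query that filled limit=4.
ranked = [
    {"id": "u1", "relevance_score": 0.92},
    {"id": "u2", "relevance_score": 0.81},
    {"id": "u3", "relevance_score": 0.40},
    {"id": "u4", "relevance_score": 0.12},
]
print([r["id"] for r in trim_tail(ranked)])  # ['u1', 'u2']
```

A relative cutoff adapts to each query's score scale. An absolute threshold would first need calibrating against real scores.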