The /v1/search endpoint runs a hybrid retrieval pipeline over every verified text unit in the corpus. Each query passes through four stages: dense retrieval, sparse retrieval, rank fusion, and cross-encoder reranking.
Architecture overview
Query expansion to two vector spaces
Your query text is encoded in parallel into two different vector representations: a dense embedding (semantic meaning) and a sparse embedding (lexical match with learned expansion).
Parallel retrieval
Each embedding retrieves its top 50 candidates from the full corpus of verified text units, with any requested filters (entity, theme, date) pre-applied.
RRF fusion
The two ranked lists are merged using Reciprocal Rank Fusion — a rank-based blend that doesn’t require score normalization.
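The fusion step above can be sketched in a few lines. This is a generic Reciprocal Rank Fusion implementation, assuming the conventional smoothing constant k = 60 (the constant actually used by the service is not documented here):

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge ranked lists of unit IDs by summing reciprocal-rank scores.

    Only rank positions drive the blend, so dense cosine scores and
    sparse inner-product scores never need to be normalized against
    each other.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, unit_id in enumerate(ranking, start=1):
            scores[unit_id] = scores.get(unit_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top results from each retrieval path:
dense = ["u3", "u1", "u7"]
sparse = ["u1", "u9", "u3"]
fused = rrf_fuse([dense, sparse])
```

Units that appear high in both lists ("u1", "u3") accumulate score from both paths and rise to the top of the fused ordering.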
Cross-encoder rerank
The top candidates are passed through a cross-encoder reranker for final precision. The cross-encoder scores every (query, unit) pair jointly, catching nuances that bi-encoder retrieval misses (negation, hedging, subtle topical mismatches).
Stage 1: Dense retrieval (semantic)
Captures semantic similarity: paraphrases, conceptual matches, related ideas.

| Property | Value |
|---|---|
| Model | BAAI/bge-base-en-v1.5 |
| Dimensions | 768 |
| Index | pgvector HNSW, cosine similarity |
| Cutoff | Minimum cosine similarity 0.3 |
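The 0.3 cutoff can be illustrated with a toy filter. The vectors below are hypothetical 3-dimensional stand-ins (the real pipeline embeds into 768 dimensions with bge-base-en-v1.5 and queries a pgvector HNSW index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

MIN_COSINE = 0.3  # candidates below this similarity are dropped

query = [0.8, 0.1, 0.6]
candidates = {
    "u1": [0.7, 0.2, 0.5],    # semantically close to the query
    "u2": [-0.6, 0.9, -0.1],  # unrelated
}
kept = {uid: s for uid, v in candidates.items()
        if (s := cosine(query, v)) >= MIN_COSINE}
```

Here "u2" points in a different direction, scores below 0.3, and is excluded before fusion ever sees it.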
Stage 2: Sparse retrieval (learned lexical)
Captures exact term matching plus learned term expansion. SPLADE is a transformer-based sparse model: rather than a fixed TF-IDF vocabulary, it learns which vocabulary terms are relevant to a given piece of text, outputting weights over ~30k dimensions.

| Property | Value |
|---|---|
| Model | SPLADE++ (prithivida/Splade_PP_en_v1) |
| Dimensions | ~30,000 (BERT vocab), sparse |
| Index | pgvector HNSW, inner-product similarity |
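Because sparse vectors are mostly zeros, the inner product reduces to a sum over shared nonzero terms. A toy illustration with hypothetical term IDs and weights (real SPLADE weights come from the transformer, not hand-picked values):

```python
def sparse_dot(q, d):
    """Inner product of two sparse vectors stored as {term_id: weight}."""
    # Iterate over the smaller dict; only shared terms contribute.
    if len(q) > len(d):
        q, d = d, q
    return sum(w * d[t] for t, w in q.items() if t in d)

# Hypothetical SPLADE outputs: learned expansion can assign nonzero
# weight to terms that never appear literally in the text.
query_vec = {1042: 1.8, 2213: 0.9, 5501: 0.4}
doc_vec = {1042: 1.2, 5501: 0.7, 9099: 0.3}
score = sparse_dot(query_vec, doc_vec)
```

Only terms 1042 and 5501 overlap, so the score is 1.8·1.2 + 0.4·0.7 = 2.44.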
Stage 3: RRF fusion
Dense and sparse paths each produce a ranked list. Reciprocal Rank Fusion combines them without requiring the two scoring systems to be commensurate.

Stage 4: Cross-encoder rerank
Bi-encoder retrieval (dense + sparse) optimizes for recall at low cost. A cross-encoder reranks top candidates for precision by scoring every (query, unit) pair jointly through a transformer — slower but substantially more accurate at the top of the list. Gildea uses Cohere Rerank (rerank-english-v3.0) on the top 150 candidates after RRF fusion. Cross-encoders are particularly valuable for Gildea’s corpus because verified claims are short and fact-dense, where a single token flip (“X did not beat Q3” vs “X beat Q3”) matters — exactly where bi-encoders struggle.
If the rerank service is unavailable, search gracefully falls back to RRF-fused dense + sparse ordering. Results are always returned; the reranker only improves precision when available.
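The rerank-with-fallback control flow might look like the sketch below. score_with_cross_encoder and RerankUnavailable are hypothetical stand-ins for the actual Cohere Rerank client call and its failure mode:

```python
RERANK_DEPTH = 150  # top candidates after RRF passed to the reranker

class RerankUnavailable(Exception):
    """Raised when the rerank service cannot be reached (hypothetical)."""

def score_with_cross_encoder(query, units):
    # Stand-in for a Cohere Rerank call; here it always fails so the
    # example exercises the fallback path.
    raise RerankUnavailable

def search_order(query, rrf_ranked):
    head = rrf_ranked[:RERANK_DEPTH]
    try:
        scores = score_with_cross_encoder(query, head)
        head = [u for _, u in sorted(zip(scores, head), reverse=True)]
    except RerankUnavailable:
        pass  # graceful fallback: keep the RRF-fused ordering
    return head + rrf_ranked[RERANK_DEPTH:]

order = search_order("did X beat Q3?", [f"u{i}" for i in range(200)])
```

When the reranker is down, the function silently returns the RRF ordering, matching the "results are always returned" guarantee.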
Response diagnostics
Every search result includes a diagnostics object with retrieval-stage ranks, useful for understanding why a result ranked where it did:
A large gap between rrf_rank and rerank_rank means the cross-encoder substantially changed the ordering, typically a signal that the bi-encoder missed a nuance the cross-encoder caught.
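For example, a unit that fused at position 12 but reranked to position 2 was strongly promoted by the cross-encoder. Sketched as a Python dict; rrf_rank and rerank_rank are the field names used above, and the shown values are illustrative:

```python
# Illustrative diagnostics payload; only the two field names come from
# the documentation, the values are made up.
diagnostics = {
    "rrf_rank": 12,    # position after dense+sparse RRF fusion
    "rerank_rank": 2,  # final position after cross-encoder rerank
}

# A large positive gap = the cross-encoder promoted this unit.
gap = diagnostics["rrf_rank"] - diagnostics["rerank_rank"]
```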
Filters are pre-applied, not post-filtered
When you pass entity, theme, published_after, or published_before, Gildea resolves the matching article set first and scopes both retrieval paths to that set. No recall is wasted on candidates that would be filtered out later.
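Conceptually, filter resolution runs before retrieval, producing an allowed-ID set that both paths are scoped to. A minimal sketch with a hypothetical helper and in-memory article records (the real system resolves this in the database):

```python
def resolve_filter_set(articles, entity=None, published_after=None):
    """Hypothetical helper: resolve filters to an allowed article-id set
    BEFORE retrieval, so both paths only score in-scope candidates."""
    allowed = set()
    for a in articles:
        if entity is not None and entity not in a["entities"]:
            continue
        # ISO-8601 date strings compare correctly as plain strings.
        if published_after is not None and a["published"] <= published_after:
            continue
        allowed.add(a["id"])
    return allowed

articles = [
    {"id": 1, "entities": {"AcmeCo"}, "published": "2024-03-01"},
    {"id": 2, "entities": {"AcmeCo"}, "published": "2023-01-15"},
    {"id": 3, "entities": {"Globex"}, "published": "2024-06-10"},
]
scope = resolve_filter_set(articles, entity="AcmeCo",
                           published_after="2024-01-01")
```

Both the dense and sparse HNSW queries would then be restricted to units whose article ID is in scope, so the full top-50 budget is spent on candidates that can actually be returned.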
What the pipeline is optimized for
- Short, fact-dense retrieval units. Every text unit is sentence-level (~40 tokens). Queries return compact, citable atomic facts — not raw document chunks.
- Precision at the top. Cross-encoder rerank is specifically designed to concentrate the most relevant results at the top of the list.
- No hallucination surface. Every returned unit has passed Gildea’s verification pipeline. LLM consumers can treat unit text as trusted content without re-verifying against source material.