Gildea’s /v1/search endpoint runs a modern hybrid retrieval pipeline over every verified text unit in the corpus. Each query passes through three stages — dense retrieval, sparse retrieval, and cross-encoder reranking — combined via rank fusion.

Architecture overview

1. Query expansion to two vector spaces. Your query text is encoded in parallel into two different vector representations: a dense embedding (semantic meaning) and a sparse embedding (lexical match with learned expansion).

2. Parallel retrieval. Each embedding retrieves its top 50 candidates from the full corpus of verified text units, with any requested filters (entity, theme, date) pre-applied.

3. RRF fusion. The two ranked lists are merged using Reciprocal Rank Fusion, a rank-based blend that doesn't require score normalization.

4. Cross-encoder rerank. The top candidates are passed through a cross-encoder reranker for final precision. The cross-encoder scores every (query, unit) pair jointly, catching nuances that bi-encoder retrieval misses (negation, hedging, subtle topical mismatches).

5. Post-processing. Optional recency boost, article diversification (a cap on units per source article), and truncation to the requested limit.
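The post-processing step can be sketched in a few lines of pure Python. This is an illustrative sketch, not the production implementation; the `article_id` field name and best-first input ordering are assumptions.

```python
def diversify(units, max_per_article, limit):
    """Cap the number of units kept per source article, then truncate.

    `units` is assumed to be ordered best-first; `article_id` is a
    hypothetical field name used here for illustration.
    """
    per_article = {}
    kept = []
    for unit in units:
        count = per_article.get(unit["article_id"], 0)
        if count < max_per_article:
            per_article[unit["article_id"]] = count + 1
            kept.append(unit)
        if len(kept) == limit:  # truncate to the requested limit
            break
    return kept
```

Because the input is already ranked, a single forward pass preserves relevance order while enforcing the per-article cap.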

Stage 1: Dense retrieval (semantic)

Captures semantic similarity — paraphrases, conceptual matches, related ideas.
Model: BAAI/bge-base-en-v1.5
Dimensions: 768
Index: pgvector HNSW, cosine similarity
Cutoff: minimum cosine similarity 0.3
The same model is used for stored unit embeddings and live query embeddings — both vectors live in the same space.
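The scoring rule behind this stage can be sketched as follows. The real index is pgvector HNSW over 768-dimensional vectors; this sketch only illustrates the cosine scoring and the 0.3 cutoff, with toy vectors standing in for model embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dense_candidates(query_vec, unit_vecs, k=50, cutoff=0.3):
    """Rank units by cosine similarity, dropping anything below the cutoff."""
    scored = [(uid, cosine(query_vec, v)) for uid, v in unit_vecs.items()]
    scored = [(uid, s) for uid, s in scored if s >= cutoff]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The cutoff means weak semantic matches never enter fusion at all, rather than being buried low in the dense list.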

Stage 2: Sparse retrieval (learned lexical)

Captures exact term matching plus learned term expansion. SPLADE is a transformer-based sparse model: rather than a fixed TF-IDF vocabulary, it learns which vocabulary terms are relevant to a given piece of text, outputting weights over ~30k dimensions.
Model: SPLADE++ (prithivida/Splade_PP_en_v1)
Dimensions: ~30,000 (BERT vocab), sparse
Index: pgvector HNSW, inner-product similarity
SPLADE outperforms classical BM25 on modern retrieval benchmarks by learning which terms matter for a given query, including related terms the query didn’t literally contain.
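Since SPLADE vectors are mostly zeros, scoring reduces to an inner product over the few shared nonzero terms. A minimal sketch, representing each sparse vector as a `{term_id: weight}` dict (the on-disk pgvector representation differs; this shows only the similarity computation):

```python
def sparse_dot(query_vec, unit_vec):
    """Inner product of two sparse vectors stored as {term_id: weight} dicts."""
    # Iterate over the smaller dict so cost scales with the sparser side.
    if len(query_vec) > len(unit_vec):
        query_vec, unit_vec = unit_vec, query_vec
    return sum(w * unit_vec[t] for t, w in query_vec.items() if t in unit_vec)
```

Terms that appear in only one of the two vectors contribute nothing, which is why learned expansion matters: SPLADE adds related terms to both sides so they can overlap.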

Stage 3: RRF fusion

Dense and sparse paths each produce a ranked list. Reciprocal Rank Fusion combines them without requiring the two scoring systems to be commensurate:
score(unit) = 1/(k + rank_dense) + 1/(k + rank_sparse)
This is robust across queries: some queries benefit more from lexical match (exact entity names, tickers), others from semantic similarity (abstract questions). RRF weights both without tuning.
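The fusion formula above is a few lines of code. In this sketch `k=60` is the common default from the RRF literature; Gildea's actual constant is an assumption, and units missing from one list simply receive no contribution from it:

```python
def rrf(dense_ranked, sparse_ranked, k=60):
    """Reciprocal Rank Fusion over two ranked lists of unit ids.

    score(unit) = 1/(k + rank_dense) + 1/(k + rank_sparse)
    """
    scores = {}
    for ranked in (dense_ranked, sparse_ranked):
        for rank, uid in enumerate(ranked, start=1):
            scores[uid] = scores.get(uid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A unit that appears in both lists gets two reciprocal-rank terms, so agreement between the dense and sparse paths is rewarded even when neither ranks it first.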

Stage 4: Cross-encoder rerank

Bi-encoder retrieval (dense + sparse) optimizes for recall at low cost. A cross-encoder reranks top candidates for precision by scoring every (query, unit) pair jointly through a transformer — slower but substantially more accurate at the top of the list. Gildea uses Cohere Rerank (rerank-english-v3.0) on the top 150 candidates after RRF fusion. Cross-encoders are particularly valuable for Gildea’s corpus because verified claims are short and fact-dense, where a single token flip (“X did not beat Q3” vs “X beat Q3”) matters — exactly where bi-encoders struggle.
If the rerank service is unavailable, search gracefully falls back to RRF-fused dense + sparse ordering. Results are always returned; the reranker only improves precision when available.
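The fallback behavior can be sketched like this. `rerank_fn` stands in for the call to the rerank service; its signature here (query plus a list of texts, returning one score per text) is a hypothetical simplification, not the actual client API.

```python
def rerank_with_fallback(query, candidates, rerank_fn):
    """Reorder candidates by cross-encoder score; on failure, keep RRF order.

    `candidates` is assumed to arrive in RRF-fused order, so the
    fallback path can return it unchanged.
    """
    try:
        scores = rerank_fn(query, [c["text"] for c in candidates])
        order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        return [candidates[i] for i in order]
    except Exception:
        # Rerank service unavailable: results are still returned, just
        # without the precision boost.
        return candidates
```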

Response diagnostics

Every search result includes a diagnostics object with retrieval stage ranks, useful for understanding why a result ranked where it did:
"diagnostics": {
  "rrf_rank": 20,       // Rank after dense + sparse RRF fusion
  "rerank_rank": 1,     // Final rank after cross-encoder rerank
  "rerank_score": 0.87  // Cross-encoder relevance score (0-1, higher is more relevant)
}
A large gap between rrf_rank and rerank_rank means the cross-encoder substantially changed the ordering — typically a signal that the bi-encoder missed a nuance that the cross-encoder caught.
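One way a consumer might act on this signal is to flag large rank movements. The threshold of 10 here is illustrative, not a documented value:

```python
def rerank_moved(diagnostics, threshold=10):
    """True when the cross-encoder moved a result far from its RRF rank.

    `threshold` is an arbitrary example value, not part of the API.
    """
    return abs(diagnostics["rrf_rank"] - diagnostics["rerank_rank"]) >= threshold
```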

Filters are pre-applied, not post-filtered

When you pass entity, theme, published_after, or published_before, Gildea resolves the matching article set first and scopes both retrieval paths to that set. No recall is wasted on candidates that would be filtered out later.
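A client-side sketch of assembling a filtered request. The parameter names mirror the filters described above, but the exact wire format (query-string vs. JSON body) is an assumption:

```python
def build_search_params(query, entity=None, theme=None,
                        published_after=None, published_before=None, limit=20):
    """Assemble /v1/search parameters, sending only the filters that are set."""
    params = {"q": query, "limit": limit}
    optional = {
        "entity": entity,
        "theme": theme,
        "published_after": published_after,
        "published_before": published_before,
    }
    for key, value in optional.items():
        if value is not None:
            params[key] = value
    return params
```

Because filtering happens before retrieval, narrowing the article set with these parameters tends to improve result quality rather than just shrink it.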

What the pipeline is optimized for

  • Short, fact-dense retrieval units. Every text unit is sentence-level (~40 tokens). Queries return compact, citable atomic facts — not raw document chunks.
  • Precision at the top. Cross-encoder rerank is specifically designed to concentrate the most relevant results at the top of the list.
  • No hallucination surface. Every returned unit has passed Gildea’s verification pipeline. LLM consumers can treat unit text as trusted content without re-verifying against source material.