Every verified text unit in Gildea — every thesis sentence, argument sentence, summary sentence, and claim — has a pre-computed embedding: a 768-dimensional vector that captures the meaning of the text. Embeddings power Gildea’s semantic search internally and are exposed so client applications can build their own retrieval and similarity workflows on top of Gildea’s verified corpus.

The model

Gildea uses BAAI/bge-base-en-v1.5, a widely used English-optimized embedding model from the Beijing Academy of Artificial Intelligence (BAAI):
Model: BAAI/bge-base-en-v1.5
Dimensions: 768
Output: Normalized (unit-length) vectors; use the dot product for cosine similarity
Language: English
Embeddings are generated when a text unit is promoted to production and stored alongside the text. All embeddings served by the API come from the same model, so vectors returned by different endpoints (/v1/signals/{id}?include=embeddings and /v1/embed) live in the same space and can be compared directly.

Two endpoints, one vector space

Per-unit embeddings

Add ?include=embeddings to a signal detail request to get embeddings for every text unit in the decomposition, alongside the text itself.
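A minimal sketch of building that request URL. The base URL and signal ID here are placeholders, not values from the API reference:

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute your deployment's actual host.
BASE_URL = "https://api.gildea.com"

def signal_detail_url(signal_id: str, include_embeddings: bool = True) -> str:
    """Build a signal detail URL, optionally requesting per-unit embeddings."""
    url = f"{BASE_URL}/v1/signals/{signal_id}"
    if include_embeddings:
        # ?include=embeddings adds a vector to every text unit in the response
        url += "?" + urlencode({"include": "embeddings"})
    return url
```

Without the flag, the same endpoint returns the decomposition text only.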

Embed your text

POST arbitrary text (up to 2000 characters) to /v1/embed to get a vector in the same space. Use it to embed user content for comparison against Gildea’s verified corpus.
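One way to enforce the 2000-character cap client-side before sending. The request body field name ("text") is an assumption; check the API reference for the actual schema:

```python
import json

EMBED_CHAR_LIMIT = 2000  # documented per-request input cap

def embed_payload(text: str) -> str:
    """Build a JSON body for POST /v1/embed, truncating over-long input.

    The "text" field name is assumed, not confirmed by these docs.
    """
    return json.dumps({"text": text[:EMBED_CHAR_LIMIT]})
```

Truncating locally avoids a rejected request for content that exceeds the cap.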

Primary use case: local similarity

The pairing of these two endpoints enables client applications to compute semantic similarity between user content and Gildea’s verified extractions, entirely client-side:
  1. Ingest Gildea signals with embeddings into local storage. Each text unit comes with its vector.
  2. Embed user content (notes, queries, drafts) via /v1/embed as it’s created.
  3. Compute cosine similarity locally between user vectors and stored Gildea vectors to find related claims, sentences, or theses.
This pattern is what powers the Gildea Obsidian plugin: user notes are semantically linked to verified Gildea claims without a round-trip to the API at query time.
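Step 3 above needs no dependencies: because all Gildea vectors are unit-length, cosine similarity reduces to a dot product. A minimal sketch, with hypothetical `cosine` and `top_k` helpers:

```python
def cosine(u: list[float], v: list[float]) -> float:
    """Dot product, which equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k(query: list[float], corpus: list[tuple[str, list[float]]], k: int = 3):
    """Rank locally stored (unit_id, vector) pairs by similarity to a query."""
    scored = [(unit_id, cosine(query, vec)) for unit_id, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In practice the corpus would hold vectors ingested from signal detail responses, and the query vector would come from /v1/embed.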

Model stability

The embedding_model field is returned with every embedding so clients can detect changes. If Gildea upgrades the embedding model, the field will reflect the new identifier, and clients should re-embed any user content they’ve stored locally. Existing Gildea text unit embeddings will also be re-computed on the next pipeline run.
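The check clients need is a simple string comparison against the identifier they recorded when embedding locally. A sketch, with a hypothetical helper name:

```python
# Model identifier recorded when local user-content vectors were computed.
STORED_MODEL = "BAAI/bge-base-en-v1.5"

def needs_reembed(response_model: str, stored_model: str = STORED_MODEL) -> bool:
    """True if the embedding_model field in an API response differs from the
    model used for locally stored vectors, signaling a re-embed is needed."""
    return response_model != stored_model
```

Vectors from different models are not comparable, so on a mismatch the client should discard or re-embed its local vectors rather than mix spaces.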

Notes

  • Rate limits apply. Both endpoints share your tier’s monthly request budget. See Rate Limits.
  • Vectors are normalized. Each vector has unit length, so cosine similarity equals the dot product.
  • Input length cap. POST /v1/embed accepts up to 2000 characters per request. Truncate longer content client-side.
  • No aggregate embeddings. Gildea stores one embedding per text unit, not per signal. To get a signal-level representation, aggregate unit embeddings client-side (mean or max-pool) if needed.
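A sketch of the mean-pooling option from the last note. Renormalizing after averaging keeps the result unit-length, so it stays comparable to other Gildea vectors via the dot product:

```python
import math

def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average a signal's unit-vector text-unit embeddings and renormalize,
    producing one unit-length signal-level vector."""
    dims = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # guard zero vector
    return [x / norm for x in mean]
```

Mean-pooling blurs distinctions between a signal's units; for retrieval, scoring against individual unit vectors and taking the max is often the sharper choice.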