Embed signals

Embed verified signals into the same vector space as your own data, so your agent can retrieve across both at once. Gildea serves the verified units; you embed the text with your own model and store the vectors next to your internal notes, docs, and past analyses. Now a single query reaches your private context and the verified market record together, with no second retrieval system to reconcile and no lock-in.

The recipe

You pull the verified units for your territory, embed their text with your own embedder, and persist the vectors next to their provenance. Use the same model you embed your own documents with, so every vector is comparable and one nearest-neighbor search spans both.

Pull the units you track

Take the verified units for your territory (from Search or your context store), keeping each unit’s id and citation as provenance.

Embed with your own model

Embed the unit text with your embedder (Cohere, OpenAI, Voyage, your choice). Use the same model, and the same model version, for Gildea’s units and your own documents: vectors produced by different models are not comparable, so similarity scores across a mixed index would be meaningless.

Persist the vectors

Store each vector next to its id and citation, so you embed once and reuse across runs. This is the index you query alongside your own documents.

Install the gildea, cohere, and numpy packages, set the GILDEA_API_KEY and COHERE_API_KEY environment variables, then run:

import os
import json
import numpy as np
import cohere
from gildea import Gildea

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])   # your embedder
gildea = Gildea()                                            # reads GILDEA_API_KEY

# 1. Pull the verified units for your territory; keep id + citation as provenance.
hits = gildea.search("NVIDIA data center moat durability competition", limit=30)["results"]
units = [{"id": h["unit"]["id"], "text": h["unit"]["text"], "citation": h["citation"]}
         for h in hits]
if not units:
    raise SystemExit("search returned no units; nothing to embed")

# 2. Embed the unit text with your own model -- the SAME embedder you use for your
#    own documents, so every vector is comparable in one shared space.
emb = co.embed(texts=[u["text"] for u in units], model="embed-english-v3.0",
               input_type="search_document", embedding_types=["float"])
vectors = np.array(emb.embeddings.float_)

# 3. Persist vectors next to their metadata, so you embed once and reuse across runs.
with open("embeddings.jsonl", "w") as f:
    for u, v in zip(units, vectors):
        f.write(json.dumps({"id": u["id"], "citation": u["citation"], "vector": v.tolist()}) + "\n")

# Sanity check: nearest neighbors in your own store -- the similarity a query
# across your data and Gildea's would rank on.
def near(i, k=3):
    cos = vectors @ vectors[i] / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(vectors[i]))
    return [j for j in cos.argsort()[::-1] if j != i][:k]

print(f"embedded {len(units)} units -> {vectors.shape[1]}-dim, persisted to embeddings.jsonl")
print("nearest to:", units[0]["text"][:60])
for j in near(0):
    print("  ~", units[j]["text"][:60])
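Once the vectors are persisted, querying across both stores could look something like the sketch below. It assumes your own documents live in a second JSONL file with the same "id" and "vector" fields (the file name my_docs.jsonl and the helpers load_index, cosine, and search are hypothetical, not part of the Gildea SDK), and that the query vector comes from the same embedder as everything else. With Cohere's v3 models that means embedding the query with input_type="search_query" rather than "search_document". Pure standard-library Python is used here to keep the sketch dependency-free; a vector database or numpy would be the natural choice at scale.

```python
import json
import math

def load_index(path, source):
    """Read JSONL rows holding at least "id" and "vector"; tag each row with its source."""
    rows = []
    with open(path) as f:
        for line in f:
            if line.strip():
                row = json.loads(line)
                row["source"] = source  # e.g. "gildea" or "internal"
                rows.append(row)
    return rows

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(index, query_vector, k=5):
    """Return the k (score, row) pairs most similar to query_vector, best first."""
    scored = [(cosine(row["vector"], query_vector), row) for row in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Example wiring (file names are assumptions):
#   index = load_index("embeddings.jsonl", "gildea") + load_index("my_docs.jsonl", "internal")
#   for score, row in search(index, query_vector):
#       print(round(score, 3), row["source"], row["id"], row.get("citation"))
```

Tagging each row with its source keeps provenance visible in the results, so a hit from Gildea still surfaces its citation while a hit from your own notes does not need one.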

What you get

A persisted vector for every verified unit, each carrying its id and citation, sitting in the same space as your own documents. One similarity search now spans your private context and Gildea’s verified market record. You embed once with your own model and hold the vectors, so there’s no lock-in and no second retrieval system to reconcile. Keep the set current with Update.
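To make "embed once" hold across runs, one option is to skip units that already have a stored vector and append only the new ones. The sketch below is an assumption about how you might do that against the embeddings.jsonl file from the recipe; the helper names are hypothetical, and it is not the Update workflow itself. It only detects new ids, so a unit whose text changed or was withdrawn would need separate handling.

```python
import json
import os

def persisted_ids(path):
    """Return the set of unit ids already stored in the JSONL file (empty if it is absent)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {json.loads(line)["id"] for line in f if line.strip()}

def units_to_embed(units, path):
    """Keep only the units whose id has no stored vector yet."""
    seen = persisted_ids(path)
    return [u for u in units if u["id"] not in seen]

def append_vectors(path, units, vectors):
    """Append one row per unit instead of rewriting the whole file."""
    with open(path, "a") as f:
        for u, v in zip(units, vectors):
            row = {"id": u["id"], "citation": u["citation"],
                   "vector": [float(x) for x in v]}  # plain floats, JSON-serializable
            f.write(json.dumps(row) + "\n")
```

In the recipe, you would call units_to_embed(units, "embeddings.jsonl") before the embed step, embed only what it returns, and pass the result to append_vectors in place of the "w"-mode write.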
