# Embed signals

> Vectorize verified units into the same space as your own data, queryable by one similarity metric.

**Embed** verified signals into the same vector space as your own data, so your agent can retrieve across both at once. Gildea serves the verified units; you embed the text with your own model and store the vectors next to your internal notes, docs, and past analyses. A single query then reaches your private context and the verified market record together, with no second retrieval system to reconcile and no lock-in.

## The recipe

You pull the verified units for your territory, embed their text with your own embedder, and persist the vectors next to their provenance. Use the same model you embed your own documents with, so every vector is comparable and one nearest-neighbor search spans both.

<Steps>
  <Step title="Pull the units you track">
    Take the verified units for your territory (from [Search](/guides/search) or your context store), keeping each unit's `id` and `citation` as provenance.
  </Step>

  <Step title="Embed with your own model">
    Embed the unit text with your embedder (Cohere, OpenAI, Voyage, your choice). Use the *same* model everywhere, so every vector is comparable and search across your own data and Gildea's is meaningful.
  </Step>

  <Step title="Persist the vectors">
    Store each vector next to its `id` and `citation`, so you embed once and reuse across runs. This is the index you query alongside your own documents.
  </Step>
</Steps>

Install `gildea`, `cohere`, and `numpy`, set the `GILDEA_API_KEY` and `COHERE_API_KEY` environment variables, then run:

```python
import os
import json
import numpy as np
import cohere
from gildea import Gildea

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])   # your embedder
gildea = Gildea()                                            # reads GILDEA_API_KEY

# 1. Pull the verified units for your territory; keep id + citation as provenance.
hits = gildea.search("NVIDIA data center moat durability competition", limit=30)["results"]
units = [{"id": h["unit"]["id"], "text": h["unit"]["text"], "citation": h["citation"]}
         for h in hits]

# 2. Embed the unit text with your own model -- the SAME embedder you use for your
#    own documents, so every vector is comparable in one shared space.
emb = co.embed(texts=[u["text"] for u in units], model="embed-english-v3.0",
               input_type="search_document", embedding_types=["float"])
vectors = np.array(emb.embeddings.float_)

# 3. Persist vectors next to their metadata, so you embed once and reuse across runs.
with open("embeddings.jsonl", "w") as f:
    for u, v in zip(units, vectors):
        f.write(json.dumps({"id": u["id"], "citation": u["citation"], "vector": v.tolist()}) + "\n")

# Sanity check: cosine nearest neighbors among the units just embedded. This is
# the same similarity a query across your data and Gildea's would rank on.
def near(i, k=3):
    """Return the indices of the k units most similar to unit i, excluding i itself."""
    cos = vectors @ vectors[i] / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(vectors[i]))
    return [j for j in cos.argsort()[::-1] if j != i][:k]

print(f"embedded {len(units)} units -> {vectors.shape[1]}-dim, persisted to embeddings.jsonl")
print("nearest to:", units[0]["text"][:60])
for j in near(0):
    print("  ~", units[j]["text"][:60])
```
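
The persisted file pays off once it is queried together with your own vectors. The following is a minimal sketch of that combined search, assuming the `embeddings.jsonl` layout written above and that your own documents are already embedded with the same model. The helper names `load_index`, `top_k`, and `search_both`, and the `source` labels, are hypothetical and not part of the Gildea SDK. The query vector is assumed to come from the same embedder (with Cohere's v3 models, typically using `input_type="search_query"`).

```python
import json
import numpy as np

def load_index(path):
    """Read a JSONL file of {"id", "citation", "vector"} rows into metadata + matrix."""
    meta, rows = [], []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            rows.append(rec.pop("vector"))  # keep the vector out of the metadata
            meta.append(rec)
    return meta, np.array(rows, dtype=float)

def top_k(query_vec, matrix, k=5):
    """Return (row index, cosine similarity) for the k rows closest to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]

def search_both(query_vec, gildea_meta, gildea_vecs, own_meta, own_vecs, k=5):
    """Rank Gildea units and your own documents in one pass over a stacked matrix."""
    meta = [{**m, "source": "gildea"} for m in gildea_meta] + \
           [{**m, "source": "own"} for m in own_meta]
    matrix = np.vstack([gildea_vecs, own_vecs])
    return [(meta[i], score) for i, score in top_k(query_vec, matrix, k)]
```

Each result keeps its `id` and, for Gildea units, its `citation`, so provenance stays attached to whatever the search returns. At larger scale a vector database would likely replace the in-memory matrix, but the ranking logic stays the same.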

## What you get

A persisted vector for every verified unit, each carrying its `id` and `citation`, sitting in the same space as your own documents. One similarity search now spans your private context and Gildea's verified market record. You embed once with your own model and hold the vectors, so there's no lock-in and no second retrieval system to reconcile. Keep the set current with [Update](/guides/update).
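
To keep the "embed once" property when the set of units changes, one option is to embed only the units whose `id` is not yet in the file. The sketch below assumes the same JSONL layout as the recipe and takes the embedder as a plain function; `persisted_ids`, `append_new`, and `embed_fn` are hypothetical names, and how new units are fetched is left to your own pipeline.

```python
import json
import os

def persisted_ids(path):
    """Collect the ids already stored in a JSONL embeddings file."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {json.loads(line)["id"] for line in f if line.strip()}

def append_new(units, embed_fn, path="embeddings.jsonl"):
    """Embed and append only the units whose id is not yet in the file.

    embed_fn is assumed to take a list of texts and return one vector per
    text, in the same order. Returns the number of rows appended.
    """
    seen = persisted_ids(path)
    fresh = [u for u in units if u["id"] not in seen]
    if not fresh:
        return 0
    vectors = embed_fn([u["text"] for u in fresh])
    with open(path, "a") as f:  # append, so earlier rows are kept
        for u, v in zip(fresh, vectors):
            row = {"id": u["id"], "citation": u["citation"],
                   "vector": [float(x) for x in v]}
            f.write(json.dumps(row) + "\n")
    return len(fresh)
```

With the recipe's Cohere client, `embed_fn` could be a small wrapper around `co.embed(...)` that returns `emb.embeddings.float_`. Note that this sketch appends to the file, whereas the recipe opens it in `"w"` mode and rewrites it on each run.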

<CardGroup cols={2}>
  <Card title="Search for signals" icon="magnifying-glass" href="/guides/search" />

  <Card title="Update context" icon="satellite-dish" href="/guides/update" />
</CardGroup>
