Embedding Providers
RepoRelay supports multiple embedding providers, selected via the EMBEDDING_PROVIDER environment variable.
| Provider | Value | Description |
|---|---|---|
| Ollama (default) | ollama | Local embedding via Ollama |
| OpenAI-compatible | openai | Any OpenAI-compatible API (OpenAI, Azure OpenAI, Together AI, Mistral, etc.) |
Ollama
Ollama runs natively on macOS with Metal GPU acceleration — no Docker needed for the embedding model.
```sh
brew install ollama
ollama serve                  # runs in the foreground; use a second terminal for the next command
ollama pull nomic-embed-text
```
Add to .env:
```sh
EMBEDDING_PROVIDER=ollama # default, can be omitted
EMBEDDING_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_CONCURRENCY=4 # must match OLLAMA_NUM_PARALLEL
```
Tuning parallelism
Indexing large repos is typically bottlenecked on embedding throughput. Two knobs have to agree:
- `EMBEDDING_CONCURRENCY` (RepoRelay): how many batches RepoRelay dispatches in parallel.
- `OLLAMA_NUM_PARALLEL` (Ollama server): how many requests the Ollama server processes concurrently. Extra client requests just queue.
Setting EMBEDDING_CONCURRENCY higher than OLLAMA_NUM_PARALLEL gains nothing. Start with both at 4.
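The agreement rule can be checked mechanically before a long indexing run. Below is a minimal sh sketch, assuming two hypothetical helpers that are not part of RepoRelay or Ollama: `effective_concurrency` reports how many requests can actually be in flight and warns when the client setting exceeds the server's, and `env_value` reads a setting out of `.env`.

```sh
#!/bin/sh
# effective_concurrency CLIENT SERVER: hypothetical helper (not part of RepoRelay).
# CLIENT is EMBEDDING_CONCURRENCY, SERVER is OLLAMA_NUM_PARALLEL. Prints how many
# requests can actually run at once; any surplus on the client side just queues.
effective_concurrency() {
  local client="$1" server="$2"
  if [ -z "$server" ]; then
    echo "OLLAMA_NUM_PARALLEL is not set; Ollama falls back to its built-in default" >&2
    return 1
  fi
  if [ "$client" -gt "$server" ]; then
    echo "warning: $((client - server)) of $client requests will queue in Ollama" >&2
    echo "$server"
  else
    echo "$client"
  fi
}

# env_value KEY FILE: hypothetical helper that reads KEY=value from a .env file,
# dropping a trailing "# comment" (assumes the value itself contains no '#').
env_value() {
  grep -E "^$1=" "$2" | tail -n 1 | cut -d= -f2- | sed 's/[[:space:]]*#.*$//'
}

# Example on macOS, where the Ollama app reads its settings from launchctl:
#   effective_concurrency "$(env_value EMBEDDING_CONCURRENCY .env)" \
#     "$(launchctl getenv OLLAMA_NUM_PARALLEL)"
```

On Linux, pass the value you exported before `ollama serve` as the second argument instead of the `launchctl getenv` output.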
macOS (Ollama.app): env vars set in your shell are ignored — the app reads them from launchctl. Quit Ollama, then:
```sh
launchctl setenv OLLAMA_NUM_PARALLEL 4
launchctl setenv OLLAMA_MAX_LOADED_MODELS 1   # keep only one model loaded at a time
# Relaunch Ollama from the menu bar
```
Linux / running ollama serve directly:
```sh
OLLAMA_NUM_PARALLEL=4 ollama serve
```
If the embedding model is running on CPU, or on a single GPU that one request already saturates, parallelism won't help: Ollama accepts the requests concurrently but effectively processes them one at a time. In that case, lower EMBEDDING_CONCURRENCY to 1.
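To tell which case applies, you can time the same batch of requests at concurrency 1 and at 4; if the wall-clock times come out about the same, the server is serializing them. A minimal sh sketch, assuming curl, an `xargs` that supports `-P`, Ollama's `/api/embed` endpoint, and a hypothetical helper `run_burst` (not part of RepoRelay or Ollama):

```sh
#!/bin/sh
# run_burst P N CMD [ARGS...]: hypothetical helper that runs CMD N times,
# keeping at most P copies in flight (xargs -P).
run_burst() {
  local p="$1" n="$2"
  shift 2
  # -I{} starts one CMD per input line; CMD has no {} placeholder, so the
  # line numbers from seq are discarded and only the count matters.
  seq "$n" | xargs -I{} -P "$p" "$@" > /dev/null
}

# Usage against the local Ollama server (interactive bash or zsh, where
# `time` is a keyword that can time a shell function):
#   body='{"model": "nomic-embed-text", "input": "throughput probe"}'
#   time run_burst 1 32 curl -sf -o /dev/null http://localhost:11434/api/embed -d "$body"
#   time run_burst 4 32 curl -sf -o /dev/null http://localhost:11434/api/embed -d "$body"
```

Make sure OLLAMA_NUM_PARALLEL is at least 4 on the server before the second run; otherwise the comparison only measures queueing.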
OpenAI-compatible
Works with any provider that implements the OpenAI POST /v1/embeddings endpoint format — OpenAI, Azure OpenAI, Together AI, Mistral, etc.
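For reference, a request in that format can be sent by hand. This is a minimal sh sketch, not RepoRelay code: the helpers `embed_request_body` and `openai_embed` are hypothetical, they reuse the variable names from the .env settings below, and the body builder assumes the input text needs no JSON escaping. The headers are OpenAI's; some providers (Azure OpenAI, for instance) may expect a different auth header or URL layout.

```sh
#!/bin/sh
# embed_request_body TEXT: hypothetical helper that prints a /v1/embeddings
# request body from EMBEDDING_MODEL and, if set, EMBEDDING_DIMENSIONS.
# Assumes TEXT contains no quotes, backslashes or newlines (no JSON escaping done).
embed_request_body() {
  if [ -n "${EMBEDDING_DIMENSIONS:-}" ]; then
    printf '{"model":"%s","input":"%s","dimensions":%s}\n' \
      "$EMBEDDING_MODEL" "$1" "$EMBEDDING_DIMENSIONS"
  else
    printf '{"model":"%s","input":"%s"}\n' "$EMBEDDING_MODEL" "$1"
  fi
}

# openai_embed TEXT: hypothetical helper that POSTs the body to the configured
# OpenAI-compatible base URL (OpenAI's public API if EMBEDDING_URL is unset).
openai_embed() {
  curl -sS "${EMBEDDING_URL:-https://api.openai.com/v1}/embeddings" \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(embed_request_body "$1")"
}
```

In OpenAI's response the vector sits at `.data[0].embedding`.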
Add to .env:
```sh
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...
```
Optional settings
```sh
# Custom base URL for OpenAI-compatible providers (default: https://api.openai.com/v1)
EMBEDDING_URL=https://my-proxy.example.com/v1

# Request a specific number of dimensions from the API.
# Only supported by text-embedding-3 and later models.
# Must produce vectors matching DB_EMBEDDING_DIMENSIONS (768) or init() will report a mismatch.
EMBEDDING_DIMENSIONS=768
```
Model compatibility
The database schema stores embeddings as 768-dimensional vectors. At startup, RepoRelay probes the configured model and checks that the returned vectors are 768-dimensional. On a mismatch, the embedder logs a warning and embedding features stay disabled until the configuration is corrected.
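To see what width a model returns before pointing RepoRelay at it, you can query it by hand. A minimal sh sketch, assuming curl and Ollama's `/api/embed` endpoint; `embedding_width` and `probe_ollama_width` are hypothetical helpers, and `embedding_width` is a crude text scan rather than a JSON parser (with jq installed, `jq '.embeddings[0] | length'` does the same for Ollama and `jq '.data[0].embedding | length'` for OpenAI).

```sh
#!/bin/sh
# embedding_width: hypothetical helper that prints the length of the first
# all-numeric JSON array on stdin. Whitespace is stripped first so that
# pretty-printed responses (such as OpenAI's) are handled too.
embedding_width() {
  tr -d ' \n\r\t' \
    | grep -oE '\[[-0-9.eE+,]*\]' \
    | head -n 1 \
    | awk -F',' '{ print ($0 == "[]" ? 0 : NF) }'
}

# probe_ollama_width: hypothetical helper that requests one embedding from the
# local Ollama server and prints its width.
probe_ollama_width() {
  curl -s "${EMBEDDING_URL:-http://localhost:11434}/api/embed" \
    -d "{\"model\": \"${EMBEDDING_MODEL:-nomic-embed-text}\", \"input\": \"dimension probe\"}" \
    | embedding_width
}
```

With nomic-embed-text, `probe_ollama_width` should print 768, matching the schema.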
For OpenAI models:
- `text-embedding-3-small` (1536-d default): set `EMBEDDING_DIMENSIONS=768` to reduce to 768-d.
- `text-embedding-3-large` (3072-d default): set `EMBEDDING_DIMENSIONS=768` to reduce to 768-d.
- `text-embedding-ada-002` (1536-d, fixed): does not support the `dimensions` parameter, so it cannot be used with the default 768-d schema.
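The list above can be turned into a pre-flight check that fails fast instead of waiting for the startup warning. A minimal sh sketch; `check_openai_dims` is a hypothetical helper that only knows the three OpenAI models listed here and reads the same EMBEDDING_MODEL and EMBEDDING_DIMENSIONS variables as RepoRelay.

```sh
#!/bin/sh
# check_openai_dims: hypothetical pre-flight check (not part of RepoRelay) that
# compares EMBEDDING_MODEL / EMBEDDING_DIMENSIONS against the 768-d schema.
# Exit status: 0 = ok, 1 = misconfigured, 2 = model not covered by this check.
check_openai_dims() {
  local model="${EMBEDDING_MODEL:-}" dims="${EMBEDDING_DIMENSIONS:-}"
  case "$model" in
    text-embedding-3-small|text-embedding-3-large)
      if [ "$dims" = 768 ]; then
        echo ok
        return 0
      fi
      echo "set EMBEDDING_DIMENSIONS=768 for $model" >&2
      return 1
      ;;
    text-embedding-ada-002)
      echo "$model returns fixed 1536-d vectors and cannot match the 768-d schema" >&2
      return 1
      ;;
    *)
      echo "$model is not covered here; check that it returns 768-d vectors" >&2
      return 2
      ;;
  esac
}
```

To run it against your settings, load them first, e.g. `set -a; . ./.env; set +a; check_openai_dims`.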