Embeddings
codescout uses embeddings for semantic search — finding code by meaning rather than exact text matches. This guide covers how to configure the embedding backend.
⚠ This page describes the pre-v0.12 single-service embedding model and is being phased out. As of v0.12 the default substrate is the Retrieval Stack (Qdrant + dense embedder + sparse SPLADE + cross-encoder reranker), configured via CODESCOUT_* environment variables, not [embeddings] in project.toml. The [embeddings] config block still loads, but only the model = "local:..." path is honoured — and only when the binary was built with the local-embed Cargo feature.
If you are setting up a fresh install: read Retrieval Stack instead. It covers the docker-compose stack, Ollama / llama.cpp / OpenAI integration, and the benchmark we used to pick defaults.
If you are upgrading from <v0.12: the model / url / api_key fields in project.toml no longer drive search. Run codescout migrate-memories to move legacy memory data into Qdrant, then bring up the stack.
The remainder of this page is kept as a reference for the legacy code path; treat it as historical.
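For reference, an upgrade typically boils down to two steps. The compose invocation below is an assumption; use whatever brings up your Retrieval Stack:
# Move legacy memory data into Qdrant
codescout migrate-memories
# Bring up the stack (assumes the docker-compose setup from the Retrieval Stack guide)
docker compose up -d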
Quick Start
codescout works out of the box with a bundled embedding model. No setup needed.
On the first index build (index(action: build)), codescout downloads all-MiniLM-L6-v2 (~22 MB, quantized)
to ~/.cache/huggingface/hub/ and runs it locally via ONNX. This is a one-time download.
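For example, that first build is triggered with the same call syntax used later in Troubleshooting:
index(action: build)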
# .codescout/project.toml (default — no changes needed)
[embeddings]
model = "local:AllMiniLML6V2Q"
This is fine for single-project use or getting started. For better performance with multiple projects, see the next section.
Recommended: External Embedding Server
The bundled model loads into memory per codescout instance. With multiple projects open, this duplicates memory (~22 MB each for the default model). A dedicated embedding server avoids this:
- One process serves all codescout instances
- No memory duplication — the model loads once
- Faster queries — the model stays warm
- Model freedom — use any model and quantization
Configuration
Point codescout at your server with two fields:
[embeddings]
model = "nomic-embed-text-v1.5" # model name (sent in API request)
url = "http://127.0.0.1:43300/v1" # your server's base URL
# api_key = "optional-key" # or set EMBED_API_KEY env var
The url field works with any server implementing the OpenAI /v1/embeddings API.
codescout normalizes the URL automatically — all of these are equivalent:
- http://127.0.0.1:43300
- http://127.0.0.1:43300/v1
- http://127.0.0.1:43300/v1/embeddings
Setup Examples
llama.cpp
Download a GGUF model and start the server:
# Download (example: nomic-embed-text quantized)
wget https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q8_0.gguf
# Start server
llama-server -m nomic-embed-text-v1.5.Q8_0.gguf --embeddings --port 43300
[embeddings]
model = "nomic-embed-text-v1.5"
url = "http://127.0.0.1:43300/v1"
Ollama
ollama pull nomic-embed-text
ollama serve # if not already running
[embeddings]
model = "nomic-embed-text"
url = "http://127.0.0.1:11434/v1"
vLLM
vllm serve nomic-ai/nomic-embed-text-v1.5 --task embed --port 43300
[embeddings]
model = "nomic-embed-text-v1.5"
url = "http://127.0.0.1:43300/v1"
TEI (HuggingFace Text Embeddings Inference)
docker run -p 43300:80 ghcr.io/huggingface/text-embeddings-inference \
--model-id nomic-ai/nomic-embed-text-v1.5
[embeddings]
model = "nomic-embed-text-v1.5"
url = "http://127.0.0.1:43300/v1"
OpenAI
[embeddings]
model = "text-embedding-3-small"
url = "https://api.openai.com/v1"
api_key = "sk-..." # or set EMBED_API_KEY env var
Configuration Reference
[embeddings] fields
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | "local:AllMiniLML6V2Q" | Model name. With url: sent in the API body. Without url: the prefix determines the backend. |
| url | string | (none) | Base URL for any OpenAI-compatible /v1/embeddings endpoint. |
| api_key | string | (none) | API key sent as a Bearer token. Also available via the EMBED_API_KEY env var. |
| drift_detection_enabled | bool | true | Track how much code meaning changes between index builds. |
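Putting the fields together, a typical external-server configuration looks like this (values are illustrative):
[embeddings]
model = "nomic-embed-text-v1.5"      # sent in the API request body
url = "http://127.0.0.1:43300/v1"    # OpenAI-compatible endpoint
api_key = "optional-key"             # or set EMBED_API_KEY instead
drift_detection_enabled = true       # default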
Resolution Order
When codescout needs to embed text, it resolves the backend in this order:
1. url is set → use it as an OpenAI-compatible endpoint
2. model starts with local: → bundled ONNX model via fastembed
3. model starts with ollama: → Ollama API (deprecated — use url instead)
4. model starts with openai: → OpenAI API with OPENAI_API_KEY
5. No url, no prefix → try as a local model name, then error with suggestions
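Two minimal configs (alternatives, not meant to coexist in one file) illustrating the first two rules:
# Rule 1: url is set, so the OpenAI-compatible endpoint is used
[embeddings]
model = "nomic-embed-text-v1.5"
url = "http://127.0.0.1:43300/v1"

# Rule 2: no url and a local: prefix, so the bundled ONNX model is used
[embeddings]
model = "local:AllMiniLML6V2Q"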
Environment Variables
| Variable | Description |
|---|---|
| EMBED_API_KEY | API key for the embedding endpoint (alternative to the config field) |
| OPENAI_API_KEY | OpenAI API key (used with the openai: prefix) |
| OLLAMA_HOST | Ollama daemon URL (deprecated — use the url field) |
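For example, to keep the key out of project.toml, export it in the shell instead (placeholder value):
# Used in place of the api_key config field
export EMBED_API_KEY="sk-..."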
Model Recommendations
Minimum recommended: 768 dimensions for good code search quality.
| Model | Dims | Download | Context | Best For |
|---|---|---|---|---|
| nomic-embed-text-v1.5 | 768 | ~158 MB (Q) / ~547 MB | 8192 | General purpose, good quality |
| jina-embeddings-v2-base-en | 768 | ~300 MB | 8192 | Code-specialized |
| bge-m3 | 1024 | ~1.2 GB | 8192 | Best quality, needs external server |
| CodeSage-small-v2 | 1024 | ~500 MB | — | Purpose-built for code retrieval |
| text-embedding-3-small | 1536 | API only | 8191 | OpenAI hosted, no self-hosting |
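For instance, to run bge-m3 behind an external server — this sketch uses Ollama and assumes the model is available under that name in its library:
ollama pull bge-m3

[embeddings]
model = "bge-m3"
url = "http://127.0.0.1:11434/v1"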
Bundled Local Models
These work with the local: prefix (no server needed):
| Model ID | Dims | Size | Context | Notes |
|---|---|---|---|---|
| NomicEmbedTextV15Q | 768 | ~158 MB | 8192 | General purpose, good quality |
| NomicEmbedTextV15 | 768 | ~547 MB | 8192 | Full precision variant |
| JinaEmbeddingsV2BaseCode | 768 | ~300 MB | 8192 | Code-specialized |
| AllMiniLML6V2Q | 384 | ~22 MB | 256 | Default — bundled, zero-config |
| AllMiniLML6V2 | 384 | ~90 MB | 256 | Full precision lightweight |
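To switch the bundled backend to a higher-quality local model, for example:
[embeddings]
model = "local:NomicEmbedTextV15Q"
Remember to rebuild the index after changing the model (see Troubleshooting below).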
How It Works
- AST-aware chunking — tree-sitter extracts top-level definitions (functions, classes, structs). Each chunk is a complete semantic unit, not an arbitrary text window.
- Chunk size auto-derived — codescout calculates chunk size from the model’s context window. No manual tuning needed.
- Vector storage — embeddings are upserted into Qdrant’s code_chunks collection over gRPC (default localhost:6334). Both a dense and a sparse vector are stored per chunk; query-time hybrid search fuses them via RRF inside Qdrant. See Hybrid Dense + Sparse Retrieval for the topology.
- Bundled model lifecycle — when using the local: prefix (compile-time local-embed feature), the ONNX model is loaded lazily on the first semantic_search or index(action="build"), cached for 5 minutes, then unloaded to free memory. The default substrate is the HTTP dense embedder service, not the bundled ONNX path.
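To spot-check what was stored, you can ask Qdrant for the collection info directly. A sketch, assuming the stack also exposes Qdrant's default REST port (6333) alongside the gRPC port above:
# Returns point count, vector config, and status for the code_chunks collection
curl http://localhost:6333/collections/code_chunks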
Choosing a Model
Not sure which model to use? See the Embedding Model Comparison for benchmark results across three models, real-world usage data, and recommendations.
TL;DR: The default (local:AllMiniLML6V2Q) is within 2 points of the best model on a
60-point benchmark, indexes 21x faster, and requires zero setup. Keep it unless you have
a specific reason to change.
Troubleshooting
Model mismatch after changing config
If you change the model or url after indexing, the stored vectors are incompatible.
Rebuild the index:
index(action: build, force: true)
Endpoint unreachable
Check that the server is running and the URL is correct:
curl http://127.0.0.1:43300/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model":"nomic-embed-text","input":["test"]}'
Corporate proxy blocking downloads
The bundled model downloads from HuggingFace. If your proxy blocks this:
- Download the model on an unrestricted machine
- Copy to ~/.cache/huggingface/hub/models--nomic-ai--nomic-embed-text-v1.5/
- Or use an external server instead (set url)
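A sketch of that offline transfer, assuming huggingface-cli is installed and the restricted machine is reachable over SSH (the hostname is a placeholder):
# On the unrestricted machine: download into the local HuggingFace cache
huggingface-cli download nomic-ai/nomic-embed-text-v1.5
# Copy the cached snapshot into the restricted machine's cache
scp -r ~/.cache/huggingface/hub/models--nomic-ai--nomic-embed-text-v1.5 \
    user@restricted-host:~/.cache/huggingface/hub/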
Migration from Prefix Syntax
The ollama: prefix is deprecated and will be removed in a future version.
Migrate to the url field:
# Before (deprecated)
[embeddings]
model = "ollama:nomic-embed-text"
# After
[embeddings]
model = "nomic-embed-text"
url = "http://localhost:11434/v1"
The custom: prefix has been removed. Migrate to the url field:
# Before (removed)
[embeddings]
model = "custom:my-model@http://my-server:8080"
# After
[embeddings]
model = "my-model"
url = "http://my-server:8080/v1"