Embedding Backends
Semantic search requires converting source code into vector embeddings. codescout supports
four backends, selected at runtime by the model field in [embeddings] inside
.codescout/project.toml. The prefix before the colon determines which backend is used.
[embeddings]
model = "ollama:nomic-embed-text" # recommended when Ollama is available
The onboarding tool detects your hardware at setup time and writes the best model for
your machine into this field automatically. You rarely need to set it manually — see
Choosing a Backend if you want to override it.
Backend Comparison
| Backend | Speed | Quality | Cost | Privacy | Setup |
|---|---|---|---|---|---|
| Local (fastembed, default) | Fast (CPU) | Good–Excellent | Free | Fully local | Bundled — no setup needed |
| Ollama | Medium | Good | Free | Local | Install Ollama + pull model |
| OpenAI | Fast (network) | Excellent | Pay-per-token | Data sent to OpenAI | Set OPENAI_API_KEY |
| Custom endpoint | Varies | Varies | Varies | Depends on host | Point at any compatible server |
Recommended Models
onboarding picks the best model for your machine automatically. This table is a reference
for manual overrides or when comparing options.
| Model string | Backend | Dims | Context | Code quality | Notes |
|---|---|---|---|---|---|
| local:AllMiniLML6V2Q | fastembed | 384 | 256 tok | Good | Default. 22 MB, zero-config, bundled. |
| local:JinaEmbeddingsV2BaseCode | fastembed | 768 | 8192 tok | Excellent | Recommended (CPU-only). Code-specific. |
| ollama:nomic-embed-text | Ollama | 768 | 8192 tok | Good | Recommended if Ollama is already running. |
| ollama:bge-m3 | Ollama | 1024 | 8192 tok | Excellent | Best Ollama quality; slower, ~1.2 GB. |
| openai:text-embedding-3-small | OpenAI | 1536 | — | Excellent | Best quality/cost if cloud is acceptable. |
| openai:text-embedding-3-large | OpenAI | 3072 | — | Best | Overkill for most codebases. |
| ollama:mxbai-embed-large | Ollama | 1024 | 512 tok | Good | Legacy. Short context truncates most code. |
Switching models requires a full reindex — see Rebuilding After a Model Change below. Scores are not comparable across models; a score of 0.75 means different things with different models.
Ollama
Uses a locally running Ollama daemon. No API key is required.
Model string format: "ollama:<model-name>"
Endpoint: $OLLAMA_HOST/v1/embeddings (default: http://localhost:11434/v1/embeddings)
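If you want to confirm the endpoint is live before pointing codescout at it, you can call it directly. The snippet below is only a sanity check: it assumes the daemon is on the default port and that nomic-embed-text has already been pulled (see Setup).

```bash
# Request a single embedding from Ollama's OpenAI-compatible endpoint.
# A JSON response containing data[0].embedding confirms the backend is reachable.
curl -s http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "fn main() {}"}'
```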
Setup
- Install Ollama from ollama.com.
- Pull the embedding model: ollama pull nomic-embed-text
- Make sure the daemon is running: ollama serve
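To double-check that the model is actually available to the daemon before configuring codescout, listing the pulled models is enough:

```bash
# The pulled embedding model should appear in this list.
ollama list
```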
Configuration
[embeddings]
model = "ollama:mxbai-embed-large"
To use a different Ollama host (e.g. a remote machine or a custom port), set the
OLLAMA_HOST environment variable before starting the MCP server:
export OLLAMA_HOST=http://192.168.1.50:11434
If Ollama Is Not Running
If Ollama is not running when codescout tries to connect, you’ll see a clear error:
Ollama is not reachable at http://localhost:11434
Options:
- Start Ollama: ollama serve
- Switch to the bundled ONNX backend: set model = "local:AllMiniLML6V2Q" in [embeddings]
- Use a different server: set url = "http://your-server:port/v1" in [embeddings]
Recommended Ollama Models
| Model | Dimensions | Context | Notes |
|---|---|---|---|
| nomic-embed-text | 768 | 8192 tok | Recommended default. Fast indexing, 137 MB. |
| bge-m3 | 1024 | 8192 tok | Best retrieval quality; ~1.2 GB download. |
| mxbai-embed-large | 1024 | 512 tok | Legacy; short context truncates most functions. |
OpenAI
Calls the OpenAI embeddings API. Requires an active OpenAI account and an API key.
Model string format: "openai:<model-name>"
Endpoint: https://api.openai.com/v1/embeddings
Authentication: $OPENAI_API_KEY environment variable (required)
Setup
Set your API key in the environment before starting the MCP server:
export OPENAI_API_KEY=sk-...
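To verify the key and model name before running a full index, you can issue a one-off request against the same endpoint; this is the standard OpenAI embeddings call and does not involve codescout itself.

```bash
# Minimal request against the OpenAI embeddings API to confirm the key is valid.
curl -s https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "fn main() {}"}'
```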
Configuration
[embeddings]
model = "openai:text-embedding-3-small"
Recommended OpenAI Models
| Model | Dimensions | Notes |
|---|---|---|
| text-embedding-3-small | 1536 | Low cost, good quality, recommended |
| text-embedding-3-large | 3072 | Highest quality, higher cost |
| text-embedding-ada-002 | 1536 | Legacy model, still widely used |
Custom Endpoint
Points at any OpenAI-compatible embeddings API — useful for self-hosted models, Azure OpenAI, Together AI, or other third-party providers.
Model string format: "custom:<model-name>@<base-url>"
codescout appends /v1/embeddings to <base-url>, so a base URL of
http://localhost:1234 becomes http://localhost:1234/v1/embeddings.
Authentication: $EMBED_API_KEY environment variable (optional — set it if the server
requires a bearer token)
Setup
Start your compatible server, then set the API key if needed:
export EMBED_API_KEY=your-token-here
Configuration
[embeddings]
model = "custom:mxbai-embed-large@http://localhost:1234"
Examples for common providers:
# Azure OpenAI
model = "custom:text-embedding-3-small@https://my-resource.openai.azure.com/openai/deployments/my-deployment"
# Together AI
model = "custom:togethercomputer/m2-bert-80M-8k-retrieval@https://api.together.xyz"
# Hugging Face Text Embeddings Inference (TEI)
model = "custom:BAAI/bge-large-en-v1.5@http://localhost:8080"
Local (fastembed)
Runs entirely on-device using fastembed-rs and ONNX Runtime. No external daemon, no API key, and no network traffic after the initial model download.
Model string format: "local:<EmbeddingModel-variant>"
Requires: building from source with the local-embed Cargo feature. This backend is
not available in the published cargo install codescout binary because ONNX Runtime is
a native system library that cannot be bundled through crates.io.
Looking for free local embeddings without building from source? Use Ollama instead — it is the recommended path for most users.
Model cache: ~/.cache/huggingface/hub/ — downloaded on first use, then fully offline
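If the machine needs to be offline afterwards, you can confirm the model files are present in the cache once the first index has completed; the directory layout below the cache root varies per model.

```bash
# Inspect the local model cache written by fastembed on first use.
ls ~/.cache/huggingface/hub/
du -sh ~/.cache/huggingface/hub/
```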
Build
Clone the repository and build with local embedding support:
git clone https://github.com/mareurs/codescout.git
cd codescout
cargo install --path . --features local-embed
Or, to have both local and remote backends available simultaneously:
cargo install --path . --features remote-embed,local-embed
Configuration
[embeddings]
model = "local:AllMiniLML6V2Q"
Supported Local Models
| Model string | Dimensions | Download size | Notes |
|---|---|---|---|
| local:JinaEmbeddingsV2BaseCode | 768 | ~300 MB | Code-specific, highest quality for source code |
| local:AllMiniLML6V2Q | 384 | ~22 MB | INT8-quantized, CPU-safe, recommended for most users |
| local:BGESmallENV15Q | 384 | ~20 MB | GPU-optimized export; may fail on CPU-only machines |
| local:BGESmallENV15 | 384 | ~65 MB | Full f32 precision variant of BGESmallENV15Q |
| local:AllMiniLML6V2 | 384 | ~90 MB | Full f32 precision variant of AllMiniLML6V2Q |
For most local setups, AllMiniLML6V2Q gives the best tradeoff: small download, CPU-safe
inference, and solid retrieval quality. Use JinaEmbeddingsV2BaseCode when search quality
on code is the priority and the larger download is acceptable.
Error When Feature Is Missing
If you try to use a local: model without the local-embed feature compiled in, you will
see an error like:
Local embedding requires the 'local-embed' feature.
Rebuild with: cargo build --features local-embed
Batching
All backends send texts in batches of 8. This avoids HTTP 400 errors from servers that have payload size limits (Ollama is particularly strict about this) and keeps per-request latency manageable. The batch size is fixed and not configurable.
Rebuilding After a Model Change
The embedding index stores which model was used to build it. If you change the model field
in project.toml, you must rebuild the index with the force flag to avoid mixing vectors
from different models:
{ "name": "index(action: build)", "arguments": { "force": true } }
codescout will warn if it detects a mismatch between the configured model and the model recorded in the existing index.
Choosing a Backend
In most cases you do not need to choose — onboarding probes your hardware and writes
the recommended model into .codescout/project.toml automatically. The decision tree below
is for manual overrides.
- Default / getting started → local:AllMiniLML6V2Q — bundled, 22 MB, no setup. Already active out of the box; no config change needed.
- Better code search, can build from source → local:JinaEmbeddingsV2BaseCode (code-specific, 8192-token context, ~300 MB). Build with --features local-embed from the repository. Outperforms general-purpose models on code.
- Ollama is already running → ollama:nomic-embed-text (fast, 8192-token context, 137 MB). Upgrade to ollama:bge-m3 for higher retrieval quality at the cost of a 1.2 GB download.
- Best search quality, cloud acceptable → openai:text-embedding-3-small.
- Air-gapped or full data privacy required → local:JinaEmbeddingsV2BaseCode or local:AllMiniLML6V2Q (already the default — no external calls are made).
- Self-hosted TEI, vLLM, or similar → set url = "http://your-server:port/v1" in [embeddings].