
Embedding Backends

Semantic search requires converting source code into vector embeddings. codescout supports four backends, selected at runtime by the model field in [embeddings] inside .codescout/project.toml. The prefix before the colon determines which backend is used.

[embeddings]
model = "ollama:nomic-embed-text"   # recommended when Ollama is available

The onboarding tool detects your hardware at setup time and writes the best model for your machine into this field automatically. You rarely need to set it manually — see Choosing a Backend if you want to override it.


Backend Comparison

| Backend | Speed | Quality | Cost | Privacy | Setup |
|---|---|---|---|---|---|
| Local (fastembed, default) | Fast (CPU) | Good–Excellent | Free | Fully local | Bundled; no setup needed |
| Ollama | Medium | Good | Free | Local | Install Ollama + pull model |
| OpenAI | Fast (network) | Excellent | Pay-per-token | Data sent to OpenAI | Set OPENAI_API_KEY |
| Custom endpoint | Varies | Varies | Varies | Depends on host | Point at any compatible server |

onboarding picks the best model for your machine automatically. This table is a reference for manual overrides or when comparing options.

| Model string | Backend | Dims | Context | Code quality | Notes |
|---|---|---|---|---|---|
| local:AllMiniLML6V2Q | fastembed | 384 | 256 tok | Good | Default. 22 MB, zero-config, bundled. |
| local:JinaEmbeddingsV2BaseCode | fastembed | 768 | 8192 tok | Excellent | Recommended (CPU-only). Code-specific. |
| ollama:nomic-embed-text | Ollama | 768 | 8192 tok | Good | Recommended if Ollama is already running. |
| ollama:bge-m3 | Ollama | 1024 | 8192 tok | Excellent | Best Ollama quality; slower, ~1.2 GB. |
| openai:text-embedding-3-small | OpenAI | 1536 | | Excellent | Best quality/cost if cloud is acceptable. |
| openai:text-embedding-3-large | OpenAI | 3072 | | Best | Overkill for most codebases. |
| ollama:mxbai-embed-large | Ollama | 1024 | 512 tok | Good | Legacy. Short context truncates most code. |

Switching models requires a full reindex — see Rebuilding After a Model Change below. Scores are not comparable across models; a score of 0.75 means different things with different models.


Ollama

Uses a locally running Ollama daemon. No API key is required.

Model string format: "ollama:<model-name>"

Endpoint: $OLLAMA_HOST/v1/embeddings (default: http://localhost:11434/v1/embeddings)

Setup

  1. Install Ollama from ollama.com.

  2. Pull the embedding model:

    ollama pull nomic-embed-text
    
  3. Make sure the daemon is running:

    ollama serve
    

Configuration

[embeddings]
model = "ollama:mxbai-embed-large"

To use a different Ollama host (e.g. a remote machine or a custom port), set the OLLAMA_HOST environment variable before starting the MCP server:

export OLLAMA_HOST=http://192.168.1.50:11434
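
A quick way to confirm the remote host is reachable and has the model pulled is to list its models. This uses Ollama's standard model-listing endpoint and is not something codescout requires:

# List the models available on the configured Ollama host
curl -s "$OLLAMA_HOST/api/tags"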

If Ollama Is Not Reachable

If Ollama is not running when codescout tries to connect, you’ll see a clear error:

Ollama is not reachable at http://localhost:11434

Options:

  • Start Ollama: ollama serve
  • Switch to bundled ONNX: set model = "local:AllMiniLML6V2Q" in [embeddings]
  • Use a different server: set url = "http://your-server:port/v1" in [embeddings]

Supported Ollama Models

| Model | Dimensions | Context | Notes |
|---|---|---|---|
| nomic-embed-text | 768 | 8192 tok | Recommended default. Fast indexing, 137 MB. |
| bge-m3 | 1024 | 8192 tok | Best retrieval quality; ~1.2 GB download. |
| mxbai-embed-large | 1024 | 512 tok | Legacy; short context truncates most functions. |

OpenAI

Calls the OpenAI embeddings API. Requires an active OpenAI account and an API key.

Model string format: "openai:<model-name>"

Endpoint: https://api.openai.com/v1/embeddings

Authentication: $OPENAI_API_KEY environment variable (required)

Setup

Set your API key in the environment before starting the MCP server:

export OPENAI_API_KEY=sk-...
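
If you want to confirm the key works before indexing, a single request to the embeddings endpoint is enough. This is an optional check: a valid key returns a JSON body with a data array, while a bad key returns an authentication error.

# Verify the key by requesting one embedding
curl -s https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "hello"}'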

Configuration

[embeddings]
model = "openai:text-embedding-3-small"

| Model | Dimensions | Notes |
|---|---|---|
| text-embedding-3-small | 1536 | Low cost, good quality, recommended |
| text-embedding-3-large | 3072 | Highest quality, higher cost |
| text-embedding-ada-002 | 1536 | Legacy model, still widely used |

Custom Endpoint

Points at any OpenAI-compatible embeddings API — useful for self-hosted models, Azure OpenAI, Together AI, or other third-party providers.

Model string format: "custom:<model-name>@<base-url>"

codescout appends /v1/embeddings to <base-url>, so a base URL of http://localhost:1234 becomes http://localhost:1234/v1/embeddings.

Authentication: $EMBED_API_KEY environment variable (optional — set it if the server requires a bearer token)

Setup

Start your compatible server, then set the API key if needed:

export EMBED_API_KEY=your-token-here
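
Before editing the config, it can help to confirm the server really answers at <base-url>/v1/embeddings in the OpenAI format. The host, port, and model name below are placeholders matching the example configuration further down; drop the Authorization header if your server does not require a token.

# Sanity-check the custom server (placeholder host, port, and model)
curl -s http://localhost:1234/v1/embeddings \
  -H "Authorization: Bearer $EMBED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "mxbai-embed-large", "input": "hello"}'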

Configuration

[embeddings]
model = "custom:mxbai-embed-large@http://localhost:1234"

Examples for common providers:

# Azure OpenAI
model = "custom:text-embedding-3-small@https://my-resource.openai.azure.com/openai/deployments/my-deployment"

# Together AI
model = "custom:togethercomputer/m2-bert-80M-8k-retrieval@https://api.together.xyz"

# Hugging Face Text Embeddings Inference (TEI)
model = "custom:BAAI/bge-large-en-v1.5@http://localhost:8080"

Local (fastembed)

Runs entirely on-device using fastembed-rs and ONNX Runtime. No external daemon, no API key, and no network traffic after the initial model download.

Model string format: "local:<EmbeddingModel-variant>"

Requires: building from source with the local-embed Cargo feature. This backend is not available in the published cargo install codescout binary because ONNX Runtime is a native system library that cannot be bundled through crates.io.

Looking for free local embeddings without building from source? Use Ollama instead — it is the recommended path for most users.

Model cache: ~/.cache/huggingface/hub/ — downloaded on first use, then fully offline

Build

Clone the repository and build with local embedding support:

git clone https://github.com/mareurs/codescout.git
cd codescout
cargo install --path . --features local-embed

Or, to have both local and remote backends available simultaneously:

cargo install --path . --features remote-embed,local-embed

Configuration

[embeddings]
model = "local:AllMiniLML6V2Q"

Supported Local Models

| Model string | Dimensions | Download size | Notes |
|---|---|---|---|
| local:JinaEmbeddingsV2BaseCode | 768 | ~300 MB | Code-specific, highest quality for source code |
| local:AllMiniLML6V2Q | 384 | ~22 MB | INT8-quantized, CPU-safe, recommended for most users |
| local:BGESmallENV15Q | 384 | ~20 MB | GPU-optimized export; may fail on CPU-only machines |
| local:BGESmallENV15 | 384 | ~65 MB | Full f32 precision variant of BGESmallENV15Q |
| local:AllMiniLML6V2 | 384 | ~90 MB | Full f32 precision variant of AllMiniLML6V2Q |

For most local setups, AllMiniLML6V2Q gives the best tradeoff: small download, CPU-safe inference, and solid retrieval quality. Use JinaEmbeddingsV2BaseCode when search quality on code is the priority and the larger download is acceptable.

Error When Feature Is Missing

If you try to use a local: model without the local-embed feature compiled in, you will see an error like:

Local embedding requires the 'local-embed' feature.
Rebuild with: cargo build --features local-embed

Batching

All backends send texts in batches of 8. This avoids HTTP 400 errors from servers that have payload size limits (Ollama is particularly strict about this) and keeps per-request latency manageable. The batch size is fixed and not configurable.
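
For the HTTP backends, each request therefore carries an array of at most 8 inputs rather than one string per call. As a rough illustration (abbreviated to three inputs, against a local Ollama host, and assuming the server accepts the OpenAI array-input form), a single batched request looks like:

# One batched request: several texts share a single "input" array
curl -s http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": ["fn parse() {}", "struct Config;", "impl Error {}"]}'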


Rebuilding After a Model Change

The embedding index stores which model was used to build it. If you change the model field in project.toml, you must rebuild the index with the force flag to avoid mixing vectors from different models:

{ "name": "index(action: build)", "arguments": { "force": true } }

codescout will warn if it detects a mismatch between the configured model and the model recorded in the existing index.


Choosing a Backend

In most cases you do not need to choose: onboarding probes your hardware and writes the recommended model into .codescout/project.toml automatically. The decision tree below is for manual overrides.

  • Default / getting started → local:AllMiniLML6V2Q (bundled, 22 MB, no setup). Already active out of the box; no config change needed.
  • Better code search, can build from source → local:JinaEmbeddingsV2BaseCode (code-specific, 8192-token context, ~300 MB). Build with --features local-embed from the repository. Outperforms general-purpose models on code.
  • Ollama is already running → ollama:nomic-embed-text (fast, 8192-token context, 137 MB). Upgrade to ollama:bge-m3 for higher retrieval quality at the cost of a 1.2 GB download.
  • Best search quality, cloud acceptable → openai:text-embedding-3-small.
  • Air-gapped or full data privacy required → local:JinaEmbeddingsV2BaseCode or local:AllMiniLML6V2Q (already the default; no external calls are made).
  • Self-hosted TEI, vLLM, or similar → set url = "http://your-server:port/v1" in [embeddings].