Embedding Model Comparison

Which embedding model should you use with codescout? This page summarizes benchmark results and real-world usage data to help you choose.

This comparison is based on the codescout codebase (417 files, ~32K chunks) as of 2026-04-03. Results may vary with different codebases. We will update this page as we collect more real-world data.

Models Tested

| Model | Dims | Context | Size | Backend | Setup |
|---|---|---|---|---|---|
| `local:AllMiniLML6V2Q` | 384 | 256 tok | 22 MB | Bundled ONNX (CPU) | None — works out of the box |
| `nomic-embed-text` | 768 | 8,192 tok | 274 MB | Ollama | `ollama pull nomic-embed-text` |
| `nomic-embed-code` (Q4_K_M) | 3584 | 32,768 tok | 4.1 GB | llama.cpp (GPU) | Download GGUF + start server |

Benchmark Results

We tested 20 queries across 4 complexity tiers, scoring each 0-3 based on whether the expected source files appeared in the top 10 results.
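The exact rubric is documented in the benchmark report; as an illustration, a scoring function like the following captures the idea. The none/some/most/all mapping here is an assumption, not codescout's actual code:

```python
# Illustrative sketch of a 0-3 scoring rubric for one query.
# Assumption: score rises with the fraction of expected files found in the top 10.

def score_query(expected_files: list[str], top10: list[str]) -> int:
    """Score one query 0-3 by how many expected files appear in the top 10."""
    hits = sum(1 for f in expected_files if f in top10)
    if hits == 0:
        return 0  # nothing found
    if hits == len(expected_files):
        return 3  # every expected file retrieved
    if hits * 2 >= len(expected_files):
        return 2  # at least half retrieved
    return 1      # partial hit
```

Summing a 0-3 score over 20 queries gives the 0-60 scale used in the chart below.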

Overall Scores (max 60)

nomic-embed-code ████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░  36/60
AllMiniLML6V2Q   ██████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  34/60
nomic-embed-text ████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  32/60

By Complexity Tier

| Tier | What it tests | Best model | Score |
|---|---|---|---|
| 1. Direct Concept (5 queries) | Single named type, module, or feature | nomic-embed-text | 12/15 |
| 2. Two-Concept (7 queries) | Relationship between two concepts | nomic-embed-code | 17/21 |
| 3. Cross-Cutting (5 queries) | Three+ concepts, architectural flows | nomic-embed-code | 7/15 |
| 4. Architectural (3 queries) | Design invariants, consistency patterns | AllMiniLML6V2Q | 5/9 |

No single model dominates all tiers.

Practical Metrics

| Metric | AllMiniLML6V2Q | nomic-embed-text | nomic-embed-code |
|---|---|---|---|
| Index time (417 files) | 70 seconds | 60 seconds | 25 minutes |
| DB size | 71 MB | 55 MB | 372 MB |
| Chunk count | 32,098 | 11,887 | 11,868 |
| Requires | Nothing | Ollama running | GPU + llama.cpp server |

Analysis of 31,674 tool calls across 70+ real projects shows how often each retrieval tool is actually invoked:

  • symbols: 17.8% of all calls (the workhorse)
  • grep: 2.3%
  • semantic_search: 1.1% (349 calls total)

Agents use semantic search as a last resort — when they don’t know the exact name of what they’re looking for. The typical query is a short 3-6 word concept phrase:

"error handling and recovery from tool failures"
"embedding index build and incremental update"
"security path validation and write access control"
"ollama embedding configuration"
"intent classifier ONNX model prediction"

These are mostly Tier 1-2 queries (direct concept or two-concept composition). Tier 3-4 queries (complex architectural questions) are rare in organic usage.
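Mechanically, queries like these are answered by embedding the query text and ranking indexed chunks by cosine similarity. A minimal sketch of that ranking step (the chunk data shape and function names are illustrative, not codescout's internals):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], chunks: list[dict], k: int = 10) -> list[str]:
    """Return the source files of the k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["file"] for c in ranked[:k]]
```

The benchmark's top-10 cutoff corresponds to `k=10` here; real indexes use approximate nearest-neighbor search rather than a full sort, but the similarity measure is the same.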

Recommendation

Use the default: local:AllMiniLML6V2Q.

| Factor | Why the default wins |
|---|---|
| Score | 34/60 — within 2 points of the best model (36/60) |
| Speed | 70 seconds vs 25 minutes — 21x faster indexing |
| Setup | Zero. No Ollama, no GPU, no server to manage |
| Storage | 71 MB — reasonable for any machine |
| Precision | Best at Tier 4 (finding specific functions and patterns) — matches how agents actually query |

The 7B code-specialized model’s 2-point advantage doesn’t justify 21x slower indexing, 5x more storage, and a GPU requirement. For the 1.1% of calls that reach semantic search, the bundled model is good enough.

When to consider alternatives

  • nomic-embed-text via Ollama — if Ollama is already running for other tasks, add url = "http://localhost:11434/v1" and model = "nomic-embed-text" for slightly better Tier 1 results at the same speed. Smallest storage footprint (55 MB).

  • nomic-embed-code via llama.cpp — if you have a GPU and primarily use semantic search for concept-level exploration (architecture questions, onboarding to a new codebase). Best at Tier 2-3 queries.
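As a sketch, the Ollama option would look something like this in codescout's configuration. Only the `url` and `model` values come from the bullet above; the section name is an assumption, so check the configuration reference for the exact table name:

```toml
# Hypothetical section name; url/model values are from the bullet above.
[embedding]
url = "http://localhost:11434/v1"
model = "nomic-embed-text"
```

The URL points at Ollama's OpenAI-compatible endpoint on its default port, so no extra server setup is needed beyond `ollama pull nomic-embed-text`.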

Methodology

Full benchmark details, per-query scores, and test case definitions are in docs/research/2026-04-03-embedding-model-benchmark.md.

The benchmark will be updated as we collect more real-world query data via the --debug flag’s usage traceability feature (see Debug Mode).