# Semantic Search Guide
Semantic search lets you find code by describing what it does rather than knowing what it is called. This page walks you through the full setup from choosing a backend to writing effective queries. For a reference of the individual tools, see Semantic Search Tools.
For an explanation of how semantic search works under the hood — chunking, scoring, and when to use it vs symbol tools — see Semantic Search Concepts.
## Choosing an Embedding Backend
codescout supports four embedding backends. The model string prefix in
project.toml selects which one is used:
| Prefix | Example | When to use |
|---|---|---|
| ollama: | ollama:mxbai-embed-large | Local development — free, private, no API key |
| openai: | openai:text-embedding-3-small | Best retrieval quality, cloud cost |
| custom: | custom:my-model@http://host:8080 | Any OpenAI-compatible endpoint |
| local: | local:AllMiniLML6V2Q | Offline / air-gapped, no daemon required |
Recommended starting point: The bundled local:AllMiniLML6V2Q model — no setup
required, works offline, and downloads only ~22 MB on first use. For higher search
quality or multi-project setups, see Embedding Backends.
## Setting Up Ollama
Install Ollama, pull the default model, and verify it responds correctly before
touching project.toml.
```bash
# Install Ollama (Linux/macOS)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the default model
ollama pull mxbai-embed-large

# Verify the embedding endpoint is responding
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "mxbai-embed-large", "input": "test"}'
```
A successful response looks like:
```json
{
  "object": "list",
  "data": [{ "object": "embedding", "index": 0, "embedding": [0.012, -0.034, ...] }],
  "model": "mxbai-embed-large"
}
```
If curl returns a connection error, Ollama is not running. Start it with
ollama serve in a separate terminal and retry.
If you run Ollama on a non-default host or port, set the OLLAMA_HOST
environment variable before starting Claude Code:
```bash
export OLLAMA_HOST=http://192.168.1.10:11434
```
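To confirm a remote instance is reachable, you can reuse the same verification request against the new host (the address below is the example value from above):

```bash
curl http://192.168.1.10:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "mxbai-embed-large", "input": "test"}'
```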
## Configuring codescout
The [embeddings] section of .codescout/project.toml controls which
model is used and how files are chunked. The defaults work well for most
projects:
```toml
[embeddings]
model = "local:AllMiniLML6V2Q"
```
model is the only setting you need to change. Chunk size is derived
automatically from the model’s context window — no manual tuning required.
To use OpenAI instead, set the model and export your API key:
```toml
[embeddings]
model = "openai:text-embedding-3-small"
```

```bash
export OPENAI_API_KEY=sk-...
```
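To point at any other OpenAI-compatible server instead, use the custom: prefix from the table above; the model name and URL below are placeholders:

```toml
[embeddings]
model = "custom:my-model@http://host:8080"
```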
## Building the Index
Once project.toml is configured (or using the default), build the index:
{ "name": "index", "arguments": { "action": "build" } }
What happens internally:
- codescout walks the project tree, skipping directories listed in ignored_paths (by default: .git, node_modules, target, __pycache__, .venv, dist, build, .codescout; see the example after this list).
- Each source file is split into chunks using an AST-aware chunker. Each top-level function, method, or class becomes its own chunk. Oversized containers (impl blocks, classes) are recursively split into one chunk per inner method plus a header chunk for the container signature. Chunk size is derived from the model’s context window — no configuration needed.
- Each chunk is sent to the configured embedding backend, which returns a dense vector.
- The vectors and chunk metadata are stored in .codescout/embeddings.db (SQLite).
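If you need to skip additional directories (generated code, vendored dependencies), ignored_paths can be extended in project.toml. The placement and exact syntax shown here are an assumption; check the configuration reference for the authoritative form:

```toml
# Hypothetical example: the placement and syntax of ignored_paths may differ.
ignored_paths = [
  ".git", "node_modules", "target", "__pycache__",
  ".venv", "dist", "build", ".codescout",
  "generated",   # hypothetical extra entry
]
```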
How long it takes: With Ollama on a modern laptop, expect roughly 80–120 files per minute. OpenAI’s API is faster in wall-clock time because requests are batched and network latency is low — typically 3–5x faster for large projects. A 10,000-line project usually indexes in under two minutes with either backend.
Incremental updates: Running index(action: build) again after editing a few
files is cheap. codescout hashes each file’s content and only re-embeds
files whose hash has changed since the last run. Unchanged files are skipped
at negligible cost.
Force reindex: Use force: true when you change the model in
project.toml. Vectors from different models are not comparable, so the entire
index must be rebuilt:
{ "name": "index", "arguments": { "action": "build", "force": true } }
You can check index health at any time:
{ "name": "workspace", "arguments": { "action": "status" } }
The output shows config.embeddings.model (from project.toml) and index.model (the model used to build the current index). If they differ, a force reindex is needed.
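The exact output shape may vary between versions, but a mismatch looks roughly like this (values are illustrative):

```jsonc
// Illustrative fragment only; the field layout may differ from the real output
{
  "config": { "embeddings": { "model": "openai:text-embedding-3-small" } },
  "index": { "indexed": true, "model": "local:AllMiniLML6V2Q" }
}
```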
## Searching Effectively
### Natural Language Queries
Describe what the code does in plain language. You do not need to know the function name or file location:
{ "name": "semantic_search", "arguments": { "query": "how errors are handled" } }
{ "name": "semantic_search", "arguments": { "query": "database connection setup" } }
{ "name": "semantic_search", "arguments": { "query": "authentication token validation" } }
Concrete, specific queries outperform vague ones. Prefer “retry logic with exponential backoff” over “error handling”. Prefer “connection pool initialization” over “database”.
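For example, phrased as a tool call:

```json
{ "name": "semantic_search", "arguments": { "query": "retry logic with exponential backoff" } }
```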
### Code Snippet Queries
Paste a function signature or a short snippet as the query to find similar code elsewhere in the project. This is useful for spotting duplication or locating the canonical version of a pattern:
```json
{
  "name": "semantic_search",
  "arguments": {
    "query": "fn connect(host: &str, port: u16) -> Result<Connection>"
  }
}
```
### Interpreting Scores
Each result includes a score between 0 and 1 (cosine similarity):
| Score range | Interpretation |
|---|---|
| > 0.85 | Strong match — the chunk directly addresses your query |
| 0.6 – 0.85 | Related — the concept is present but may not be the primary focus |
| < 0.6 | Tangential — treat as background context at best |
The top result is not always the most useful one. Scan the top five results before drilling into any single chunk.
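As a rough illustration of how a result list reads against the table above (the paths, line ranges, and field names are invented for the example, not the exact output schema):

```jsonc
// Illustrative only; not the exact codescout result schema
[
  { "path": "src/db/pool.rs",   "lines": "42-88", "score": 0.91 },  // strong match: read this first
  { "path": "src/db/config.rs", "lines": "10-35", "score": 0.74 },  // related context
  { "path": "src/main.rs",      "lines": "5-20",  "score": 0.52 }   // tangential: usually ignorable
]
```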
### Recommended Workflow
Semantic search is the entry point for concept-first exploration. After finding relevant chunks, use the symbol tools to navigate the surrounding code:
1. semantic_search — find the files and line ranges where a concept lives.
2. symbols on those files — see the surrounding structure.
3. symbols with include_body: true — read the exact implementation.
4. references — trace callers if needed.
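A sketch of that sequence as tool calls. Only the tool names and include_body are taken from this guide; the file, symbol, and remaining argument names are hypothetical:

```jsonc
// Hypothetical paths, symbol, and argument names (other than include_body)
{ "name": "semantic_search", "arguments": { "query": "connection pool initialization" } }
{ "name": "symbols", "arguments": { "path": "src/db/pool.rs" } }
{ "name": "symbols", "arguments": { "path": "src/db/pool.rs", "include_body": true } }
{ "name": "references", "arguments": { "symbol": "init_pool" } }
```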
## Tuning
### Chunk Size
Chunk size is not configurable — it is derived automatically from the model’s published context window using the formula:
```
chunk_size = max_tokens × 0.85 × 3 chars/token
```
The 0.85 factor leaves headroom for tokenisation variance; 3 chars/token is a conservative lower bound for mixed code and prose. For example, a 512-token context yields 512 × 0.85 × 3 ≈ 1 300 characters. Representative values:
| Model | Context | Chunk budget |
|---|---|---|
| ollama:mxbai-embed-large | 512 tokens | ~1 300 chars |
| ollama:nomic-embed-text | 8 192 tokens | ~20 900 chars |
| openai:text-embedding-3-small | 8 191 tokens | ~20 900 chars |
| local:JinaEmbeddingsV2BaseCode | 8 192 tokens | ~20 900 chars |
| local:AllMiniLML6V2Q | 256 tokens | ~650 chars |
Because AST chunking splits at function/method boundaries rather than at character counts, most chunks are well within the budget regardless of model. The budget mainly controls when a single oversized node is recursively split into inner methods.
### Model Choice
The embedding model has the largest effect on search quality. General-purpose
text models (nomic-embed-text, text-embedding-3-small) work well for
documentation and comments. Code-specific models
(local:JinaEmbeddingsV2BaseCode) tend to perform
better on function signatures and code identifiers.
After changing the model, always run index(action: build) with force: true.
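For example, to switch to the code-specific model and rebuild the index:

```toml
[embeddings]
model = "local:JinaEmbeddingsV2BaseCode"
```

```json
{ "name": "index", "arguments": { "action": "build", "force": true } }
```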
## Troubleshooting
### “No results” or empty results list
The index may not be built yet. Run workspace(action: status) to check. If
index.indexed is false, run index(action: build). If the index exists but results are
empty, the query may be too generic — try a more specific description.
### “Connection refused” when indexing
An external embedding server (Ollama, llama.cpp, etc.) is not running. Start
it, or switch to the bundled model by setting model = "local:AllMiniLML6V2Q"
in .codescout/project.toml.
### “Model not found” error
The configured model has not been pulled yet. For Ollama, run ollama pull <model-name>, substituting the model configured in project.toml, and retry.
### Stale results after editing many files
Run index(action: build) without extra arguments. The incremental update will re-embed
only the files that changed.
### Results seem wrong after changing the model
The index was built with a different model and the vectors are no longer
compatible. Run index(action: build) with force: true. You can confirm the
mismatch by checking workspace(action: status): if the config model and the index model
differ, a force reindex is required.
### Indexing is very slow
If using an external server (Ollama, llama.cpp), check it is running locally
and not routing over a slow network connection. The bundled local:AllMiniLML6V2Q
model runs in-process and avoids network overhead. For the fastest throughput on
large projects, openai:text-embedding-3-small batches requests via API.