Semantic Search Tools

Semantic search lets you find code by meaning rather than by exact name or keyword. Instead of knowing what a function is called, you describe what it does — “retry with exponential backoff”, “authentication middleware”, “how errors are serialized to JSON” — and the tool finds the most relevant code chunks in the project.

The backend stores vector embeddings of your source code in a SQLite database at .codescout/embeddings.db. The embedding model is configurable (see Project Configuration); the default works with any OpenAI-compatible endpoint or a local Ollama server.
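To make "search by meaning" concrete, here is a minimal sketch of how stored vectors could be ranked against a query vector. It assumes cosine similarity as the scoring function, which this page does not confirm. The names `cosine_similarity` and `rank_chunks` and the three-dimensional toy vectors are hypothetical; real embeddings have hundreds of dimensions and come from the configured model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunks, limit=10):
    """Return up to `limit` (score, chunk_id) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]

# Toy vectors standing in for stored chunk embeddings (assumption).
chunks = {
    "with_retry": [0.9, 0.1, 0.0],
    "parse_config": [0.0, 0.2, 0.9],
}
top = rank_chunks([1.0, 0.0, 0.0], chunks, limit=1)
```

The query vector points almost the same way as the `with_retry` vector, so that chunk ranks first even though no identifier was matched.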

You must build the index before searching. Use index(action: build) once, then semantic_search as many times as you like. Incremental re-indexing is cheap: only files that changed since the last run are re-embedded.

See also: Semantic Search Concepts — how chunking, embedding, and scoring work; when to use semantic search vs symbol tools. Setup Guide — step-by-step configuration and indexing walkthrough.

semantic_search


Purpose: Find code by natural language description or code snippet. Returns ranked chunks with file path, line range, and similarity score.

Parameters:

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| query | string | yes | | Natural language description or code snippet to search for |
| limit | integer | no | 10 | Maximum number of results to return |
| detail_level | string | no | "compact" | "full" returns the complete chunk content instead of a 150-character preview |
| offset | integer | no | 0 | Skip this many results (for pagination) |
| scope | string | no | "project" | Search scope: "project" (default), "lib:<name>" for a specific library, "libraries" for all libraries, "all" for everything |
| include_memories | boolean | no | false | If true, also search semantic memories and include them in results tagged with "source": "memory" |

Example:

{
  "query": "retry with exponential backoff",
  "limit": 5
}

Output (compact, default):

{
  "results": [
    {
      "file_path": "src/embed/remote.rs",
      "language": "rust",
      "content": "async fn with_retry<F, Fut, T>(mut f: F, max_attempts: u8) -> anyhow::Result<T>\nwhere\n    F: FnMut() -> Fut,...",
      "start_line": 42,
      "end_line": 68,
      "score": 0.91,
      "source": "project"
    },
    {
      "file_path": "src/util/http.rs",
      "language": "rust",
      "content": "/// Exponential back-off starting at 200ms, doubling each attempt up to...",
      "start_line": 12,
      "end_line": 30,
      "score": 0.84,
      "source": "project"
    }
  ],
  "total": 2
}

In compact mode, content is truncated to 150 characters followed by "...". Use detail_level: "full" to get complete chunk bodies.
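The truncation rule can be sketched in a few lines. This is an illustration only: `compact_preview` is a hypothetical name, and the sketch assumes the limit counts characters rather than bytes and that content already within the limit is returned unchanged. Neither detail is stated here.

```python
PREVIEW_CHARS = 150  # preview length given in the text above

def compact_preview(content: str, max_chars: int = PREVIEW_CHARS) -> str:
    """Return `content` cut to `max_chars` characters plus a trailing "..."."""
    if len(content) <= max_chars:
        return content  # assumption: short chunks are left untouched
    return content[:max_chars] + "..."

preview = compact_preview("x" * 200)
```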

Example (full detail):

{
  "query": "retry with exponential backoff",
  "limit": 5,
  "detail_level": "full"
}

The content field contains the full source text of each chunk. Combine with offset to page through results:

{
  "query": "retry with exponential backoff",
  "limit": 5,
  "detail_level": "full",
  "offset": 5
}
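A client can wrap the offset pattern in a loop. The sketch below assumes a `call_tool(name, arguments)` function that sends a tool call and returns the parsed JSON response; that function, `iter_results`, and the in-memory `fake_call_tool` stand-in are hypothetical and not part of the tool.

```python
def iter_results(call_tool, query, page_size=5, detail_level="full"):
    """Yield every result for `query`, fetching one page per tool call."""
    offset = 0
    while True:
        response = call_tool("semantic_search", {
            "query": query,
            "limit": page_size,
            "detail_level": detail_level,
            "offset": offset,
        })
        results = response.get("results", [])
        yield from results
        if len(results) < page_size:
            break  # a short (or empty) page means nothing is left
        offset += page_size

def fake_call_tool(name, arguments):
    """In-memory stand-in for a real client, holding 12 fake results."""
    data = [{"file_path": f"src/file_{i}.rs", "score": 1.0 - i * 0.01}
            for i in range(12)]
    start = arguments["offset"]
    return {"results": data[start:start + arguments["limit"]]}

all_results = list(iter_results(fake_call_tool, "retry with exponential backoff"))
```

Stopping on a short page avoids relying on any particular meaning of the total field, at the cost of one extra call when the result count is an exact multiple of the page size.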

Tips:

  • Use semantic_search when you know the concept but not the exact function name. For example: “where is the JWT decoded”, “rate limiting logic”, “database connection pool initialization”.
  • Paste a code snippet as the query to find similar code elsewhere in the project. This is useful for spotting duplication or finding the canonical version of a pattern.
  • Scores above 0.85 are typically a strong match. Scores below 0.6 usually indicate the concept is not well represented in the index.
  • If results are poor, check workspace(action: status) to confirm the index is up to date, and run index(action: build) to rebuild it if files have changed.
  • For finding a symbol by name, symbols is faster and more precise. Semantic search is for concepts, not identifiers.
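The score guidance in the tips can be turned into a small triage helper on the client side. The thresholds are the rule-of-thumb values from the tips, not hard limits; `classify_score` and the label names are hypothetical.

```python
STRONG_MATCH = 0.85  # "typically a strong match" above this
WEAK_MATCH = 0.6     # "not well represented in the index" below this

def classify_score(score: float) -> str:
    """Bucket a similarity score using the rule-of-thumb thresholds."""
    if score > STRONG_MATCH:
        return "strong"
    if score < WEAK_MATCH:
        return "weak"
    return "plausible"

# Scores taken from the compact output example earlier in this section.
results = [
    {"file_path": "src/embed/remote.rs", "score": 0.91},
    {"file_path": "src/util/http.rs", "score": 0.84},
]
labels = {r["file_path"]: classify_score(r["score"]) for r in results}
```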

Workspace project scoping

{ "tool": "semantic_search", "arguments": { "query": "auth flow", "project": "frontend" } }

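Pass project to restrict the search to a single project in a multi-project workspace, as in the example above. Omit project to search across the workspace-level context. See Multi-Project Workspaces for setup.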


index(action: build)

Purpose: Build or incrementally update the semantic search index for the active project. It re-embeds only files whose content has changed since the last run, unless force is set.

Parameters:

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| action | string | yes | | "build" |
| force | boolean | no | false | Force full reindex, ignoring cached file hashes |
| scope | string | no | "project" | What to index: "project" (default) for the active project, or "lib:<name>" to index a registered library. |

Example (incremental update):

{ "action": "build" }

Example (full reindex):

{
  "action": "build",
  "force": true
}

Output:

{
  "status": "ok",
  "files_indexed": 3,
  "files_deleted": 0,
  "detail": "3 deleted",
  "total_files": 47,
  "total_chunks": 312
}

When drift detection is enabled (on by default) and files had meaningful semantic changes, a drift_summary field is included with the top-5 most-drifted files:

{
  "status": "ok",
  "files_indexed": 3,
  "total_files": 47,
  "total_chunks": 312,
  "drift_summary": [
    { "file": "src/auth/service.rs", "avg_drift": "0.72", "max_drift": "0.91", "added": 2, "removed": 1 }
  ]
}

Staleness warning — if semantic_search is called when the index is behind the current HEAD commit, results include:

{ "stale": true, "behind_commits": 3, "hint": "Index is behind HEAD. Run index(action: build) to update." }
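A client can react to the staleness warning by rebuilding once and retrying. The sketch assumes the stale flag sits at the top level of the search response and that a `call_tool(name, arguments)` function is available; both are assumptions, and `search_with_refresh` and `FakeServer` are hypothetical names used only for illustration.

```python
def search_with_refresh(call_tool, query, limit=10):
    """Search, and if the index is reported stale, rebuild once and retry."""
    arguments = {"query": query, "limit": limit}
    response = call_tool("semantic_search", arguments)
    if response.get("stale"):
        call_tool("index", {"action": "build"})  # incremental update
        response = call_tool("semantic_search", arguments)
    return response

class FakeServer:
    """Stand-in for a real client; reports a stale index until a build runs."""

    def __init__(self):
        self.stale = True
        self.calls = []

    def call_tool(self, name, arguments):
        self.calls.append(name)
        if name == "index" and arguments.get("action") == "build":
            self.stale = False
            return {"status": "ok"}
        response = {"results": [], "total": 0}
        if self.stale:
            response.update({"stale": True, "behind_commits": 3})
        return response

server = FakeServer()
response = search_with_refresh(server.call_tool, "auth flow")
```

Retrying only once keeps the helper from looping if the index remains behind HEAD for some other reason.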

Tips:

  • Run index(action: build) once when you first activate a project, then again after large refactors or when many files have changed.
  • The incremental mode (default) uses a git diff → mtime → SHA-256 fallback chain. It is safe to run frequently — unchanged files are skipped at negligible cost.
  • Use force: true if you have changed the embedding model in project.toml. Changing the model produces incompatible vectors, so a full reindex is required.
  • Indexing runs synchronously. For large projects (thousands of files), this may take a few minutes the first time.
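The last two steps of the fallback chain can be sketched as follows. This is not the tool's implementation: it leaves out the git diff step, and the cache layout (a dict with `mtime_ns` and `sha256` keys) and the function names are assumptions made for illustration.

```python
import hashlib
import os

def sha256_of(path):
    """Hash a file's bytes in 64 KiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()

def needs_reembedding(path, cached):
    """Decide whether `path` changed since the previous index run.

    `cached` is the record from the previous run (hypothetical layout:
    {"mtime_ns": int, "sha256": str}), or None if the file is new.
    """
    if cached is None:
        return True  # never indexed before
    if os.stat(path).st_mtime_ns == cached["mtime_ns"]:
        return False  # cheap check: file untouched since the last run
    # The timestamp moved; only a content hash shows whether bytes changed.
    return sha256_of(path) != cached["sha256"]
```

Checking the timestamp first means most unchanged files are skipped without being read, which is consistent with the negligible cost described in the tips.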

index(action: status)

Purpose: Show the health of the semantic index: file count, chunk count, embedding model, last update time, and optional per-file drift scores.

Parameters:

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| action | string | yes | | "status" |
| threshold | float | no | | When set, include drift scores for files whose avg_drift exceeds this value (0.0–1.0). Higher = more changed. |
| path | string | no | | Limit drift reporting to a specific file or directory. |

Example (basic stats):

{ "action": "status" }

Example (drift scores for significantly changed files):

{ "action": "status", "threshold": 0.3 }

Output:

{
  "indexed_files": 47,
  "total_chunks": 312,
  "model": "ollama:nomic-embed-text",
  "last_updated": "2026-03-12T10:14:00Z",
  "stale": false,
  "drift": [
    { "file": "src/auth/service.rs", "avg_drift": "0.72", "max_drift": "0.91" }
  ]
}
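When post-processing this response on the client side, note that the sample encodes drift scores as strings, so they need converting before comparison. A minimal sketch, with `files_over_threshold` as a hypothetical helper name:

```python
def files_over_threshold(status, threshold):
    """Return (file, avg_drift) pairs above `threshold`, most drifted first."""
    drifted = []
    for entry in status.get("drift", []):
        avg = float(entry["avg_drift"])  # scores arrive as strings in the sample
        if avg > threshold:
            drifted.append((entry["file"], avg))
    return sorted(drifted, key=lambda pair: pair[1], reverse=True)

# The sample status response shown above.
status = {
    "indexed_files": 47,
    "total_chunks": 312,
    "model": "ollama:nomic-embed-text",
    "last_updated": "2026-03-12T10:14:00Z",
    "stale": False,
    "drift": [
        {"file": "src/auth/service.rs", "avg_drift": "0.72", "max_drift": "0.91"}
    ],
}
changed = files_over_threshold(status, 0.3)
```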

Opt out of drift detection with drift_detection_enabled = false in .codescout/project.toml.

See also: Dashboard — the Overview page surfaces index staleness and per-file drift scores visually, without a tool call.