Semantic Search Tools

Semantic search lets you find code by meaning rather than by exact name or keyword. Instead of knowing what a function is called, you describe what it does — “retry with exponential backoff”, “authentication middleware”, “how errors are serialized to JSON” — and the tool finds the most relevant code chunks in the project.

The backend stores vector embeddings of your source code in a SQLite database at .codescout/embeddings.db. The embedding model is configurable (see Project Configuration); the default works with any OpenAI-compatible endpoint or a local Ollama server.
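The search step can be pictured as a nearest-neighbour lookup over those stored vectors. This minimal sketch assumes cosine similarity as the score and keeps the vectors in a Python list instead of the SQLite table. `Chunk`, `cosine`, and `rank_chunks` are hypothetical names, not codescout's internals, and a real implementation would first embed the query text with the configured model.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """One indexed code chunk and its embedding (hypothetical shape)."""
    file_path: str
    start_line: int
    end_line: int
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_vec: list[float], chunks: list[Chunk],
                limit: int = 10, offset: int = 0) -> list[tuple[float, Chunk]]:
    """Score every chunk against the query vector, sort best first, then paginate."""
    scored = [(cosine(query_vec, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[offset:offset + limit]
```

A brute-force scan like this is fine for illustration; how codescout actually scores and stores chunks is covered in Semantic Search Concepts.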

You must build the index before searching. Use index(action: build) once, then semantic_search as many times as you like. Incremental re-indexing is cheap: only files that changed since the last run are re-embedded.

See also: Semantic Search Concepts — how chunking, embedding, and scoring work; when to use semantic search vs symbol tools. Setup Guide — step-by-step configuration and indexing walkthrough.

`semantic_search`

Purpose: Find code by natural language description or code snippet. Returns ranked chunks with file path, line range, and similarity score.

Parameters:

Name	Type	Required	Default	Description
`query`	string	yes	—	Natural language description or code snippet to search for
`limit`	integer	no	`10`	Maximum number of results to return
`detail_level`	string	no	`"compact"`	`"full"` returns the complete chunk content instead of a 150-character preview
`offset`	integer	no	`0`	Skip this many results (for pagination)
`scope`	string	no	`"project"`	Search scope: `"project"` (default), `"lib:<name>"` for a specific library, `"libraries"` for all libraries, `"all"` for everything
`include_memories`	boolean	no	`false`	If true, also search semantic memories and include them in results tagged with `"source": "memory"`
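The `scope` values follow a small grammar: three fixed keywords plus `lib:<name>`. A hypothetical client-side check, useful for catching a typo such as `lib-foo` before the call is made, could look like this; `parse_scope` is illustrative and not part of codescout.

```python
from typing import Optional


def parse_scope(scope: str) -> tuple[str, Optional[str]]:
    """Split a scope string into (kind, library_name).

    Accepted forms, per the parameter table: "project", "libraries",
    "all", and "lib:<name>" with a non-empty name.
    """
    if scope in ("project", "libraries", "all"):
        return scope, None
    if scope.startswith("lib:") and len(scope) > len("lib:"):
        return "lib", scope[len("lib:"):]
    raise ValueError(f"unrecognized scope: {scope!r}")
```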

Example:

{
  "query": "retry with exponential backoff",
  "limit": 5
}

Output (compact, default):

{
  "results": [
    {
      "file_path": "src/embed/remote.rs",
      "language": "rust",
      "content": "async fn with_retry<F, Fut, T>(mut f: F, max_attempts: u8) -> anyhow::Result<T>\nwhere\n    F: FnMut() -> Fut,...",
      "start_line": 42,
      "end_line": 68,
      "score": 0.91,
      "source": "project"
    },
    {
      "file_path": "src/util/http.rs",
      "language": "rust",
      "content": "/// Exponential back-off starting at 200ms, doubling each attempt up to...",
      "start_line": 12,
      "end_line": 30,
      "score": 0.84,
      "source": "project"
    }
  ],
  "total": 2
}

In compact mode, content is truncated to 150 characters followed by "...". Use detail_level: "full" to get complete chunk bodies.
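The compact preview rule fits in a few lines. This sketch assumes the cut is made at 150 characters (not bytes) and that content at or under the limit is returned unchanged, without an ellipsis; `compact_preview` is a hypothetical name.

```python
PREVIEW_CHARS = 150


def compact_preview(content: str, limit: int = PREVIEW_CHARS) -> str:
    """Return content unchanged if it fits, else its first `limit` chars plus '...'."""
    if len(content) <= limit:
        return content
    return content[:limit] + "..."
```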

Example (full detail):

{
  "query": "retry with exponential backoff",
  "limit": 5,
  "detail_level": "full"
}

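With detail_level: "full", the content field of each result holds the full source text of the chunk. Combine it with limit and offset to page through results; for example, offset: 5 with limit: 5 returns results 6 through 10: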

{
  "query": "retry with exponential backoff",
  "limit": 5,
  "detail_level": "full",
  "offset": 5
}
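Paging can be driven by advancing `offset` by `limit` until a page comes back short. In this sketch, `search` stands in for whatever function your client uses to invoke `semantic_search` (a hypothetical callable that takes the argument dictionary and returns the parsed JSON response). Stopping on a short page and capping the number of pages are assumptions of the sketch, not documented behaviour.

```python
from typing import Callable, Iterator


def iter_results(search: Callable[[dict], dict], query: str,
                 page_size: int = 5, max_pages: int = 10) -> Iterator[dict]:
    """Yield results page by page, advancing offset by page_size on each call."""
    for page in range(max_pages):
        response = search({
            "query": query,
            "limit": page_size,
            "detail_level": "full",
            "offset": page * page_size,
        })
        results = response.get("results", [])
        yield from results
        if len(results) < page_size:
            break  # a short (or empty) page means there is nothing further to fetch
```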

Tips:

Use semantic_search when you know the concept but not the exact function name. For example: “where is the JWT decoded”, “rate limiting logic”, “database connection pool initialization”.
Paste a code snippet as the query to find similar code elsewhere in the project. This is useful for spotting duplication or finding the canonical version of a pattern.
Scores above 0.85 are typically a strong match. Scores below 0.6 usually indicate the concept is not well represented in the index.
If results are poor, check index(action: status) to confirm the index is up to date, and run index(action: build) to update it if files have changed.
To find a symbol by name, the symbols tool is faster and more precise. Semantic search is for concepts, not identifiers.
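When triaging a result list, the score rule of thumb from the tips above can be captured in a small helper. The 0.85 and 0.6 cut-offs come from those tips and are heuristics; the band labels and the name `score_band` are hypothetical.

```python
def score_band(score: float) -> str:
    """Label a similarity score using the rule-of-thumb thresholds."""
    if score > 0.85:
        return "strong"
    if score < 0.6:
        return "weak"      # concept is probably not well represented in the index
    return "moderate"      # worth reading, but verify before relying on it
```

Applied to the compact output shown earlier, the scores 0.91 and 0.84 fall into the strong and moderate bands respectively.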

Workspace project scoping

{ "tool": "semantic_search", "arguments": { "query": "auth flow", "project": "frontend" } }

Omit project to search across the workspace-level context. See Multi-Project Workspaces for setup.

`index(action: build)`

Purpose: Build or incrementally update the semantic search index for the active project. Unless force is set, only files whose content has changed since the last run are re-embedded.

Parameters:

Name	Type	Required	Default	Description
`action`	string	yes	—	`"build"`
`force`	boolean	no	`false`	Force full reindex, ignoring cached file hashes
`scope`	string	no	`"project"`	What to index: `"project"` (default) for the active project, or `"lib:<name>"` to index a registered library.

Example (incremental update):

{ "action": "build" }

Example (full reindex):

{
  "action": "build",
  "force": true
}

Output:

{
  "status": "ok",
  "files_indexed": 3,
  "files_deleted": 0,
  "detail": "3 deleted",
  "total_files": 47,
  "total_chunks": 312
}

When drift detection is enabled (on by default) and files had meaningful semantic changes, a drift_summary field is included with the top-5 most-drifted files:

{
  "status": "ok",
  "files_indexed": 3,
  "total_files": 47,
  "total_chunks": 312,
  "drift_summary": [
    { "file": "src/auth/service.rs", "avg_drift": "0.72", "max_drift": "0.91", "added": 2, "removed": 1 }
  ]
}
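Note that `avg_drift` and `max_drift` are serialized as strings, so a client that post-processes `drift_summary` should convert them before comparing. This sketch reproduces a top-five selection over per-file records; ranking by `avg_drift` is an assumption (the text only says "most-drifted"), and `top_drifted` is a hypothetical helper.

```python
def top_drifted(entries: list[dict], n: int = 5) -> list[dict]:
    """Return the n entries with the highest avg_drift, highest first.

    avg_drift arrives as a string such as "0.72", so it is parsed to
    float to make the comparison numeric.
    """
    return sorted(entries, key=lambda e: float(e["avg_drift"]), reverse=True)[:n]
```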

Staleness warning: if semantic_search is called while the index is behind the current HEAD commit, its results include:

{ "stale": true, "behind_commits": 3, "hint": "Index is behind HEAD. Run index(action: build) to update." }
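A client can surface this warning before trusting the results. The sketch assumes the staleness keys appear at the top level of the semantic_search response, alongside `results`; `staleness_hint` is a hypothetical helper.

```python
from typing import Optional


def staleness_hint(response: dict) -> Optional[str]:
    """Return a human-readable warning if the index is behind HEAD, else None."""
    if not response.get("stale", False):
        return None
    behind = response.get("behind_commits")
    hint = response.get("hint", "Run index(action: build) to update.")
    if behind is None:
        return hint
    return f"{hint} ({behind} commit(s) behind)"
```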

Tips:

Run index(action: build) once when you first activate a project, then again after large refactors or when many files have changed.
The incremental mode (default) uses a git diff → mtime → SHA-256 fallback chain. It is safe to run frequently — unchanged files are skipped at negligible cost.
Use force: true if you have changed the embedding model in .codescout/project.toml. Changing the model produces incompatible vectors, so a full reindex is required.
Indexing runs synchronously. For large projects (thousands of files), this may take a few minutes the first time.
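The git diff → mtime → SHA-256 chain in the tips above can be read as "use the cheapest reliable signal available". Below is one hypothetical way to decide whether a single file needs re-embedding. The `FileRecord` type, its fields, and passing git's answer in as a set of changed paths (or `None` when git cannot answer) are assumptions for illustration, not codescout's actual implementation.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class FileRecord:
    """What the index remembers about a file from the previous run (hypothetical)."""
    mtime_ns: int
    sha256: str


def sha256_of(path: str) -> str:
    """Hash the file contents in 64 KiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def needs_reembed(path: str, prev: Optional[FileRecord],
                  git_changed: Optional[Set[str]]) -> bool:
    """Decide whether `path` must be re-embedded, cheapest check first."""
    if prev is None:
        return True                          # never indexed before
    if git_changed is not None:
        return path in git_changed           # 1. git diff answered the question
    if os.stat(path).st_mtime_ns == prev.mtime_ns:
        return False                         # 2. mtime untouched: assume unchanged
    return sha256_of(path) != prev.sha256    # 3. mtime moved: confirm by content hash
```

The hash step matters because an mtime change alone (for example after a checkout or `touch`) does not mean the content changed.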

`index(action: status)`

Purpose: Show the health of the semantic index: file count, chunk count, embedding model, last update time, and optional per-file drift scores.

Parameters:

Name	Type	Required	Default	Description
`action`	string	yes	—	`"status"`
`threshold`	float	no	—	When set, include drift scores for files whose `avg_drift` exceeds this value (0.0–1.0). Higher values mean more change.
`path`	string	no	—	Limit drift reporting to a specific file or directory.

Example (basic stats):

{ "action": "status" }

Example (drift scores for significantly changed files):

{ "action": "status", "threshold": 0.3 }

Output:

{
  "indexed_files": 47,
  "total_chunks": 312,
  "model": "ollama:nomic-embed-text",
  "last_updated": "2026-03-12T10:14:00Z",
  "stale": false,
  "drift": [
    { "file": "src/auth/service.rs", "avg_drift": "0.72", "max_drift": "0.91" }
  ]
}
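The `threshold` and `path` parameters act as filters over per-file drift records. A client-side equivalent for post-processing the `drift` array might look like this sketch. Treating "exceeds" as strictly greater than, and `path` as either an exact file or a directory prefix, are assumptions; `filter_drift` is a hypothetical name.

```python
from typing import Optional


def filter_drift(entries: list[dict], threshold: float,
                 path: Optional[str] = None) -> list[dict]:
    """Keep entries whose avg_drift exceeds threshold, optionally limited to path."""
    def under(file: str) -> bool:
        if path is None:
            return True
        prefix = path.rstrip("/")
        # Exact file match, or anything inside the directory (not "src/authz" for "src/auth").
        return file == prefix or file.startswith(prefix + "/")

    return [e for e in entries
            if float(e["avg_drift"]) > threshold and under(e["file"])]
```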

Opt out of drift detection with drift_detection_enabled = false in .codescout/project.toml.

See also: Dashboard — the Overview page surfaces index staleness and per-file drift scores visually, without a tool call.
