Output Buffers

The Problem

MCP tools return their results directly into the AI’s context window. For large command output — a full cargo test run, a broad grep, a 2000-line file — that means the entire output lands in context whether the AI needs all of it or not. The result is a bloated context, wasted tokens, and an AI that has to skim walls of text to find what it actually needs.

How It Works

When run_command or read_file produces output above a size threshold, codescout stores the full content in an in-memory buffer and returns a compact summary and an @id handle instead:

run_command("cargo test")
→ {
    "summary": "47 passed, 2 failed — FAILED: test_parse, test_render",
    "output_id": "@cmd_a1b2c3",
    "exit_code": 1
  }
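The routing decision can be pictured as a small store that either passes output through or parks it under a handle. The Rust sketch below is illustrative only and does not come from codescout. It assumes a byte-length threshold, sequential hex IDs, and a summary supplied by the caller. The names OutputBuffers and ToolOutput, the threshold value, and the ID format are all hypothetical. File reads would follow the same pattern using a @file_ prefix.

```rust
use std::collections::HashMap;

/// What a tool call hands back: the full text, or a summary plus a handle.
#[derive(Debug, PartialEq)]
enum ToolOutput {
    Inline(String),
    Buffered { summary: String, output_id: String },
}

/// Hypothetical in-memory store; eviction is left out of this sketch.
struct OutputBuffers {
    threshold_bytes: usize,
    next_id: u64,
    buffers: HashMap<String, String>,
}

impl OutputBuffers {
    fn new(threshold_bytes: usize) -> Self {
        Self { threshold_bytes, next_id: 0, buffers: HashMap::new() }
    }

    /// Short output passes through unchanged; long output is stored under a
    /// fresh @cmd_ handle and only the summary goes back to the AI.
    fn route(&mut self, output: String, summary: impl Into<String>) -> ToolOutput {
        if output.len() <= self.threshold_bytes {
            return ToolOutput::Inline(output);
        }
        self.next_id += 1;
        let output_id = format!("@cmd_{:06x}", self.next_id);
        self.buffers.insert(output_id.clone(), output);
        ToolOutput::Buffered { summary: summary.into(), output_id }
    }

    /// Look up the full content behind a handle.
    fn get(&self, output_id: &str) -> Option<&str> {
        self.buffers.get(output_id).map(String::as_str)
    }
}

fn main() {
    let mut store = OutputBuffers::new(1024);
    println!("{:?}", store.route("ok\n".to_string(), "ok"));

    let long = "test some::case ... ok\n".repeat(200);
    let routed = store.route(long.clone(), "200 tests passed");
    println!("{routed:?}");
    if let ToolOutput::Buffered { output_id, .. } = &routed {
        // The full text is still available for follow-up queries.
        assert_eq!(store.get(output_id), Some(long.as_str()));
    }
}
```

Because the summary is computed by the caller, it stays tool-specific. For cargo test it might be pass/fail counts, and for grep it might be a match count.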

The full output is held in memory, keyed by the @id. The AI can then query it with targeted follow-up run_command calls using standard Unix tools:

run_command("grep FAILED @cmd_a1b2c3")
run_command("sed -n '42,80p' @cmd_a1b2c3")
run_command("grep -A5 'thread.*panicked' @cmd_a1b2c3")

File reads work the same way — large files become @file_id references:

read_file("src/main.rs")
→ { "summary": "...", "file_id": "@file_abc456" }

run_command("grep 'fn.*async' @file_abc456")

Refs compose freely. You can diff two buffers, pipe one through awk, or use one buffer as a list of fixed-string patterns to search another (below, grep -F -f reads its patterns from the @file_id and searches the @cmd_id):

run_command("diff @cmd_a1b2c3 @cmd_d4e5f6")
run_command("grep -F -f @file_abc456 @cmd_a1b2c3")
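This page does not say how a handle reaches grep or diff. One plausible approach, sketched here purely as an assumption, is to write each referenced buffer to a temporary file. Every @cmd_ or @file_ token in the command is then replaced with that file's path before the shell runs it. The function substitute_refs, the resolver closure, and the /tmp paths are hypothetical. This naive scan also ignores shell quoting, so it would replace a handle that appears inside a quoted pattern as well.

```rust
use std::collections::HashMap;

/// Replace every @cmd_/@file_ token in `command` with the path returned by
/// `resolve` (hypothetically, a temp file holding that buffer's contents).
/// Unknown handles become an error instead of reaching the shell.
fn substitute_refs(
    command: &str,
    resolve: impl Fn(&str) -> Option<String>,
) -> Result<String, String> {
    let mut out = String::with_capacity(command.len());
    let mut rest = command;
    while let Some(start) = rest.find('@') {
        out.push_str(&rest[..start]);
        let after = &rest[start..];
        // A token runs until the first char that is not alphanumeric or '_'.
        let end = after[1..]
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .map_or(after.len(), |i| i + 1);
        let token = &after[..end];
        if token.starts_with("@cmd_") || token.starts_with("@file_") {
            let path = resolve(token).ok_or_else(|| format!("unknown buffer {token}"))?;
            out.push_str(&path);
        } else {
            // Not a buffer handle (e.g. part of an email address): keep it.
            out.push_str(token);
        }
        rest = &after[end..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    // Hypothetical handle-to-path table; a real store would write these files.
    let paths: HashMap<&str, &str> = HashMap::from([
        ("@cmd_a1b2c3", "/tmp/codescout/cmd_a1b2c3.txt"),
        ("@cmd_d4e5f6", "/tmp/codescout/cmd_d4e5f6.txt"),
    ]);
    let resolve = |id: &str| paths.get(id).map(|p| p.to_string());
    let cmd = substitute_refs("diff @cmd_a1b2c3 @cmd_d4e5f6", &resolve).unwrap();
    println!("{cmd}");
}
```

If a handle cannot be resolved, the sketch returns an error instead of passing the literal token through. That gives the AI a clear signal when a buffer has been evicted.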

Why It Matters

Short output is always returned inline. Only responses above the threshold get buffered. The AI never has to think about whether to use @refs — the tool handles the routing automatically.

Each buffer query shows up as a distinct tool call in Claude Code’s UI. Instead of one undifferentiated wall of text, the user sees the AI making targeted, reviewable queries — grep FAILED, then sed -n '42,80p', then grep -A5 'panicked'. The exploration is transparent and auditable.

When a buffer query still returns too much, you get 100 lines inline. If a query against a buffer (for example grep @ref or jq @tool_ref) produces more than 100 lines, codescout returns at most 100 lines inline with truncation metadata. It does not create another @ref handle, because that would lead to an endless loop of buffering and re-querying. The response includes truncated: true and stdout_shown/stdout_total (plus stderr_shown/stderr_total when stderr is non-empty), so the AI can decide whether to refine the query further. Stderr takes priority: up to 20 stderr lines are shown, and the remaining budget goes to stdout.
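The inline fallback amounts to splitting a line budget between the two streams. This Rust sketch uses the 100-line budget and 20-line stderr cap described above. It also assumes that nothing is cut when stdout and stderr together fit within 100 lines, and it counts lines with str::lines. The names truncate_inline and InlineResult are illustrative, not codescout's.

```rust
/// Total lines returned inline when a buffer query overflows.
const INLINE_LINE_BUDGET: usize = 100;
/// Stderr is shown first, but never more than this many lines.
const STDERR_LINE_CAP: usize = 20;

#[derive(Debug, PartialEq)]
struct InlineResult {
    stdout: String,
    stderr: String,
    truncated: bool,
    stdout_shown: usize,
    stdout_total: usize,
    // None when stderr is empty: the counts are only reported otherwise.
    stderr_shown: Option<usize>,
    stderr_total: Option<usize>,
}

fn truncate_inline(stdout: &str, stderr: &str) -> InlineResult {
    let out: Vec<&str> = stdout.lines().collect();
    let err: Vec<&str> = stderr.lines().collect();
    // Assumption: if everything fits in the budget, nothing is cut.
    let fits = out.len() + err.len() <= INLINE_LINE_BUDGET;
    // Stderr gets priority (capped at 20 lines); stdout gets the remainder.
    let err_shown = if fits { err.len() } else { err.len().min(STDERR_LINE_CAP) };
    let out_shown = if fits {
        out.len()
    } else {
        out.len().min(INLINE_LINE_BUDGET - err_shown)
    };
    let has_err = !err.is_empty();
    InlineResult {
        stdout: out[..out_shown].join("\n"),
        stderr: err[..err_shown].join("\n"),
        truncated: out_shown < out.len() || err_shown < err.len(),
        stdout_shown: out_shown,
        stdout_total: out.len(),
        stderr_shown: has_err.then_some(err_shown),
        stderr_total: has_err.then_some(err.len()),
    }
}

fn main() {
    let stdout: String = (1..=250).map(|i| format!("match {i}\n")).collect();
    let r = truncate_inline(&stdout, "grep: warning: binary file matches\n");
    println!(
        "truncated={} stdout {}/{} stderr {:?}/{:?}",
        r.truncated, r.stdout_shown, r.stdout_total, r.stderr_shown, r.stderr_total
    );
    println!("stderr shown: {}", r.stderr);
    println!("inline stdout lines: {}", r.stdout.lines().count());
}
```

Reporting both the shown and total counts tells the AI how much it missed. That way it can narrow the pattern rather than guess.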

The context window stays lean. The AI holds a reference to large output without paying the token cost of the full content. It pays only for what it actually reads.

Buffers survive across multiple turns. A @cmd_id from a cargo test run can be queried again later in the same session — no need to re-run the command to look at a different part of the output.

Buffer Lifecycle

Buffers are held in memory for the lifetime of the MCP server process. They use an LRU eviction policy: when the buffer store fills up (default: 50 entries), the least-recently-accessed entry is dropped. Accessing a buffer (even to query it) refreshes its position in the eviction order.
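A minimal LRU store matching this lifecycle could look like the sketch below. It keeps a VecDeque of IDs in recency order alongside a HashMap of contents. The type name LruBuffers is hypothetical, and codescout's actual data structure is not documented here. The linear scan in touch is only reasonable because the default capacity is small.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical LRU buffer store: at most `capacity` entries; inserting or
/// querying an entry moves it to the most-recently-used end.
struct LruBuffers {
    capacity: usize,
    entries: HashMap<String, String>,
    order: VecDeque<String>, // front = least recently used
}

impl LruBuffers {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `id` to the back of the recency queue.
    fn touch(&mut self, id: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == id) {
            let key = self.order.remove(pos).unwrap();
            self.order.push_back(key);
        }
    }

    fn insert(&mut self, id: String, content: String) {
        if self.entries.insert(id.clone(), content).is_some() {
            self.touch(&id); // replaced an existing entry: just refresh it
            return;
        }
        self.order.push_back(id);
        if self.order.len() > self.capacity {
            // Over capacity: drop the least recently accessed buffer.
            if let Some(evicted) = self.order.pop_front() {
                self.entries.remove(&evicted);
            }
        }
    }

    /// Querying a buffer counts as an access and refreshes its position.
    fn get(&mut self, id: &str) -> Option<&str> {
        if !self.entries.contains_key(id) {
            return None;
        }
        self.touch(id);
        self.entries.get(id).map(String::as_str)
    }
}

fn main() {
    let mut store = LruBuffers::new(50); // the default capacity described above
    for i in 0..51 {
        store.insert(format!("@cmd_{i}"), format!("output {i}"));
    }
    // The 51st insert evicted @cmd_0, the least recently used entry.
    assert!(store.get("@cmd_0").is_none());
    assert!(store.get("@cmd_1").is_some());
}
```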

Buffers are not persisted to disk. Restarting the server clears them.

Further Reading

  • Shell Integration: run_command in full detail, including the safety layer, dangerous command detection, and source file access blocking
  • Workflow & Config Tools: full reference for run_command, including the cwd, acknowledge_risk, and timeout_secs parameters