Output Buffers

The Problem

MCP tools return their results directly into the AI’s context window. For large command output — a full cargo test run, a broad grep, a 2000-line file — that means the entire output lands in context whether the AI needs all of it or not. The result is a bloated context, wasted tokens, and an AI that has to skim walls of text to find what it actually needs.

How It Works

When run_command or read_file produces output above a size threshold, codescout stores the full content in an in-memory buffer and returns a compact summary plus an @id handle in place of the raw output:

run_command("cargo test")
→ {
    "summary": "47 passed, 2 failed — FAILED: test_parse, test_render",
    "output_id": "@cmd_a1b2c3",
    "exit_code": 1
  }
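The routing decision described above can be sketched roughly as follows. This is a minimal illustration, not codescout's implementation: the threshold value and its unit (lines here), the `route_output` name, the handle format, and the placeholder summary are all assumptions, since the source does not specify them.

```python
import uuid

# Hypothetical threshold: the source does not state the real value or unit.
INLINE_LIMIT_LINES = 50

# Handle -> full output text, held in memory only.
buffers = {}


def route_output(stdout: str, exit_code: int) -> dict:
    """Return short output inline; park long output behind an @id handle."""
    lines = stdout.splitlines()
    if len(lines) <= INLINE_LIMIT_LINES:
        # Short output goes straight back to the caller.
        return {"stdout": stdout, "exit_code": exit_code}

    # Assumed handle format, modelled on the "@cmd_a1b2c3" examples.
    handle = "@cmd_" + uuid.uuid4().hex[:6]
    buffers[handle] = stdout

    # Placeholder summary: the real summariser is not described in the source.
    summary = f"{len(lines)} lines of output (buffered)"
    return {"summary": summary, "output_id": handle, "exit_code": exit_code}
```

The caller never chooses between the two shapes; it simply inspects whether the response carries an output_id.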

The full output is held in memory, keyed by the @id. The AI can then query it with targeted follow-up run_command calls using standard Unix tools:

run_command("grep FAILED @cmd_a1b2c3")
run_command("sed -n '42,80p' @cmd_a1b2c3")
run_command("grep -A5 'thread.*panicked' @cmd_a1b2c3")
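One plausible way to make @refs usable by ordinary Unix tools is to swap each handle for the path of a temporary file before the command runs. The sketch below assumes that approach; the source does not say how codescout resolves handles, and `resolve_refs`, the regular expression, and the file naming are hypothetical.

```python
import os
import re
import shlex
import tempfile

# Assumed handle shape, based on examples such as "@cmd_a1b2c3" and "@file_abc456".
REF_PATTERN = re.compile(r"@[a-z]+_[0-9a-z]+")


def resolve_refs(command: str, buffers: dict, workdir: str) -> str:
    """Replace each known @ref in `command` with a file path holding its content."""

    def to_path(match):
        ref = match.group(0)
        if ref not in buffers:
            return ref  # Unknown tokens are left untouched.
        path = os.path.join(workdir, ref[1:] + ".txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(buffers[ref])
        # Quote the path so the rewritten command stays safe for a shell.
        return shlex.quote(path)

    return REF_PATTERN.sub(to_path, command)


buffers = {"@cmd_a1b2c3": "test_parse ... FAILED\ntest_ok ... ok\n"}
workdir = tempfile.mkdtemp()
resolved = resolve_refs("grep FAILED @cmd_a1b2c3", buffers, workdir)
```

Under this assumption, `resolved` is a plain grep invocation over a real file, which is also why several refs can appear in one command.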

File reads work the same way — large files become @file_id references:

read_file("src/main.rs")
→ { "summary": "...", "file_id": "@file_abc456" }

run_command("grep 'fn.*async' @file_abc456")

Refs compose freely. You can diff two buffers, pipe one through awk, or use the lines of a @file_id as fixed-string patterns to search a @cmd_id:

run_command("diff @cmd_a1b2c3 @cmd_d4e5f6")
run_command("grep -F -f @file_abc456 @cmd_a1b2c3")

Why It Matters

Short output is always returned inline. Only responses above the threshold get buffered. The AI never has to think about whether to use @refs — the tool handles the routing automatically.

Each buffer query shows up as a distinct tool call in Claude Code’s UI. Instead of one undifferentiated wall of text, the user sees the AI making targeted, reviewable queries — grep FAILED, then sed -n '42,80p', then grep -A5 'panicked'. The exploration is transparent and auditable.

When a buffer query still returns too much, you get 100 lines inline. If a query such as grep @ref or jq @tool_ref produces more than 100 lines, codescout returns at most 100 lines inline with truncation metadata rather than creating another @ref handle (which would cause an infinite loop). The response includes truncated: true and stdout_shown/stdout_total (plus stderr_shown/stderr_total when stderr is non-empty), so the AI can decide whether to refine the query further. Stderr is prioritised: up to 20 stderr lines are shown, and the remaining budget goes to stdout.
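The line budget for buffer queries can be illustrated with a short sketch. The 100-line limit, the 20-line stderr cap, and the metadata field names come from the text above; the `truncate_output` function and the choice to count stdout and stderr lines together when checking the limit are assumptions.

```python
def truncate_output(stdout: str, stderr: str,
                    limit: int = 100, stderr_cap: int = 20) -> dict:
    """Cap a buffer-query result at `limit` lines, serving stderr first."""
    out_lines = stdout.splitlines()
    err_lines = stderr.splitlines()

    # Assumption: the limit applies to stdout and stderr combined.
    if len(out_lines) + len(err_lines) <= limit:
        return {"stdout": stdout, "stderr": stderr, "truncated": False}

    # Stderr is prioritised, but capped at stderr_cap lines.
    err_shown = err_lines[:stderr_cap]
    # Whatever remains of the budget goes to stdout.
    out_shown = out_lines[: limit - len(err_shown)]

    result = {
        "stdout": "\n".join(out_shown),
        "truncated": True,
        "stdout_shown": len(out_shown),
        "stdout_total": len(out_lines),
    }
    # Stderr metadata is included only when stderr is non-empty.
    if err_lines:
        result["stderr"] = "\n".join(err_shown)
        result["stderr_shown"] = len(err_shown)
        result["stderr_total"] = len(err_lines)
    return result
```

With 500 stdout lines and 30 stderr lines, this sketch shows 20 stderr lines and 80 stdout lines, and the totals tell the AI how much was left out.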

The context window stays lean. The AI holds a reference to large output without paying the token cost of the full content. It pays only for what it actually reads.

Buffers survive across multiple turns. A @cmd_id from a cargo test run can be queried again later in the same session — no need to re-run the command to look at a different part of the output.

Buffer Lifecycle

Buffers are held in memory for the lifetime of the MCP server process. They use an LRU eviction policy: when the buffer store fills up (default: 50 entries), the least-recently-accessed entry is dropped. Accessing a buffer (even to query it) refreshes its position in the eviction order.
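The eviction behaviour can be sketched with an ordered dictionary. This is an illustration of the LRU policy as described, not codescout's code: the `BufferStore` class and its method names are hypothetical, and only the default capacity of 50 comes from the text.

```python
from collections import OrderedDict


class BufferStore:
    """In-memory buffer store with least-recently-accessed eviction."""

    def __init__(self, capacity: int = 50):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest access first, newest last

    def put(self, handle: str, content: str) -> None:
        self._entries[handle] = content
        self._entries.move_to_end(handle)
        # Drop least-recently-accessed entries once the store is over capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def get(self, handle: str):
        if handle not in self._entries:
            return None  # evicted, never stored, or lost to a restart
        # Any access, including a query, refreshes the entry's position.
        self._entries.move_to_end(handle)
        return self._entries[handle]
```

Because a read refreshes an entry, a buffer that the AI keeps querying stays alive while untouched ones age out first.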

Buffers are not persisted to disk. Restarting the server clears them.
