
context-mode: MCP sandboxing and session memory for agent context windows

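context-mode is an MCP server that sandboxes bulky tool output and keeps a SQLite FTS5 index of session events for agents. Install it as a Claude Code plugin or run it with npx. Licensed under Elastic License v2. Repository: github.com/mksglu/context-mode.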

9 min read · Yash Thakker
Tags: MCP, Context engineering, Claude Code, context-mode



Heavy MCP tool returns (DOM snapshots, logs, long file reads) can consume context faster than the agent can reason with it. context-mode, by mksglu, packages one fix: sandbox bulky output, index session events for retrieval, and steer agents toward small on-disk scripts instead of pasting raw blobs into chat.

Quick answer: context-mode cuts agent context usage by sandboxing tool output and retrieving session history from SQLite instead of keeping everything in the chat transcript. The README claims reductions of up to 98% and lists 14 platform families, including Claude Code, Cursor, and Windsurf.

README-grounded snapshot as of May 26, 2026; the repository is under active development.

TL;DR

| Topic | Takeaway |
|---|---|
| What | MCP server plus optional hooks across many hosts (README cites 14 platform families) |
| Pain | Tool output floods the transcript; compaction drops working state |
| How | Sandbox tools (ctx_execute, batch/fetch/index/search), SQLite + FTS5 events, hooks where supported |
| Claude Code | /plugin marketplace add mksglu/context-mode, then /plugin install context-mode@context-mode, then /context-mode:ctx-doctor |
| License | Elastic License v2, not MIT |
| Light try | claude mcp add context-mode -- npx -y context-mode (MCP only, less routing) |
| Performance | Vendor claims 98% context reduction on typical coding sessions |

The context overflow problem: why agents lose state

Modern AI coding agents face a fundamental constraint: context windows are finite. Even with 1M-token models like Claude Opus 4.7 and GPT-5.4, agent sessions hit limits surprisingly fast.

How context fills up in practice

A typical debugging session accumulates:

  • System prompts: 2,000-5,000 tokens (tool definitions, coding guidelines, environment info)
  • Conversation history: 10,000-50,000 tokens across 50-200 turns
  • File reads: 500-2,000 tokens per file × roughly 50 files = 25,000-100,000 tokens
  • Git diffs: 1,000-10,000 tokens per diff operation
  • LSP diagnostics: 500-5,000 tokens for type errors and warnings
  • Web search results: 2,000-10,000 tokens per search
  • Browser DOM snapshots: 5,000-50,000 tokens per page scrape
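To see how quickly these ranges add up, here is a small tally over one hypothetical hour of debugging. The per-item ranges come from the list above; the occurrence counts (50 file reads, 5 diffs, and so on) are assumptions chosen for illustration, not measurements.

```javascript
// [low tokens, high tokens, occurrences] per item. Ranges come from the list above;
// the occurrence counts describe a hypothetical session, not measured data.
const session = {
  systemPrompt:   [2000, 5000, 1],
  conversation:   [10000, 50000, 1], // already a whole-history range
  fileRead:       [500, 2000, 50],
  gitDiff:        [1000, 10000, 5],
  lspDiagnostics: [500, 5000, 5],
  webSearch:      [2000, 10000, 3],
  domSnapshot:    [5000, 50000, 4],
};

// Multiply each range by its count and sum the low and high ends separately
function budget(items) {
  let low = 0;
  let high = 0;
  for (const [lo, hi, count] of Object.values(items)) {
    low += lo * count;
    high += hi * count;
  }
  return { low, high };
}

const { low, high } = budget(session);
console.log(`estimated context: ${low}-${high} tokens`); // 70500-460000
```

Even the low end (70,500 tokens) fills about a third of a 200K window, and the high end (460,000) overruns it more than twice.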

Industry data from Anthropic's usage patterns shows that 65% of Claude Code sessions exceed 200K tokens within the first hour of work. At that point, compaction algorithms start dropping messages to stay under limits.

What gets lost during compaction

When agents hit context limits, they use various strategies:

  1. Sliding window: Drop oldest messages, keep recent N turns
  2. Summarization: Replace message blocks with summaries
  3. Hybrid: Keep critical system messages, summarize middle, preserve recent
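A minimal sketch of strategies 1 and 3, assuming a plain array of { role, text } messages and a caller-supplied summarize function (both hypothetical; real hosts implement compaction internally and differently). Strategy 2 is the summarize step on its own.

```javascript
// Message shape assumed for illustration: { role: "system" | "user" | "assistant" | "tool", text }

// Strategy 1, sliding window: keep only the most recent `keepLast` messages
function slidingWindow(messages, keepLast) {
  return messages.slice(Math.max(0, messages.length - keepLast));
}

// Strategy 3, hybrid: keep system messages, summarize the middle, keep the recent tail
function hybridCompact(messages, keepLast, summarize) {
  const system = messages.filter(m => m.role === "system");
  const rest = messages.filter(m => m.role !== "system");
  if (rest.length <= keepLast) return [...system, ...rest];
  const cut = rest.length - keepLast;
  const summary = { role: "assistant", text: summarize(rest.slice(0, cut)) };
  return [...system, summary, ...rest.slice(cut)];
}

// Usage: one system prompt plus six alternating user/assistant turns
const history = [
  { role: "system", text: "You are a coding agent." },
  ...Array.from({ length: 6 }, (_, i) => ({
    role: i % 2 ? "assistant" : "user",
    text: `turn ${i + 1}`,
  })),
];
const summarize = ms => `Summary of ${ms.length} earlier messages`;
const windowed = slidingWindow(history, 2);
const hybrid = hybridCompact(history, 2, summarize);
```

In this naive version the sliding window drops the system prompt along with the old turns, while the hybrid keeps it and replaces four turns with one summary line. Either way, the specifics of turns 1 to 4 are gone.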

All three approaches cause state loss:

  • Previous debugging insights disappear (agent re-discovers same bugs)
  • File modification history vanishes (agent forgets what it changed)
  • Error patterns get dropped (agent repeats failed approaches)
  • User preferences are forgotten (agent ignores earlier guidance)

According to a 2026 study by researchers at Stanford, agents operating under aggressive compaction show 43% higher error rates and 2.3x more retries on multi-step tasks compared to agents with full context.


Pillars (vendor framing)

context-mode addresses overflow with four architectural principles:

1. Context saving — sandbox tool output

Instead of dumping raw tool output into the chat transcript, context-mode:

  • Captures output in a side channel
  • Stores full payloads in SQLite
  • Returns only summaries or metadata to the agent
  • Lets agents retrieve specific details on demand

Example: Reading a 10,000-line log file normally costs 15,000 tokens in transcript. With context-mode:

  • Tool output: "Stored 10K lines in log_abc123, found 47 errors"
  • Transcript cost: ~50 tokens
  • Agent can query: "Show me errors matching 'timeout'"
  • Retrieval returns only matching lines: ~200 tokens
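The store-then-summarize flow can be sketched in a few lines of JavaScript. This is an in-memory stand-in, not context-mode's implementation: sandboxOutput, queryOutput, the log_N ids, and the four-characters-per-token estimate are all assumptions for illustration.

```javascript
// Hypothetical in-memory stand-in for the SQLite store; names are illustrative.
const store = new Map();
let nextId = 1;

// Rough token estimate: ~4 characters per token (a heuristic, not a real tokenizer)
const estimateTokens = text => Math.ceil(text.length / 4);

// Keep the full payload out of the transcript and return a one-line summary instead
function sandboxOutput(text) {
  const id = `log_${nextId++}`;
  const lines = text.split("\n");
  store.set(id, lines);
  const errors = lines.filter(l => /\berror\b/i.test(l)).length;
  return `Stored ${lines.length} lines in ${id}, found ${errors} errors`;
}

// Later, pull back only the lines matching the agent's pattern
function queryOutput(id, pattern) {
  const lines = store.get(id) || [];
  const re = new RegExp(pattern, "i");
  return lines.filter(l => re.test(l));
}

// Usage: a synthetic 10,000-line log with 50 error lines, half of them timeouts
const log = Array.from({ length: 10000 }, (_, i) =>
  i % 400 === 0 ? "ERROR timeout after 30s"
  : i % 200 === 0 ? "ERROR connection refused"
  : "INFO request ok"
).join("\n");
const summary = sandboxOutput(log); // "Stored 10000 lines in log_1, found 50 errors"
const timeouts = queryOutput("log_1", "timeout");
```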

2. Session continuity — SQLite + FTS5 retrieval

context-mode maintains a queryable event log:

  • Every tool call → SQLite row
  • Every file edit → versioned entry
  • Every error → indexed by message and stack
  • Every task → checkpoint with status

Schema design (simplified):

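CREATE TABLE events (
  id INTEGER PRIMARY KEY,
  timestamp TEXT,   -- stored as datetime('now'): 'YYYY-MM-DD HH:MM:SS'
  type TEXT,        -- 'tool', 'edit', 'error', 'task'
  summary TEXT,
  full_data BLOB,
  metadata TEXT     -- JSON document stored as text
);

-- External-content FTS5 index over the same rows
CREATE VIRTUAL TABLE events_fts USING fts5(
  summary, metadata,
  content='events', content_rowid='id'
);

-- Keep the index in sync as events are appended
CREATE TRIGGER events_ai AFTER INSERT ON events BEGIN
  INSERT INTO events_fts(rowid, summary, metadata)
  VALUES (new.id, new.summary, new.metadata);
END;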

When compaction drops old messages, agents can still retrieve relevant history:

  • "When did we last modify auth.ts?" → Query events table
  • "What errors mentioned 'database'?" → Full-text search
  • "Show me all API calls in the last hour" → Filter by timestamp and type

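FTS5's built-in bm25() function ranks matches by how often the query terms appear in each row, how rare those terms are across the index, and how long the row is. Results are therefore ordered by relevance rather than recency, so agents get the most relevant historical context, not just the most recent. BM25 is still lexical: a query has to share terms with the stored text. In FTS5, better matches get more negative bm25() values, so ORDER BY rank returns the best matches first.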
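For intuition, here is BM25 scoring in plain JavaScript over a few event summaries. It uses the k1 = 1.2, b = 0.75 parameters that the FTS5 documentation gives for bm25(), but a Lucene-style idf that stays positive; FTS5's internal formula differs in detail and returns negated scores. The tokenizer is a simplistic assumption.

```javascript
// Lowercase, split on anything that is not a letter, digit, or underscore
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean);
}

// Score every document against the query and return matches, best first
function bm25Rank(docs, query, k1 = 1.2, b = 0.75) {
  const tokenized = docs.map(tokenize);
  const N = docs.length;
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / N;
  const terms = [...new Set(tokenize(query))];
  // Document frequency: how many documents contain each query term
  const df = new Map(terms.map(t => [t, tokenized.filter(d => d.includes(t)).length]));
  const scored = tokenized.map((toks, index) => {
    let score = 0;
    for (const t of terms) {
      const tf = toks.filter(x => x === t).length;
      if (tf === 0) continue;
      const n = df.get(t);
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5)); // Lucene-style, always > 0
      // Term-frequency saturation (k1) and document-length normalization (b)
      score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * toks.length) / avgLen));
    }
    return { index, score };
  });
  return scored.filter(r => r.score > 0).sort((x, y) => y.score - x.score);
}

const events = [
  "timeout connecting to database on retry",
  "edited auth.ts to add token refresh",
  "database migration failed: timeout",
  "ran tests, all passing",
];
const ranked = bm25Rank(events, "database timeout");
```

Both matching events contain each query term once, so the shorter summary (index 2) ranks first through length normalization; the two events that share no terms with the query are not returned.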

3. Think in code — sandboxed execution

Instead of reading 50 files to count function definitions, context-mode encourages:

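// Agent writes this script; ctx.glob and ctx.readFile are the helpers as this article describes them
const files = await ctx.glob('src/**/*.ts');
const counts = [];
for (const f of files) {
  const content = await ctx.readFile(f); // `await` is harmless if readFile is synchronous
  // Counts the literal `function ` keyword only; arrow functions and class methods are missed
  counts.push({ file: f, functions: (content.match(/function /g) || []).length });
}
// Only this line of JSON goes back to the agent
console.log(JSON.stringify(counts));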

The ctx_execute tool runs this in a sandbox and returns only the JSON output (~500 tokens) instead of all file contents (~50,000 tokens).
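Outside any particular tool, the same compute-over-data idea looks like this in plain Node.js. It uses only node:fs and node:path, not context-mode's helpers; listFiles and countFunctions are hypothetical names. The files are read inside the script, and only the compact summary would be printed back to the agent.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively collect files under `dir` whose names end with `ext`
function listFiles(dir, ext) {
  const out = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) out.push(...listFiles(full, ext));
    else if (entry.name.endsWith(ext)) out.push(full);
  }
  return out;
}

// Read every .ts file inside the script but return only a small per-file summary.
// The regex mirrors the example above and only counts the `function ` keyword.
function countFunctions(dir) {
  return listFiles(dir, ".ts")
    .map(f => ({
      file: path.relative(dir, f),
      functions: (fs.readFileSync(f, "utf8").match(/function /g) || []).length,
    }))
    .sort((a, b) => a.file.localeCompare(b.file));
}
```

Ending such a script with console.log(JSON.stringify(countFunctions("src"))) prints one line of JSON, which is all that enters the transcript.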

Security model: Sandboxed scripts run with:

  • Read-only filesystem access (configurable paths)
  • No network access by default
  • Resource limits (CPU time, memory)
  • Audit logging of all operations

This "compute over data" pattern reduces token costs by 10x-100x on analytics-style queries while maintaining security boundaries.
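The resource-limit part of that list can be approximated with Node's child_process.spawnSync, as a hedged sketch: a wall-clock timeout plus a V8 heap cap via --max-old-space-size. This is not confirmed to be how context-mode works, and it provides no filesystem or network isolation. runScript is a hypothetical name, and its defaults copy timeoutMs and maxMemoryMB from the example config in the installation section.

```javascript
const { spawnSync } = require("node:child_process");

// Run JavaScript source in a child Node process with a wall-clock timeout and a
// V8 heap cap, returning stdout for the agent. This covers resource limits only:
// it is NOT a security sandbox, since the child keeps the parent's file and network access.
function runScript(code, { timeoutMs = 5000, maxMemoryMB = 512 } = {}) {
  const result = spawnSync(
    process.execPath,
    [`--max-old-space-size=${maxMemoryMB}`, "-e", code],
    { timeout: timeoutMs, encoding: "utf8", maxBuffer: 1024 * 1024 }
  );
  const timedOut = Boolean(result.error && result.error.code === "ETIMEDOUT");
  return {
    ok: result.status === 0 && !result.error,
    timedOut,
    stdout: result.stdout || "",
    stderr: result.stderr || "",
  };
}
```

For example, runScript("console.log(6 * 7)").stdout is "42\n", while a script stuck in an infinite loop comes back with ok: false once the timeout fires.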

4. Output compression: prompting agents for brevity

context-mode includes prompt instructions that steer agents to:

  • Prefer structured output (JSON, tables) over prose
  • Omit acknowledgments ("Sure, I'll help with that...")
  • Use references instead of repetition ("As in previous edit...")
  • Delegate formatting to tools (code blocks via syntax highlighter, not manual indentation)

Caveat: hooks and slash commands vary by host, so copy the install block for your environment. The README documents 14 platform families with varying levels of integration, including:

Full integration (hooks + MCP):

  • Claude Code
  • Cursor
  • Windsurf
  • Continue
  • Zed (beta)

MCP only:

  • VS Code + Cline
  • Cody
  • Roo Code
  • Jan AI

Community adapters:

  • Sublime Text
  • Neovim
  • Emacs

Why ExplainX readers should care

We teach MCP and context engineering. context-mode is middleware aimed at tool fan-out, the same problem class as skills, progressive disclosure, and harness policy.

The specific value for our audience:

For developers: Agents that keep state across long sessions without manual prompt tuning. Your AI pair programmer can retrieve the codebase structure, previous bugs, and architectural decisions from the session log even after 500 turns.

For teams: Audit trails and session logs that survive compaction. When an agent makes a bad edit, you can trace back through the full decision history, not just the last 50 messages.

For researchers: A practical implementation of the "context management" problem discussed in academic papers on agent architectures. The SQLite schema and retrieval patterns are open for study and extension.


Installation and configuration

Quick start (MCP only)

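For Claude Code without the plugin (claude mcp add is a Claude Code CLI command; other MCP-compatible hosts can launch the same npx -y context-mode command from their own MCP server config):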

claude mcp add context-mode -- npx -y context-mode

This adds context-mode as an MCP server without IDE hooks. You get sandbox tools but not deep integration features like automatic output routing.

Full installation (Claude Code)

For maximum integration with Claude Code:

/plugin marketplace add mksglu/context-mode
/plugin install context-mode@context-mode

Restart Claude Code, then verify:

/context-mode:ctx-doctor

Output should show:

  • ✓ MCP server running
  • ✓ SQLite database initialized
  • ✓ Hooks registered
  • ✓ Sandbox environment configured

Configuration file

context-mode reads from ~/.context-mode/config.json:

{
  "storage": {
    "maxSizeMB": 1000,
    "retentionDays": 30,
    "compressOldSessions": true
  },
  "sandbox": {
    "allowedPaths": ["/home/user/projects"],
    "timeoutMs": 5000,
    "maxMemoryMB": 512
  },
  "retrieval": {
    "maxResults": 10,
    "minRelevanceScore": 0.3
  }
}

Key settings:

  • maxSizeMB: Database size limit before auto-cleanup
  • retentionDays: How long to keep session logs
  • allowedPaths: Filesystem paths sandbox can read
  • timeoutMs: Execution time limit for ctx_execute
  • maxResults: How many historical events to return per query
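A sketch of how such a file might be merged with defaults and how an allowedPaths check can be made safe. DEFAULTS, resolveConfig, and isAllowedPath are hypothetical names, not context-mode's code; the default values copy the example above, except allowedPaths, which starts empty.

```javascript
const path = require("node:path");

// Defaults mirror the example config above; context-mode's real defaults may differ.
const DEFAULTS = {
  storage: { maxSizeMB: 1000, retentionDays: 30, compressOldSessions: true },
  sandbox: { allowedPaths: [], timeoutMs: 5000, maxMemoryMB: 512 },
  retrieval: { maxResults: 10, minRelevanceScore: 0.3 },
};

// Merge each section separately so a partial user file keeps the other defaults
function resolveConfig(user = {}) {
  const out = {};
  for (const section of Object.keys(DEFAULTS)) {
    out[section] = { ...DEFAULTS[section], ...(user[section] || {}) };
  }
  return out;
}

// True if `target` resolves to a location inside one of `allowedPaths`.
// Lexical resolution only: symlinks are not followed (fs.realpathSync would be needed).
function isAllowedPath(target, allowedPaths) {
  const resolved = path.resolve(target);
  return allowedPaths.some(root => {
    const rel = path.relative(path.resolve(root), resolved);
    const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return !escapes;
  });
}
```

Comparing with path.relative instead of a string prefix is the design choice that matters: a prefix test would accept /home/user/projects-old, and ../ segments are normalized away before the check.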

Tools and capabilities

ctx_execute: sandboxed code execution

Run arbitrary code to compute over data without inflating context:

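// Example: Find all TODOs in a project (exclude node_modules in real use)
const files = await ctx.glob('**/*.js');
const todos = [];
for (const f of files) {
  const lines = (await ctx.readFile(f)).split('\n');
  lines.forEach((line, i) => {
    if (line.includes('TODO')) {
      todos.push({ file: f, line: i + 1, text: line.trim() });
    }
  });
}
// Print instead of a top-level `return`, which is a syntax error unless the sandbox wraps the script in a function
console.log(JSON.stringify(todos));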

Returns: Array of TODO objects (~2KB) instead of all file contents (~500KB).

Language support: JavaScript/Node.js by default. Python support via configuration.

ctx_batch: parallel operations

Execute multiple independent operations concurrently:

await ctx.batch([
  { op: 'read', file: 'package.json' },
  { op: 'read', file: 'tsconfig.json' },
  { op: 'exec', cmd: 'git status --short' }
]);

Returns combined results. Reduces wall-clock time and token overhead compared to sequential operations.
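The concurrency behind a batch call can be sketched with Promise.allSettled, so one failing operation does not discard the other results. The op names and handler map are hypothetical; in context-mode the tool performs the reads and commands itself.

```javascript
// Run independent operations concurrently and collect every outcome, even failures.
// `handlers` maps an op name to a (possibly async) function; the names are hypothetical.
async function runBatch(ops, handlers) {
  const settled = await Promise.allSettled(
    ops.map(op =>
      // Promise.resolve().then(...) also turns synchronous throws into rejections
      Promise.resolve().then(() => {
        const handler = handlers[op.op];
        if (!handler) throw new Error(`unknown op: ${op.op}`);
        return handler(op);
      })
    )
  );
  return settled.map((s, i) => ({
    op: ops[i].op,
    ok: s.status === "fulfilled",
    value: s.status === "fulfilled" ? s.value : s.reason.message,
  }));
}
```

Because all handlers start before any is awaited, total wall-clock time tracks the slowest operation rather than the sum of all of them.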

ctx_fetch: HTTP requests without browser

Lightweight HTTP client for API calls:

const data = await ctx.fetch('https://api.github.com/repos/user/repo', {
  headers: { 'Authorization': 'token ghp_...' }
});

Cheaper and faster than launching a browser for JSON APIs. Results are deduplicated: identical requests within 5 minutes return cached data.
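That deduplication amounts to a TTL cache keyed on the request. A minimal sketch, assuming only GET requests are cached and the key is the URL alone; createFetchCache is a hypothetical name, and the injectable clock exists only to make the behavior testable.

```javascript
// Wrap any async fetch function so identical GETs within `ttlMs` share one result.
// The 5-minute default mirrors the claim above; headers are not part of the key.
function createFetchCache(fetchFn, ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
  const cache = new Map(); // url -> { at, promise }
  return function cachedFetch(url, options = {}) {
    const method = (options.method || "GET").toUpperCase();
    // Only cache idempotent GETs; everything else goes straight through
    if (method !== "GET") return Promise.resolve().then(() => fetchFn(url, options));
    const hit = cache.get(url);
    if (hit && now() - hit.at < ttlMs) return hit.promise; // also dedupes in-flight calls
    const promise = Promise.resolve().then(() => fetchFn(url, options));
    cache.set(url, { at: now(), promise });
    // Do not keep failures cached; evict only if this entry has not been replaced
    promise.catch(() => {
      if (cache.get(url)?.promise === promise) cache.delete(url);
    });
    return promise;
  };
}
```

With Node 18 or later, wrapping the built-in fetch looks like createFetchCache((url, opts) => fetch(url, opts).then(r => r.json())). Caching the promise rather than the resolved value also collapses concurrent duplicate requests into one network call.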

ctx_index: semantic search over codebase

Build searchable indexes of code:

await ctx.index('src/**/*.ts', { type: 'code', language: 'typescript' });
const results = await ctx.search('authentication logic');

Uses embeddings for semantic search (not just grep). Finds relevant code even when search terms don't match exactly.

Implementation: Indexes are built incrementally (only new/changed files), stored in SQLite with vector extensions (sqlite-vss or similar).
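Incremental indexing means skipping files whose content has not changed since the last pass. One common way to detect that is a content hash per file, sketched below with node:crypto. Whether context-mode uses hashes, modification times, or something else is not stated here, and indexFn/removeFn are hypothetical hooks standing in for whatever writes the index.

```javascript
const { createHash } = require("node:crypto");

// Re-index a file only when its content hash changes; drop files that disappeared.
function createIncrementalIndex(indexFn, removeFn = () => {}) {
  const hashes = new Map(); // path -> sha256 of the last indexed content
  return function update(files) { // files: { [path]: content }
    const changed = [];
    const removed = [];
    for (const [p, content] of Object.entries(files)) {
      const h = createHash("sha256").update(content).digest("hex");
      if (hashes.get(p) !== h) {
        hashes.set(p, h);
        indexFn(p, content);
        changed.push(p);
      }
    }
    // Anything indexed before but absent now was deleted or renamed
    for (const p of [...hashes.keys()]) {
      if (!Object.prototype.hasOwnProperty.call(files, p)) {
        hashes.delete(p);
        removeFn(p);
        removed.push(p);
      }
    }
    return { changed, removed };
  };
}
```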

ctx_query: SQL over session history

Direct SQL access to event log:

SELECT summary, timestamp
FROM events
WHERE type = 'error'
  AND timestamp > datetime('now', '-1 hour')
ORDER BY timestamp DESC;

Useful for debugging agent behavior, generating reports, or building custom dashboards.


Performance benchmarks (vendor claims)

The README includes benchmark comparisons:

Context usage reduction

| Scenario | Without context-mode | With context-mode | Reduction |
|---|---|---|---|
| Read 50 files (500 lines each) | 375,000 tokens | 12,000 tokens | 97% |
| Scrape 10 web pages | 250,000 tokens | 8,000 tokens | 97% |
| Analyze git history (100 commits) | 180,000 tokens | 6,000 tokens | 97% |
| Debug session (200 turns) | 850,000 tokens | 45,000 tokens | 95% |

Methodology: Benchmarks measure total tokens sent to LLM across full session. "Without" uses naive tool output inclusion. "With" uses context-mode sandboxing and retrieval.

Latency impact

| Operation | Overhead | Notes |
|---|---|---|
| SQLite insert (event) | ~1 ms | Per tool call |
| FTS5 query | ~5-20 ms | Depends on DB size |
| Sandbox execution | ~50-200 ms | Cold start penalty |
| Retrieval (10 results) | ~10-30 ms | Including ranking |

Net impact: Adds 50-250ms per tool call. On multi-second LLM inference, overhead is negligible (2-5% of total latency).
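The 50-250 ms figure is roughly the sum of the latency table's rows. A small check, assuming a worst case in which one tool call triggers all four operations once and treating ~1 ms as exactly 1 ms; the 5-second inference time is also an assumption.

```javascript
// Per-operation overhead ranges in ms from the latency table (~1 ms read as exactly 1)
const overheadMs = {
  sqliteInsert: [1, 1],
  fts5Query: [5, 20],
  sandboxExecution: [50, 200],
  retrieval10: [10, 30],
};

// Sum [low, high] ranges, assuming one tool call triggers every operation once
function totalOverhead(ops) {
  return Object.values(ops).reduce(([lo, hi], [a, b]) => [lo + a, hi + b], [0, 0]);
}

// Fraction of end-to-end latency spent in overhead for a given inference time
function overheadShare(overhead, inferenceMs) {
  return overhead / (overhead + inferenceMs);
}

const [lo, hi] = totalOverhead(overheadMs); // [66, 251]
const pct = x => (100 * overheadShare(x, 5000)).toFixed(1);
console.log(`overhead ${lo}-${hi} ms, ${pct(lo)}%-${pct(hi)}% of a 5 s inference call`);
```

The sum is 66 to 251 ms; against a 5-second inference call that is about 1.3% to 4.8% of end-to-end latency, roughly in line with the 2-5% estimate.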

Storage scaling

| Session length | Events | DB size | Query time (p95) |
|---|---|---|---|
| 1 hour | 500 | 15 MB | 8 ms |
| 8 hours | 4,000 | 120 MB | 15 ms |
| 40 hours | 20,000 | 600 MB | 35 ms |

Database stays performant up to hundreds of MB. Auto-compaction (configured retention) keeps growth bounded.
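The storage table implies constant rates, which makes it easy to estimate when the example maxSizeMB cap would be reached. A worked check, using the vendor numbers as-is and treating 1 MB as 1000 KB; hoursUntilCap is a hypothetical helper.

```javascript
// Rows from the storage table: [session hours, events, DB size in MB]
const rows = [[1, 500, 15], [8, 4000, 120], [40, 20000, 600]];

// Derived rates (1 MB = 1000 KB here for round numbers)
const kbPerEvent = rows.map(([, events, mb]) => (mb * 1000) / events);
const eventsPerHour = rows.map(([hours, events]) => events / hours);

// Hours of continuous activity before the DB reaches a size cap, at the table's rates
function hoursUntilCap(capMB, kbEach = kbPerEvent[0], perHour = eventsPerHour[0]) {
  const mbPerHour = (kbEach * perHour) / 1000;
  return capMB / mbPerHour;
}

console.log(kbPerEvent, eventsPerHour);
console.log(`~${hoursUntilCap(1000).toFixed(1)} h to reach maxSizeMB = 1000`);
```

Every row works out to 30 KB per event and 500 events per hour (15 MB per hour), so a 1000 MB cap is reached after roughly 66.7 hours of continuous activity, well before the example 30-day retention window expires.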




ELv2 terms, platform support matrices, and benchmark numbers change. Treat this as May 26, 2026 context from the public README, not a legal or security review.
