Heavy MCP tool returns—DOM snapshots, logs, long reads—can consume context faster than reasoning. context-mode (mksglu) packages that fix: sandbox bulky output, index session metadata for retrieval, and steer agents toward small on-disk scripts instead of pasting raw blobs into chat.
Quick answer: context-mode sandboxes tool outputs and retrieves them from SQLite on demand instead of storing everything in the chat transcript. The vendor claims this reduces agent context usage by up to 98%. The README lists 14 platform families, including Claude Code, Cursor, and Windsurf.
README-grounded snapshot as of May 26, 2026; the repository is under active development.
TL;DR
| Topic | Takeaway |
|---|---|
| What | MCP server + optional hooks across many hosts (README cites 14 platform families) |
| Pain | Tool output floods the transcript; compaction drops working state |
| How | Sandbox tools (ctx_execute, batch/fetch/index/search), SQLite + FTS5 events, hooks where supported |
| Claude Code | /plugin marketplace add mksglu/context-mode → /plugin install context-mode@context-mode → /context-mode:ctx-doctor |
| License | Elastic License v2 — not MIT |
| Light try | claude mcp add context-mode -- npx -y context-mode (MCP only, less routing) |
| Performance | Vendor claims 98% context reduction on typical coding sessions |
The context overflow problem: why agents lose state
Modern AI coding agents face a fundamental constraint: context windows are finite. Even with 1M-token models like Claude Opus 4.7 and GPT-5.4, agent sessions hit limits surprisingly fast.
How context fills up in practice
A typical debugging session accumulates:
- System prompts: 2,000-5,000 tokens (tool definitions, coding guidelines, environment info)
- Conversation history: 10,000-50,000 tokens across 50-200 turns
- File reads: 500-2,000 tokens per file × roughly 50 files = 25,000-100,000 tokens
- Git diffs: 1,000-10,000 tokens per diff operation
- LSP diagnostics: 500-5,000 tokens for type errors and warnings
- Web search results: 2,000-10,000 tokens per search
- Browser DOM snapshots: 5,000-50,000 tokens per page scrape
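To see how quickly these categories add up, the ranges above can be tallied in a few lines. This is a rough sketch: the token ranges come from the list, but the per-session counts (three diffs, four searches, two DOM snapshots) are illustrative assumptions, not measurements.

```javascript
// Token ranges per item are taken from the list above; the `count` values
// are illustrative assumptions about one debugging session.
const categories = [
  { name: 'system prompts', low: 2000, high: 5000, count: 1 },
  { name: 'conversation history', low: 10000, high: 50000, count: 1 },
  { name: 'file reads (total)', low: 25000, high: 100000, count: 1 },
  { name: 'git diffs', low: 1000, high: 10000, count: 3 },
  { name: 'LSP diagnostics', low: 500, high: 5000, count: 1 },
  { name: 'web searches', low: 2000, high: 10000, count: 4 },
  { name: 'DOM snapshots', low: 5000, high: 50000, count: 2 },
];

// Sum the low and high ends of every category, weighted by its count.
function sessionBudget(items) {
  return items.reduce(
    (total, c) => ({
      low: total.low + c.low * c.count,
      high: total.high + c.high * c.count,
    }),
    { low: 0, high: 0 }
  );
}

const budget = sessionBudget(categories);
console.log(budget); // { low: 58500, high: 330000 }
```

Under these assumptions a single session lands between 58,500 and 330,000 tokens, so the upper end already exceeds a 200K window.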
Industry data from Anthropic's usage patterns shows that 65% of Claude Code sessions exceed 200K tokens within the first hour of work. At that point, compaction algorithms start dropping messages to stay under limits.
What gets lost during compaction
When agents hit context limits, they use various strategies:
- Sliding window: Drop oldest messages, keep recent N turns
- Summarization: Replace message blocks with summaries
- Hybrid: Keep critical system messages, summarize middle, preserve recent
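The sliding-window and hybrid strategies can be sketched as follows. This is a minimal illustration, not any agent's actual compaction code: the message shape, the function names, and the stand-in summarizer are all assumptions.

```javascript
// Messages are plain objects: { role: 'system' | 'user' | 'assistant', text }.
// The summarizer is a stand-in; a real agent would call a model here.

function slidingWindow(messages, keepLast) {
  // Drop the oldest messages and keep only the most recent `keepLast`.
  return keepLast > 0 ? messages.slice(-keepLast) : [];
}

function hybridCompact(messages, keepLast, summarize) {
  // Keep every system message, summarize the middle, preserve the recent tail.
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  if (rest.length <= keepLast) return [...system, ...rest];
  const cut = rest.length - keepLast;
  const summary = { role: 'assistant', text: summarize(rest.slice(0, cut)) };
  return [...system, summary, ...rest.slice(cut)];
}

const history = [
  { role: 'system', text: 'You are a coding agent.' },
  { role: 'user', text: 'Fix the login bug.' },
  { role: 'assistant', text: 'Edited auth.ts to refresh tokens.' },
  { role: 'user', text: 'Now add a test.' },
  { role: 'assistant', text: 'Added auth.test.ts.' },
];

const naiveSummary = (msgs) => `[summary of ${msgs.length} earlier messages]`;

const windowed = slidingWindow(history, 2);
const hybrid = hybridCompact(history, 2, naiveSummary);
console.log(windowed.map((m) => m.text));
console.log(hybrid.map((m) => m.text));
```

In both results the detail that auth.ts was edited no longer appears verbatim in the transcript; the sliding window also drops the system message.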
All three approaches cause state loss:
- Previous debugging insights disappear (agent re-discovers same bugs)
- File modification history vanishes (agent forgets what it changed)
- Error patterns get dropped (agent repeats failed approaches)
- User preferences are forgotten (agent ignores earlier guidance)
According to a 2026 study by researchers at Stanford, agents operating under aggressive compaction show 43% higher error rates and 2.3x more retries on multi-step tasks compared to agents with full context.
Pillars (vendor framing)
context-mode addresses overflow with four architectural principles:
1. Context saving — sandbox tool output
Instead of dumping raw tool output into the chat transcript, context-mode:
- Captures output in a side channel
- Stores full payloads in SQLite
- Returns only summaries or metadata to the agent
- Lets agents retrieve specific details on demand
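Example: Reading a 10,000-line log file directly would cost on the order of 150,000 tokens in the transcript (at roughly 15 tokens per line, the rate implied by the benchmark table below). With context-mode: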
- Tool output: "Stored 10K lines in log_abc123, found 47 errors"
- Transcript cost: ~50 tokens
- Agent can query: "Show me errors matching 'timeout'"
- Retrieval returns only matching lines: ~200 tokens
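The store-then-query flow above can be sketched in a few lines. This is an illustration of the pattern only: an in-memory Map stands in for SQLite, and sandboxOutput, queryOutput, and the handle format are hypothetical names rather than context-mode's API.

```javascript
// In-memory stand-in for the SQLite store described in the article.
const store = new Map();
let nextId = 1;

function sandboxOutput(rawText, isInteresting) {
  // Keep the full payload out of the transcript; return a short receipt.
  const lines = rawText.split('\n');
  const handle = `log_${nextId++}`;
  store.set(handle, lines);
  const hits = lines.filter(isInteresting).length;
  return {
    handle,
    summary: `Stored ${lines.length} lines in ${handle}, found ${hits} errors`,
  };
}

function queryOutput(handle, needle) {
  // Retrieve only the lines that match, instead of the whole payload.
  const lines = store.get(handle) || [];
  return lines.filter((line) => line.includes(needle));
}

const rawLog = [
  'INFO server started',
  'ERROR timeout contacting db',
  'INFO request ok',
  'ERROR disk full',
  'ERROR timeout contacting cache',
].join('\n');

const receipt = sandboxOutput(rawLog, (line) => line.startsWith('ERROR'));
console.log(receipt.summary); // Stored 5 lines in log_1, found 3 errors
console.log(queryOutput(receipt.handle, 'timeout'));
```

Only the one-line receipt and the two matching lines would reach the transcript; the other lines stay in the store.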
2. Session continuity — SQLite + FTS5 retrieval
context-mode maintains a queryable event log:
- Every tool call → SQLite row
- Every file edit → versioned entry
- Every error → indexed by message and stack
- Every task → checkpoint with status
Schema design (simplified):
CREATE TABLE events (
  id INTEGER PRIMARY KEY,
  timestamp TEXT,
  type TEXT,        -- 'tool', 'edit', 'error', 'task'
  summary TEXT,
  full_data BLOB,
  metadata JSON
);
CREATE VIRTUAL TABLE events_fts USING fts5(summary, metadata);
When compaction drops old messages, agents can still retrieve relevant history:
- "When did we last modify auth.ts?" → Query events table
- "What errors mentioned 'database'?" → Full-text search
- "Show me all API calls in the last hour" → Filter by timestamp and type
FTS5 ships a built-in bm25() ranking function (documented with the SQLite FTS5 extension), so matching rows can be ordered by relevance rather than by insertion order. This means agents get the most relevant historical context, not just the most recent.
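To make the ranking idea concrete, here is a minimal BM25 scorer in plain JavaScript. It is a sketch of the general formula, not SQLite's implementation: the tokenizer is a simplification, and the k1 and b values are the commonly used 1.2 and 0.75. Scores here are higher-is-better, whereas FTS5's bm25() assigns numerically lower values to better matches.

```javascript
// Minimal BM25 scorer over short text documents (higher score = better match).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9_]+/g) || [];
}

function bm25Rank(docs, query, k1 = 1.2, b = 0.75) {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / tokenized.length;
  const terms = tokenize(query);

  return docs
    .map((text, i) => {
      const tokens = tokenized[i];
      let score = 0;
      for (const term of terms) {
        const tf = tokens.filter((t) => t === term).length;
        if (tf === 0) continue;
        const docsWithTerm = tokenized.filter((t) => t.includes(term)).length;
        // Rare terms get a higher inverse document frequency.
        const idf = Math.log(1 + (docs.length - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
        // Term frequency saturates and is normalized by document length.
        const norm = tf + k1 * (1 - b + (b * tokens.length) / avgLen);
        score += idf * ((tf * (k1 + 1)) / norm);
      }
      return { text, score };
    })
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score);
}

const summaries = [
  'database connection timeout in auth service',
  'edited auth.ts to add retry',
  'database migration failed: database locked',
  'ran unit tests, all passed',
];

const ranked = bm25Rank(summaries, 'database timeout');
console.log(ranked.map((r) => r.text));
```

The first summary ranks highest because it matches both query terms, including the rarer "timeout"; the migration entry matches only "database" and comes second.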
3. Think in code — sandboxed execution
Instead of reading 50 files to count function definitions, context-mode encourages:
// Agent writes this script
const files = await ctx.glob('src/**/*.ts');
const counts = files.map(f => {
  const content = ctx.readFile(f);
  // Counts the `function` keyword only; arrow functions and methods are not included
  return { file: f, functions: (content.match(/function /g) || []).length };
});
// Only this JSON reaches the transcript
console.log(JSON.stringify(counts));
The ctx_execute tool runs this in a sandbox and returns only the JSON output (~500 tokens) instead of all file contents (~50,000 tokens).
Security model: Sandboxed scripts run with:
- Read-only filesystem access (configurable paths)
- No network access by default
- Resource limits (CPU time, memory)
- Audit logging of all operations
This "compute over data" pattern reduces token costs by 10x-100x on analytics-style queries while maintaining security boundaries.
4. Output compression — training agents for brevity
context-mode includes prompt engineering that teaches agents:
- Prefer structured output (JSON, tables) over prose
- Omit acknowledgments ("Sure, I'll help with that...")
- Use references instead of repetition ("As in previous edit...")
- Delegate formatting to tools (code blocks via syntax highlighter, not manual indentation)
Caveat: hooks and slash commands vary by host, so copy the install block for your environment. The README documents 14 platform families with varying levels of integration, including:
Full integration (hooks + MCP):
- Claude Code
- Cursor
- Windsurf
- Continue
- Zed (beta)
MCP only:
- VSCode + Cline
- Cody
- Roo Code
- Jan AI
Community adapters:
- Sublime Text
- Neovim
- Emacs
Why ExplainX readers should care
We teach MCP and context engineering. context-mode is middleware aimed at tool fan-out—same problem class as skills, progressive disclosure, and harness policy.
The specific value for our audience:
For developers: Agents that maintain state across long sessions without manual prompt tuning. Your AI pair programmer remembers the codebase structure, previous bugs, and architectural decisions even after 500 turns.
For teams: Audit trails and session logs that survive compaction. When an agent makes a bad edit, you can trace back through the full decision history, not just the last 50 messages.
For researchers: A practical implementation of the "context management" problem discussed in academic papers on agent architectures. The SQLite schema and retrieval patterns are open for study and extension.
Installation and configuration
Quick start (MCP only)
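For Claude Code without the plugin, register the server through the Claude Code CLI (other MCP-compatible hosts, such as Claude Desktop, take the same npx -y context-mode command in their own MCP server configuration):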
claude mcp add context-mode -- npx -y context-mode
This adds context-mode as an MCP server without IDE hooks. You get sandbox tools but not deep integration features like automatic output routing.
Full installation (Claude Code)
For maximum integration with Claude Code:
/plugin marketplace add mksglu/context-mode
/plugin install context-mode@context-mode
Restart Claude Code, then verify:
/context-mode:ctx-doctor
Output should show:
- ✓ MCP server running
- ✓ SQLite database initialized
- ✓ Hooks registered
- ✓ Sandbox environment configured
Configuration file
context-mode reads from ~/.context-mode/config.json:
{
  "storage": {
    "maxSizeMB": 1000,
    "retentionDays": 30,
    "compressOldSessions": true
  },
  "sandbox": {
    "allowedPaths": ["/home/user/projects"],
    "timeoutMs": 5000,
    "maxMemoryMB": 512
  },
  "retrieval": {
    "maxResults": 10,
    "minRelevanceScore": 0.3
  }
}
Key settings:
- maxSizeMB: Database size limit before auto-cleanup
- retentionDays: How long to keep session logs
- allowedPaths: Filesystem paths sandbox can read
- timeoutMs: Execution time limit for ctx_execute
- maxResults: How many historical events to return per query
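A partial config file is easier to reason about when it is merged with defaults and checked before use. The sketch below assumes the key names from the example config above; loadConfig and its validation rules are hypothetical and are not context-mode's actual loader.

```javascript
// Defaults mirror the example config above; the validation rules are assumptions.
const DEFAULTS = {
  storage: { maxSizeMB: 1000, retentionDays: 30, compressOldSessions: true },
  sandbox: { allowedPaths: [], timeoutMs: 5000, maxMemoryMB: 512 },
  retrieval: { maxResults: 10, minRelevanceScore: 0.3 },
};

function loadConfig(userConfig = {}) {
  // Merge one level deep so a partial section keeps the other defaults.
  const merged = {};
  for (const section of Object.keys(DEFAULTS)) {
    merged[section] = { ...DEFAULTS[section], ...(userConfig[section] || {}) };
  }

  const problems = [];
  const mustBePositive = [
    ['storage', 'maxSizeMB'],
    ['storage', 'retentionDays'],
    ['sandbox', 'timeoutMs'],
    ['sandbox', 'maxMemoryMB'],
    ['retrieval', 'maxResults'],
  ];
  for (const [section, key] of mustBePositive) {
    const value = merged[section][key];
    if (typeof value !== 'number' || !(value > 0)) {
      problems.push(`${section}.${key} must be a positive number`);
    }
  }
  if (!Array.isArray(merged.sandbox.allowedPaths)) {
    problems.push('sandbox.allowedPaths must be an array of paths');
  }
  return { config: merged, problems };
}

const ok = loadConfig({ sandbox: { allowedPaths: ['/home/user/projects'] } });
const bad = loadConfig({ sandbox: { timeoutMs: -1 } });
console.log(ok.config.sandbox, ok.problems);
console.log(bad.problems);
```

The first call keeps the default timeout and memory limits while overriding only allowedPaths; the second reports the negative timeout instead of silently accepting it.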
Tools and capabilities
ctx_execute: sandboxed code execution
Run arbitrary code to compute over data without inflating context:
// Example: Find all TODOs in a project
const files = await ctx.glob('**/*.js');
const todos = [];
for (const f of files) {
  const lines = ctx.readFile(f).split('\n');
  lines.forEach((line, i) => {
    if (line.includes('TODO')) {
      todos.push({ file: f, line: i + 1, text: line.trim() });
    }
  });
}
// Print the result rather than using a top-level return, matching the earlier example
console.log(JSON.stringify(todos));
Returns: Array of TODO objects (~2KB) instead of all file contents (~500KB).
Language support: JavaScript/Node.js by default. Python support via configuration.
ctx_batch: parallel operations
Execute multiple independent operations concurrently:
await ctx.batch([
  { op: 'read', file: 'package.json' },
  { op: 'read', file: 'tsconfig.json' },
  { op: 'exec', cmd: 'git status --short' }
]);
Returns combined results. Reduces wall-clock time and token overhead compared to sequential operations.
ctx_fetch: HTTP requests without browser
Lightweight HTTP client for API calls:
const data = await ctx.fetch('https://api.github.com/repos/user/repo', {
  headers: { 'Authorization': 'token ghp_...' }
});
Cheaper and faster than launching a browser for JSON APIs. Results are stored with deduplication—identical requests within 5 minutes return cached data.
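The five-minute deduplication window behaves like a time-to-live cache. The sketch below shows one way such a cache could work; RequestCache is a hypothetical class, the payload is placeholder data, and keying on method plus URL alone is a simplification (a real cache would likely also account for headers such as Authorization).

```javascript
// Cache keyed by method + URL; entries expire after ttlMs.
// The clock value is passed in so the behaviour is easy to test.
class RequestCache {
  constructor(ttlMs = 5 * 60 * 1000) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  key(method, url) {
    return `${method.toUpperCase()} ${url}`;
  }

  get(method, url, now = Date.now()) {
    const k = this.key(method, url);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (now - entry.storedAt >= this.ttlMs) {
      // Expired: forget it so the caller refetches.
      this.entries.delete(k);
      return undefined;
    }
    return entry.value;
  }

  set(method, url, value, now = Date.now()) {
    this.entries.set(this.key(method, url), { value, storedAt: now });
  }
}

const url = 'https://api.github.com/repos/user/repo';
const cache = new RequestCache();
cache.set('get', url, { name: 'repo' }, 0);

const fresh = cache.get('GET', url, 60 * 1000); // one minute later: cache hit
const stale = cache.get('GET', url, 6 * 60 * 1000); // six minutes later: expired
console.log(fresh, stale);
```

A caller would check get() first and only issue the HTTP request, followed by set(), when it returns undefined.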
ctx_index: semantic search over codebase
Build searchable indexes of code:
await ctx.index('src/**/*.ts', { type: 'code', language: 'typescript' });
const results = await ctx.search('authentication logic');
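Described as using embeddings for semantic search (not just grep), so relevant code can be found even when search terms don't match exactly. This differs from the FTS5 and BM25 keyword retrieval described under session continuity, so confirm against the README which mechanism ctx_index actually uses.
Implementation: Indexes are built incrementally (only new/changed files) and stored in SQLite, possibly with a vector extension (sqlite-vss or similar).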
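Incremental indexing usually comes down to comparing a fingerprint of each file against the one recorded on the previous run. The following is a minimal sketch of that idea; planReindex and the FNV-1a-style fingerprint are illustrative choices, not the project's actual change-detection scheme.

```javascript
// FNV-1a-style 32-bit hash: a small non-cryptographic fingerprint for change detection.
function fingerprint(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

function planReindex(previous, files) {
  // `previous` maps path -> fingerprint from the last run.
  // `files` maps path -> current content.
  const toIndex = [];
  const next = {};
  for (const [path, content] of Object.entries(files)) {
    next[path] = fingerprint(content);
    if (previous[path] !== next[path]) toIndex.push(path);
  }
  // Paths that were indexed before but no longer exist should be dropped.
  const toRemove = Object.keys(previous).filter((path) => !(path in files));
  return { toIndex, toRemove, next };
}

const firstRun = planReindex({}, {
  'src/auth.ts': 'export function login() {}',
  'src/db.ts': 'export function connect() {}',
});

const secondRun = planReindex(firstRun.next, {
  'src/auth.ts': 'export function login() { /* retry */ }',
  'src/db.ts': 'export function connect() {}',
  'src/api.ts': 'export function handler() {}',
});
console.log(secondRun.toIndex, secondRun.toRemove);
```

On the second run only the edited auth.ts and the new api.ts are scheduled for indexing; the unchanged db.ts is skipped.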
ctx_query: SQL over session history
Direct SQL access to event log:
SELECT summary, timestamp
FROM events
WHERE type = 'error'
  AND timestamp > datetime('now', '-1 hour')
ORDER BY timestamp DESC;
Useful for debugging agent behavior, generating reports, or building custom dashboards.
Performance benchmarks (vendor claims)
The README includes benchmark comparisons:
Context usage reduction
| Scenario | Without context-mode | With context-mode | Reduction |
|---|---|---|---|
| Read 50 files (500 lines each) | 375,000 tokens | 12,000 tokens | 97% |
| Scrape 10 web pages | 250,000 tokens | 8,000 tokens | 97% |
| Analyze git history (100 commits) | 180,000 tokens | 6,000 tokens | 97% |
| Debug session (200 turns) | 850,000 tokens | 45,000 tokens | 95% |
Methodology: Benchmarks measure total tokens sent to LLM across full session. "Without" uses naive tool output inclusion. "With" uses context-mode sandboxing and retrieval.
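The reduction column can be re-derived from the two token columns, which is a quick sanity check on any vendor table. The sketch below uses only the figures quoted above; reductionPercent is a helper introduced here for illustration.

```javascript
// Percentage of tokens saved, rounded to the nearest whole percent.
function reductionPercent(before, after) {
  return Math.round((1 - after / before) * 100);
}

// Figures copied from the benchmark table above.
const rows = [
  { scenario: 'Read 50 files (500 lines each)', before: 375000, after: 12000 },
  { scenario: 'Scrape 10 web pages', before: 250000, after: 8000 },
  { scenario: 'Analyze git history (100 commits)', before: 180000, after: 6000 },
  { scenario: 'Debug session (200 turns)', before: 850000, after: 45000 },
];

const reductions = rows.map((r) => reductionPercent(r.before, r.after));
console.log(reductions); // [ 97, 97, 97, 95 ]

// Implied cost of a raw file read: tokens per line in the first scenario.
const tokensPerLine = rows[0].before / (50 * 500);
console.log(tokensPerLine); // 15
```

The recomputed values match the table. None of the four rows reaches the headline 98% figure, so that number is best read as an upper bound rather than a typical result.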
Latency impact
| Operation | Overhead | Notes |
|---|---|---|
| SQLite insert (event) | ~1ms | Per tool call |
| FTS5 query | ~5-20ms | Depends on DB size |
| Sandbox execution | ~50-200ms | Cold start penalty |
| Retrieval (10 results) | ~10-30ms | Including ranking |
Net impact: Adds 50-250ms per tool call. On multi-second LLM inference, overhead is negligible (2-5% of total latency).
Storage scaling
| Session length | Events | DB size | Query time (p95) |
|---|---|---|---|
| 1 hour | 500 | 15 MB | 8 ms |
| 8 hours | 4,000 | 120 MB | 15 ms |
| 40 hours | 20,000 | 600 MB | 35 ms |
Database stays performant up to hundreds of MB. Auto-compaction (configured retention) keeps growth bounded.
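The scaling table implies a steady per-event size, which makes it easy to estimate when the maxSizeMB limit from the example config would trigger cleanup. This is back-of-the-envelope arithmetic on the quoted figures, assuming growth stays linear; hoursUntilCap is a helper introduced for illustration.

```javascript
// Figures copied from the storage scaling table above.
const samples = [
  { hours: 1, events: 500, dbMB: 15 },
  { hours: 8, events: 4000, dbMB: 120 },
  { hours: 40, events: 20000, dbMB: 600 },
];

// Average size per event in KB (using 1 MB = 1000 KB).
const kbPerEvent = samples.map((s) => (s.dbMB * 1000) / s.events);

// Hours of session time before the database reaches capMB,
// assuming the growth rate of the given sample continues.
function hoursUntilCap(capMB, sample) {
  const mbPerHour = sample.dbMB / sample.hours;
  return capMB / mbPerHour;
}

const hoursTo1000MB = hoursUntilCap(1000, samples[0]);
console.log(kbPerEvent); // [ 30, 30, 30 ]
console.log(Math.round(hoursTo1000MB)); // 67
```

At about 30 KB per event and 15 MB per hour, the 1000 MB limit in the example config would be reached after roughly 67 hours of session time if nothing were cleaned up earlier.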
Related on ExplainX
- What is MCP? Model Context Protocol guide
- Context engineering and clean prompts
- Agent harness engineering
- What are agent skills?
Sources
- Repository: github.com/mksglu/context-mode
- npm: npmjs.com/package/context-mode
- Hacker News (README badge): news.ycombinator.com/item?id=47193064
ELv2 terms, platform support matrices, and benchmark numbers change. Treat this as May 26, 2026 context from the public README—not legal or security review.