What Is pxpipe? Cut Claude Code Token Bills by Rendering Context as Images
pxpipe is a local proxy that turns bulky Claude Code context (system prompt, tool docs, history) into PNG pages for ~59-70% lower Fable bills. How it works, benchmarks, lossy caveats, and when not to use it.
What if your system prompt cost 2,700 tokens instead of 25,000, and the model still read every line?
pxpipe (pxpipe-proxy on npm) is a local proxy that does exactly that: it intercepts Claude Code requests, reflows bulky text context into dense PNG pages, and forwards the rest unchanged. Same session. Same tools. Fraction of the input tokens.
The repo hit 2.2k GitHub stars with v0.8.0 shipping July 2026. The headline claim: ~59-70% lower end-to-end bills on Fable 5 list pricing, but the authors are explicit that workloads differ and that the durable number is the measured token cut per request, not marketing math.
TL;DR: what developers ask first
| Question | Answer |
| --- | --- |
| What is it? | Local HTTP proxy: text → PNG for bulky context |
| Run how? | npx pxpipe-proxy, then ANTHROPIC_BASE_URL=http://127.0.0.1:47821 claude |
| Savings? | ~59-70% end-to-end on measured Fable traffic (varies) |
| Lossy? | Yes: hex/IDs not byte-safe in images |
| Output touched? | No: compresses request only |
| License | MIT · TypeScript · Node + Cloudflare Workers |
The insight: pixels, not characters
LLM providers bill text tokens by tokenizer chunk count. Vision billing is different: an image costs tokens from resolution, not how many characters you painted into the PNG.
pxpipe's README states the gap on real Claude Code traffic:
| Mode | Dense content efficiency |
| --- | --- |
| Text tokens | ~1 char per billed token |
| Image tokens (dense render) | ~3.1 chars per vision token |
A 1928×1928 page ≈ 4,761 vision tokens and holds ≈ 92,000 characters of wrapped monospace text. Text only wins above ~19 chars per text token; Claude Code sessions average ~1.91 (N=391 production rows), so imaging wins often.
Concrete example from the repo: ~48k characters of system prompt + tool docs → ≈25k text tokens as plain text, ≈2.7k image tokens as one 1573×1248 page. The page the model sees has an instruction banner on top, with ↵ marking original newlines.
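The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the numbers quoted above; the function and variable names are illustrative and are not part of pxpipe.

```typescript
// Characters carried per billed token, for either text or image encoding.
function charsPerToken(chars: number, tokens: number): number {
  return chars / tokens;
}

// Fraction of input tokens saved by sending imageTokens instead of textTokens.
function tokenCut(textTokens: number, imageTokens: number): number {
  return 1 - imageTokens / textTokens;
}

// Full page quoted above: ~92,000 chars for ~4,761 vision tokens.
const pageDensity = charsPerToken(92000, 4761); // about 19.3 chars per vision token

// System prompt example: ~48k chars, ~25k text tokens, ~2.7k image tokens.
const textDensity = charsPerToken(48000, 25000); // 1.92 chars per text token
const cut = tokenCut(25000, 2700); // about 0.892

// Imaging pays off while the text is less dense than the rendered page.
const imagingWins = textDensity < pageDensity;

console.log(pageDensity.toFixed(1), textDensity.toFixed(2), cut.toFixed(3), imagingWins);
```

Note that the roughly 89% cut applies to this one slab only. The end-to-end figure is lower because it also counts output tokens and requests that were never imaged.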
If you are new to how providers count units, start with what are LLM tokens; pxpipe games the input side of that ledger.
Try it in 30 seconds
```bash
npx pxpipe-proxy
```

```bash
ANTHROPIC_BASE_URL=http://127.0.0.1:47821 claude
```
Open http://127.0.0.1:47821/ for:
Tokens saved (running total)
Every text → image conversion side by side
Kill switch to pass through byte-identical
Live model allowlist chips
Responses stream normally: pxpipe never compresses the model's reply. Recent conversation turns stay text; static system prompt, tool docs, and older collapsed history are the usual imaging targets.
What actually gets compressed
Three buckets, each behind a profitability gate (sparse prose stays text):
Older history: turns behind the live tail become image pages; recent turns always stay text
System prompt + tool docs slab: static prefix reflowed into page(s), cache-friendly splice
Passes through unchanged: your latest messages, small blocks, sparse prose, models outside allowlist, and all model output.
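A profitability gate of this kind can be sketched as a simple comparison of estimated costs. This is a hypothetical illustration, not pxpipe's actual gate: the `Block` and `GateOptions` shapes, the size threshold, and the required margin are all assumptions. The default density reuses the ~3.1 chars per vision token figure from the table above and should be replaced with a value measured on your own renders.

```typescript
// Hypothetical gate: image a block only when the estimated image cost
// is clearly below its text cost. Not pxpipe's real implementation.
interface Block {
  chars: number;      // length of the block's text
  textTokens: number; // measured or estimated text-token count
}

interface GateOptions {
  charsPerVisionToken: number; // measured density of rendered pages
  minChars: number;            // blocks shorter than this always stay text
  minSaving: number;           // required fractional saving, e.g. 0.25
}

function shouldImage(block: Block, opts: GateOptions): boolean {
  if (block.chars < opts.minChars) return false;
  if (block.textTokens <= 0) return false;
  const imageTokens = Math.ceil(block.chars / opts.charsPerVisionToken);
  const saving = 1 - imageTokens / block.textTokens;
  return saving >= opts.minSaving;
}

// Assumed settings for illustration only.
const gate: GateOptions = { charsPerVisionToken: 3.1, minChars: 2000, minSaving: 0.25 };

// Token-dense tool output: 1.9 chars per text token.
const dense: Block = { chars: 19000, textTokens: 10000 };
// Sparse prose: 3.5 chars per text token.
const sparse: Block = { chars: 35000, textTokens: 10000 };
// Small block: below the size threshold regardless of density.
const small: Block = { chars: 500, textTokens: 400 };

console.log(shouldImage(dense, gate), shouldImage(sparse, gate), shouldImage(small, gate));
```

Requiring a margin rather than any positive saving keeps borderline blocks as text, where exact strings stay byte-safe.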
GPT path note: tool definitions stay native JSON; no Anthropic cache_control markers on OpenAI transforms.
Set PXPIPE_MODELS=off to disable imaging entirely. Default: claude-fable-5,gpt-5.6.
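For readers wiring this into their own tooling, the allowlist semantics described above might be modeled as follows. This is a sketch under assumptions: the comma-separated format is inferred from the default shown above, and the case-insensitive "off" and exact-match model check are guesses rather than documented pxpipe behavior.

```typescript
// Hypothetical parsing of a PXPIPE_MODELS-style value; pxpipe's own parser may differ.
const DEFAULT_MODELS: string[] = ["claude-fable-5", "gpt-5.6"];

function parseAllowlist(value: string | undefined): string[] {
  if (value === undefined || value.trim() === "") return DEFAULT_MODELS.slice();
  if (value.trim().toLowerCase() === "off") return []; // imaging disabled entirely
  return value
    .split(",")
    .map((m) => m.trim())
    .filter((m) => m.length > 0);
}

// Assumption: a model is imaged only on an exact id match.
function isImagingEnabled(model: string, allowlist: string[]): boolean {
  return allowlist.indexOf(model) !== -1;
}

console.log(parseAllowlist(undefined), parseAllowlist("off"));
```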
How the pipeline works
```
tool_result string
  → wrap at 1928px-wide columns
  → pack ~92,000 chars/page
  → PNG[]
  → splice into /v1/messages (cache-friendly)
  → forward to Anthropic
```
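The wrap and pack steps can be illustrated without any rendering code. The sketch below is a simplified stand-in, not pxpipe's implementation: it counts characters instead of pixels, assumes ASCII input, and takes the column count and page capacity as parameters. Rasterizing each page to PNG is left out.

```typescript
// Hypothetical wrap-and-pack step; column width and page capacity are
// parameters here, whereas a real renderer derives them from font metrics.
const NEWLINE_MARK = "\u21b5"; // the return-arrow glyph that marks an original newline

// Reflow text into fixed-width rows. Original newlines become a visible
// mark so they can be told apart from wraps added by the reflow.
function wrapText(text: string, cols: number): string[] {
  const flat = text.split("\n").join(NEWLINE_MARK);
  const rows: string[] = [];
  for (let i = 0; i < flat.length; i += cols) {
    rows.push(flat.slice(i, i + cols));
  }
  return rows;
}

// Group rows into pages holding at most charsPerPage characters each.
function packPages(rows: string[], charsPerPage: number): string[][] {
  const pages: string[][] = [];
  let current: string[] = [];
  let used = 0;
  rows.forEach((row) => {
    if (used + row.length > charsPerPage && current.length > 0) {
      pages.push(current);
      current = [];
      used = 0;
    }
    current.push(row);
    used += row.length;
  });
  if (current.length > 0) pages.push(current);
  return pages;
}

// 200,000 characters at an assumed 100 columns and the ~92,000 chars/page quoted above.
const sample = new Array(200001).join("a");
const pages = packPages(wrapText(sample, 100), 92000);
console.log(pages.length, pages.map((p) => p.length));
```

Each inner array is one page's worth of rows, ready to be handed to whatever draws the PNG.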
Events log to ~/.pxpipe/events.jsonl. Each row records:
Counterfactual: free count_tokens on the original uncompressed body (parallel probe)
Actual: billed usage from the real response
That is how the README avoids inflated "savings on the slice we touched": the end-to-end denominator includes requests pxpipe correctly left alone, cache reads/writes, and all output tokens.
Dollar math in docs uses Fable 5 list ratios (input ×1.0, cache write ×1.25, cache read ×0.1, output ×5) applied identically to both sides so cache discounts cancel.
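To make the end-to-end accounting concrete, here is a minimal sketch that applies the list ratios above to both sides of each logged request. The `Usage` and `EventRow` shapes are hypothetical, since the real events.jsonl field names are not given here, and the sample numbers are made up for illustration.

```typescript
// Hypothetical usage shape; real events.jsonl field names may differ.
interface Usage {
  input: number;      // uncached input tokens
  cacheWrite: number; // tokens written to the prompt cache
  cacheRead: number;  // tokens read from the prompt cache
  output: number;     // output tokens
}

// List-price ratios quoted above, relative to the input-token price.
const WEIGHTS: Usage = { input: 1.0, cacheWrite: 1.25, cacheRead: 0.1, output: 5 };

// Cost expressed in input-token equivalents.
function weightedCost(u: Usage): number {
  return (
    u.input * WEIGHTS.input +
    u.cacheWrite * WEIGHTS.cacheWrite +
    u.cacheRead * WEIGHTS.cacheRead +
    u.output * WEIGHTS.output
  );
}

interface EventRow {
  counterfactual: Usage; // what the uncompressed request would have cost
  actual: Usage;         // what was actually billed
}

// Saving over every row, including requests that were passed through untouched.
function endToEndSaving(rows: EventRow[]): number {
  let before = 0;
  let after = 0;
  rows.forEach((r) => {
    before += weightedCost(r.counterfactual);
    after += weightedCost(r.actual);
  });
  return before === 0 ? 0 : 1 - after / before;
}

// Made-up sample: one imaged request and one pass-through request.
const sampleRows: EventRow[] = [
  {
    counterfactual: { input: 30000, cacheWrite: 0, cacheRead: 0, output: 1000 },
    actual: { input: 8000, cacheWrite: 0, cacheRead: 0, output: 1000 },
  },
  {
    counterfactual: { input: 2000, cacheWrite: 0, cacheRead: 0, output: 500 },
    actual: { input: 2000, cacheWrite: 0, cacheRead: 0, output: 500 },
  },
];

console.log(endToEndSaving(sampleRows).toFixed(3));
```

In this sample the imaged request alone cuts input tokens by about 73%, but the end-to-end saving comes out near 56% once output tokens and the untouched request are in the denominator. That gap is why the slice-only number overstates the benefit.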
| Test | n | | | |
| --- | --- | --- | --- | --- |
| Confabulation on never-stated facts (lower is better), Fable 5 | 16/arm | 0/16 | 0/16 | - |
| Verbatim 12-char hex, dense render, Opus | 15 | 15/15 | 0/15 | - |
| Verbatim 12-char hex, dense render, Fable 5 | 15 | - | 13/15 | - |
SWE-bench Lite pilot: 10/10 both arms at −65% request size.
SWE-bench Pro: 14/19 ON vs 15/19 OFF at −60%; verdicts agree 18/19; single split re-resolved 3/3 on replication (variance, not compression failure per FINDINGS.md).
Receipts: eval/swe-bench/, eval/swe-bench-pro/, eval/needle-haystack/, eval/gist-recall/ on GitHub.
The honest limitations
pxpipe's README leads with failure modes, which is unusual and worth copying:
Lossy by design
Exact strings in images are not byte-safe: SHAs, API keys, precise counts
Documented real-world miss: model recalled a person's name from imaged chat history wrong, confidently
Coding tolerates this because agents re-read files; pure chat recall does not
Escape hatches
Route byte-exact work to subagents on non-allowlisted models (CLAUDE_CODE_SUBAGENT_MODEL=claude-sonnet-4-6)
keepSharp(block) in library API pins blocks as text
Recent turns never imaged
Other costs
PNG encoding latency on large requests before forward
ASCII/Latin-1 well tested; CJK works conservatively
No dedicated verbatim-risk guard shipped yet (roadmap item)
Workload-dependent economics
Wins on token-dense content (~1 char/token). Can lose money on sparse prose (~3.5 chars/token). Profitability gate calibrated on N=391 production rows.
options.keepSharp(block) forces text; options.emitRecoverable returns originals of imaged blocks. Pure JS runtime (Node + edge Workers); @napi-rs/canvas is build-time only.
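Since no verbatim-risk guard ships yet, one option is a user-side predicate that pins risky blocks as text. The sketch below is an assumption-heavy illustration: the `TextBlock` shape, the idea that keepSharp accepts a boolean-returning function, and the specific patterns are all hypothetical, so check the library's real types before relying on it.

```typescript
// Hypothetical block shape; the real pxpipe block type may differ.
interface TextBlock {
  text: string;
}

// Patterns for strings that must survive byte-for-byte. Illustrative, not exhaustive.
const HEX_RUN = /\b[0-9a-f]{12,}\b/i; // SHAs, object IDs, long numeric IDs
const UUID = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/i;
const KEY_LIKE = /\b(?:sk|pk|ghp|xox[abp])[-_][A-Za-z0-9_-]{16,}\b/; // common API key prefixes

// Returns true when the block contains something that should stay text.
function looksByteExact(block: TextBlock): boolean {
  return HEX_RUN.test(block.text) || UUID.test(block.text) || KEY_LIKE.test(block.text);
}

// Assumed wiring: pass the predicate as the keepSharp option.
const options = { keepSharp: looksByteExact };

console.log(options.keepSharp({ text: "commit 3f2a9c1b7e4d fixed the parser" }));
console.log(options.keepSharp({ text: "plain prose about design trade-offs" }));
```

A false positive here only costs some savings, while a false negative risks a silently misread identifier, so erring toward broad patterns is a reasonable default.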
pxpipe vs /compact vs smaller context windows
| Approach | Mechanism | Trade-off |
| --- | --- | --- |
| /compact | Summarize + drop history | Fast, but deletes detail |
| Smaller model | Cheaper per token | Less capability |
| pxpipe | Full content as images | Lossy on exact strings; PNG CPU cost |
| Shorter prompts | Human discipline | Free, but fights agent harness growth |
After Meta's 73.7T token month and Tesla's $200/week caps, input compression is no longer a hobby project, but end-to-end measurement matters. pxpipe's events.jsonl counterfactual pattern is the right audit habit.
"Why does the README read like an AI wrote it? Because one did. Most of this repo's commits, the code and the docs, were authored by Opus/Fable agent sessions running behind pxpipe itself, reading their own collapsed history as image pages while they worked."
That is either the best dogfooding story of 2026 or the most recursive disclaimer, possibly both.
Roadmap (hypotheses, not promises)
Sharper glyph rendering (eval/glyph-matrix/)
Whether imaged bulk stretches effective context (~2× content in same 1M window)
Whether smaller active context improves long-task accuracy
Each ships with an n or gets cut, per maintainers.
FAQ: quick answers
Does pxpipe work with Cursor / other Claude clients?
Anything that respects ANTHROPIC_BASE_URL can route through it; Claude Code is the documented path.
Is this cheating prompt cache?
The repo claims a cache-friendly splice: the static prefix is preserved so Anthropic prompt caching keeps working. Verify on your traffic via events.jsonl.
Should enterprises run this?
Pilot on non-production repos first. Lossy compression plus silent hex misses is a governance conversation, especially for regulated IDs and audit trails. See enterprise token governance.
Stars / releases?
~2.2k stars, 135 forks, v0.8.0 (July 2026), MIT license, teamchong/pxpipe.
pxpipe v0.8.0 as documented July 2026. Savings figures from project FINDINGS.md and README; re-derive on your own ~/.pxpipe/events.jsonl before budget decisions.