pxpipe (npm package pxpipe-proxy, GitHub teamchong/pxpipe) is a local HTTP proxy that intercepts Anthropic /v1/messages requests and rewrites bulky text blocks — system prompt, tool definitions, large tool_result bodies, and older chat history — into compact PNG image pages before forwarding to the API. Recent turns stay text; the model's output is never modified.

How much does pxpipe save on Claude Code bills?

On measured production traffic at Fable 5 list pricing, pxpipe reports roughly 59% end-to-end savings on a 13,709-request snapshot ($100 → ~$41) and ~70% on a later 8,904-compressed-request trace. Savings are workload-dependent: dense code and JSON win; sparse prose can lose money. The durable metric is input token reduction per request, logged in ~/.pxpipe/events.jsonl with a parallel count_tokens counterfactual.

Why does rendering text as images reduce tokens?

Vision billing is tied to image pixel dimensions, not to the characters painted into the PNG. A 1928×1928 page costs about 4,761 vision tokens and can hold roughly 92,000 characters of wrapped text (~3.1 chars per image-token on dense Claude Code traffic vs ~1 char per text-token). Raw text only wins when it averages above roughly 19 chars per text-token (92,000 ÷ 4,761 ≈ 19.3), so token-dense content is cheaper as images.

Is pxpipe lossy? What should stay as text?

Yes. On exact 12-character hex strings in dense renders, Fable 5 got 13/15 correct and Opus got 0/15; the misses are silent confabulations, not errors. IDs, hashes, secrets, and recent turns should stay text. Opus 4.7/4.8 misread about 7% of renders and GPT 5.5 degrades on imaged context, so both are opt-in via PXPIPE_MODELS. The default allowlist is claude-fable-5 and gpt-5.6 only.

How do I run pxpipe with Claude Code?

Run npx pxpipe-proxy (listens on 127.0.0.1:47821 by default), then start Claude Code with ANTHROPIC_BASE_URL=http://127.0.0.1:47821. A dashboard at http://127.0.0.1:47821/ shows tokens saved, text-to-image conversions, kill switch, and model chips. Set PXPIPE_MODELS=off to disable imaging.

How is pxpipe different from /compact or context pruning?

/compact summarizes and deletes history — lossy by design and harder to audit. pxpipe preserves full content visually on image pages while cutting billed tokens, keeps prompt-cache-friendly static prefixes, and logs exact before/after token counts per request. FINDINGS.md on the repo argues /compact is the correct baseline comparison, not verbatim replay.

pxpipe: Cut Claude Code Tokens via Image Context (2026) | explainx.ai Blog

What if your system prompt cost 2,700 tokens instead of 25,000 — and the model still read every line? pxpipe (pxpipe-proxy on npm) is a local proxy that does exactly that: it intercepts Claude Code requests, reflows bulky text context into dense PNG pages, and forwards the rest unchanged. Same session. Same tools. Fraction of the input tokens.

The repo hit 2.2k GitHub stars with v0.8.0 shipping July 2026. The headline claim: ~59–70% lower end-to-end bills on Fable 5 list pricing — but the authors are explicit that workloads differ and the durable number is measured token cut per request, not marketing math.

TL;DR — what developers ask first

Question	Answer
What is it?	Local HTTP proxy — text → PNG for bulky context
Run how?	`npx pxpipe-proxy` → `ANTHROPIC_BASE_URL=http://127.0.0.1:47821 claude`
Dashboard	`http://127.0.0.1:47821/` — savings, conversions, kill switch
Default models	Fable 5 + GPT 5.6 only · Opus/GPT 5.5 opt-in
Savings	~59–70% end-to-end on measured Fable traffic (varies)
Lossy?	Yes — hex/IDs not byte-safe in images
Output touched?	No — compresses request only
License	MIT · TypeScript · Node + Cloudflare Workers

The insight — pixels, not characters

LLM providers bill text tokens by tokenizer chunk count. Vision billing is different: an image costs tokens from resolution, not how many characters you painted into the PNG.

pxpipe's README states the gap on real Claude Code traffic:

Mode	Dense content efficiency
Text tokens	~1 char per billed token
Image tokens (dense render)	~3.1 chars per vision token

A 1928×1928 page ≈ 4,761 vision tokens and holds ≈ 92,000 characters of wrapped monospace text. Text only wins above ~19 chars per text-token — Claude Code sessions average ~1.91 (N=391 production rows), so imaging wins often.

Concrete example from the repo: ~48k characters of system prompt + tool docs → ≈25k text tokens as plain text, ≈2.7k image tokens as one 1573×1248 page. That is the visual the model sees — instruction banner on top, ↵ marking original newlines.
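The arithmetic behind those figures can be checked in a few lines. This is a minimal sketch that uses only the numbers quoted in this section; the function names are illustrative and are not part of pxpipe.

```typescript
// Figures quoted above for a full 1928x1928 page.
const PAGE_CHARS = 92000;
const PAGE_VISION_TOKENS = 4761;

/** Characters carried per vision token when a page is filled to capacity. */
function charsPerVisionToken(pageChars: number, pageTokens: number): number {
  return pageChars / pageTokens;
}

/** Fraction of input tokens removed when text is replaced by image pages. */
function tokenReduction(textTokens: number, imageTokens: number): number {
  return 1 - imageTokens / textTokens;
}

// Break-even density: plain text only wins if it packs more characters
// into each text token than a full page packs into each vision token.
const breakEven = charsPerVisionToken(PAGE_CHARS, PAGE_VISION_TOKENS);
console.log(breakEven.toFixed(1)); // 19.3

// The system-prompt example: ~25k text tokens vs ~2.7k image tokens.
console.log((tokenReduction(25000, 2700) * 100).toFixed(1) + "%"); // 89.2%
```

The 19.3 figure is a capacity ceiling for a full page; partially filled or smaller pages carry fewer characters per vision token.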

If you are new to how providers count units, start with "What are LLM tokens"; pxpipe games the input side of that ledger.

Try it in 30 seconds

bash

npx pxpipe-proxy

bash

ANTHROPIC_BASE_URL=http://127.0.0.1:47821 claude

Open http://127.0.0.1:47821/ for:

Tokens saved (running total)
Every text → image conversion side by side
Kill switch to pass through byte-identical
Live model allowlist chips

Responses stream normally — pxpipe never compresses the model's reply. Recent conversation turns stay text; static system prompt, tool docs, and older collapsed history are the usual imaging targets.
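The "recent turns stay text" rule amounts to splitting history at a cutoff. The sketch below illustrates the idea only: the `Turn` shape and the `tailSize` knob are hypothetical, since this article does not state how pxpipe picks its cutoff.

```typescript
interface Turn {
  role: "user" | "assistant";
  text: string;
}

/**
 * Split history into older turns (imaging candidates) and a live tail
 * that always stays text. tailSize is a hypothetical parameter.
 */
function splitHistory(
  turns: Turn[],
  tailSize: number
): { older: Turn[]; tail: Turn[] } {
  const cut = Math.max(0, turns.length - tailSize);
  return { older: turns.slice(0, cut), tail: turns.slice(cut) };
}
```

Only the `older` half would ever be considered for rendering; the `tail` is forwarded untouched.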

What actually gets compressed

Three buckets, each behind a profitability gate (sparse prose stays text):

Bucket	Rule of thumb
Large `tool_result` bodies	File reads, logs, command output above ~6k chars of token-dense content
Older history	Turns behind the live tail → image pages; recent turns always text
System prompt + tool docs slab	Static prefix reflowed into page(s), cache-friendly splice

Passes through unchanged: your latest messages, small blocks, sparse prose, models outside allowlist, and all model output.
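A profitability gate of the kind described above can be sketched as a size check followed by a cost comparison. This is an assumption-laden illustration, not pxpipe's actual gate: the `Candidate` shape is hypothetical, the 6,000-character floor is the rule of thumb quoted for `tool_result` bodies, and the caller is assumed to supply both token estimates.

```typescript
interface Candidate {
  chars: number;       // length of the block's text
  textTokens: number;  // tokens if sent as plain text (e.g. from count_tokens)
  imageTokens: number; // estimated vision tokens for the rendered page(s)
}

// "~6k chars" rule of thumb from the table above; hypothetical as a hard limit.
const MIN_CHARS = 6000;

function shouldImage(c: Candidate, minChars: number = MIN_CHARS): boolean {
  if (c.chars < minChars) return false; // small blocks pass through as text
  return c.imageTokens < c.textTokens;  // image only when it is cheaper
}
```

With the system-prompt example from earlier (about 48k characters, 25k text tokens, 2.7k image tokens) the gate passes; sparse prose whose text tokens already undercut the image cost stays text.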

GPT path note: tool definitions stay native JSON, and OpenAI transforms carry no Anthropic cache_control markers.

Set PXPIPE_MODELS=off to disable imaging entirely. Default: claude-fable-5,gpt-5.6.
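How such an allowlist might be interpreted is sketched below. Only the `off` value and the two default model IDs come from this article; treating the variable as a comma-separated list that replaces the defaults, and matching model names exactly, are assumptions.

```typescript
const DEFAULT_MODELS = ["claude-fable-5", "gpt-5.6"];

/** Parse a PXPIPE_MODELS-style value into a list of model IDs. */
function parseAllowlist(value: string | undefined): string[] {
  if (value === undefined || value.trim() === "") return DEFAULT_MODELS.slice();
  if (value.trim() === "off") return []; // imaging disabled for every model
  return value
    .split(",")
    .map((m) => m.trim())
    .filter((m) => m.length > 0);
}

/** Exact-match lookup; prefix or pattern matching is not assumed here. */
function isImagingEnabled(model: string, allowlist: string[]): boolean {
  return allowlist.indexOf(model) !== -1;
}
```

For example, `isImagingEnabled("claude-fable-5", parseAllowlist(undefined))` is true under these assumptions, and any model is rejected once the value is `off`.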

How the pipeline works

snippet

tool_result string
    → wrap at 1928px-wide columns
    → pack ~92,000 chars/page
    → PNG[]
    → splice into /v1/messages (cache-friendly)
    → forward to Anthropic
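The first two pipeline steps (wrap, then pack) can be sketched without any rendering code. This is a simplified illustration under stated assumptions: it wraps by character count rather than pixel width, marks original newlines with ↵ as described earlier, and leaves PNG encoding out entirely. The function names are hypothetical.

```typescript
/** Replace newlines with a visible marker and hard-wrap at a fixed column count. */
function reflow(text: string, cols: number): string[] {
  if (cols <= 0) throw new Error("cols must be positive");
  const flat = text.replace(/\r?\n/g, "↵");
  const lines: string[] = [];
  for (let i = 0; i < flat.length; i += cols) {
    lines.push(flat.slice(i, i + cols));
  }
  return lines;
}

/** Greedily group wrapped lines into pages of at most charsPerPage characters. */
function packPages(lines: string[], charsPerPage: number): string[][] {
  const pages: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const line of lines) {
    if (current.length > 0 && used + line.length > charsPerPage) {
      pages.push(current); // page is full, start the next one
      current = [];
      used = 0;
    }
    current.push(line);
    used += line.length;
  }
  if (current.length > 0) pages.push(current);
  return pages;
}
```

Calling `packPages(reflow(text, cols), 92000)` yields one array of lines per page; each page would then be rasterized to a PNG by the real renderer. Slicing by UTF-16 code units is adequate for ASCII but would need care for wider scripts.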

Events log to ~/.pxpipe/events.jsonl. Each row records:

Counterfactual — free count_tokens on the original uncompressed body (parallel probe)
Actual — billed usage from the real response

That is how the README avoids inflated "savings on the slice we touched": the end-to-end denominator includes requests pxpipe correctly left alone, cache reads and writes, and all output tokens.
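An end-to-end reduction over such a log could be computed along these lines. The row schema is not given in this article, so both field names below are hypothetical placeholders; the point of the sketch is that every request counts toward the denominator, including ones that passed through unchanged.

```typescript
interface EventRow {
  counterfactualInputTokens: number; // hypothetical field name
  actualInputTokens: number;         // hypothetical field name
}

/** Aggregate input-token reduction across every logged request. */
function summarize(jsonl: string): { requests: number; reduction: number } {
  let before = 0;
  let after = 0;
  let requests = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const row = JSON.parse(line) as EventRow;
    before += row.counterfactualInputTokens;
    after += row.actualInputTokens;
    requests += 1;
  }
  return { requests, reduction: before === 0 ? 0 : 1 - after / before };
}
```

A log with one request cut from 1,000 to 400 tokens and one untouched 500-token request reports a 40% reduction overall, not the 60% seen on the compressed request alone.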

Dollar math in docs uses Fable 5 list ratios (input ×1.0, cache write ×1.25, cache read ×0.1, output ×5) applied identically to both sides so cache discounts cancel.
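The weighting above can be written out as a small cost function. The usage field names follow the Anthropic Messages API `usage` object; the ratios are the ones quoted in this section, and the sample numbers are made up purely for illustration.

```typescript
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
}

// List-price ratios quoted above, relative to one uncached input token.
const RATIOS = { input: 1.0, cacheWrite: 1.25, cacheRead: 0.1, output: 5 };

/** Cost of one request in "input-token equivalents". */
function weightedUnits(u: Usage): number {
  return (
    u.input_tokens * RATIOS.input +
    u.cache_creation_input_tokens * RATIOS.cacheWrite +
    u.cache_read_input_tokens * RATIOS.cacheRead +
    u.output_tokens * RATIOS.output
  );
}

/** End-to-end savings: the same ratios are applied to both sides. */
function endToEndSavings(counterfactual: Usage, actual: Usage): number {
  return 1 - weightedUnits(actual) / weightedUnits(counterfactual);
}

// Made-up example: input and cache reads shrink by 70%, output is unchanged.
const without: Usage = { input_tokens: 10000, cache_creation_input_tokens: 0, cache_read_input_tokens: 20000, output_tokens: 1000 };
const withProxy: Usage = { input_tokens: 3000, cache_creation_input_tokens: 0, cache_read_input_tokens: 6000, output_tokens: 1000 };
console.log((endToEndSavings(without, withProxy) * 100).toFixed(1) + "%"); // 49.4%
```

The example shows why end-to-end savings sit below the raw input cut: output tokens are priced at 5× and are never compressed, so they dilute the headline number.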

Demo results — Fable vs Opus

Fable 5 (default, 100/100 reader)

Side-by-side A/B demo (plain left, pxpipe right):

Token count 10/10 across 39 imaged filler files (grep line-for-line match)
Multi-step ledger arithmetic correct
Session end: $6.06 with 73.5k/1M context left vs $42.21 at 96% full on plain text
Caveat: pxpipe arm needed a nudge to match requested one-line output format

Opus 4.8 (disabled by default)

Text needles read fine on both arms
Imaged phrase-count misread on Opus — pxpipe reports failure instead of fabricating
This is why Opus is opt-in only

Benchmarks (from repo — reproducible evals)

Test	N	Text arm	pxpipe (image)	Token Δ
Novel arithmetic, Fable 5	100	100%	100%	−38%
Novel arithmetic, Opus 4.8	100	100%	93%	−38%
Gist recall A/B (15k–45k char sessions), Fable 5	98/arm	98/98	98/98	—
State tracking (mutations), Fable 5	18/arm	18/18	18/18	—
Confabulation on never-stated facts (lower better), Fable 5	16/arm	0/16	0/16	—
Verbatim 12-char hex, dense render, Opus	15	15/15	0/15	—
Verbatim 12-char hex, dense render, Fable 5	15	—	13/15	—

SWE-bench Lite pilot: 10/10 both arms at −65% request size.

SWE-bench Pro: 14/19 ON vs 15/19 OFF at −60%; verdicts agree 18/19; single split re-resolved 3/3 on replication (variance, not compression failure per FINDINGS.md).

Receipts: eval/swe-bench/, eval/swe-bench-pro/, eval/needle-haystack/, eval/gist-recall/ on GitHub.

The honest limitations

pxpipe's README leads with failure modes — unusual and worth copying:

Lossy by design

Exact strings in images are not byte-safe — SHAs, API keys, precise counts
Documented real-world miss: model recalled a person's name from imaged chat history wrong, confidently
Coding tolerates this because agents re-read files; pure chat recall does not

Escape hatches

Route byte-exact work to subagents on non-allowlisted models (CLAUDE_CODE_SUBAGENT_MODEL=claude-sonnet-4-6)
keepSharp(block) in library API pins blocks as text
Recent turns never imaged
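One way to decide which blocks to pin is a cheap text heuristic. The sketch below is an assumption, not a shipped guard: the patterns are illustrative, and because the shape of the `block` passed to `keepSharp` is not documented here, the predicate works on a plain string and leaves text extraction to the caller.

```typescript
// Runs of 12+ hex characters: commit SHAs, content hashes, some IDs.
const HEX_RUN = /\b[0-9a-fA-F]{12,}\b/;
// Words that usually sit next to a credential.
const SECRET_HINT = /\b(api[_-]?key|secret|password)\b/i;

/** True when the text likely contains strings that must survive byte-for-byte. */
function looksByteExact(text: string): boolean {
  return HEX_RUN.test(text) || SECRET_HINT.test(text);
}
```

A caller could return `looksByteExact(textOfBlock)` from its `keepSharp` callback. The heuristic errs toward keeping text, which costs tokens but avoids silent misreads of hashes and keys.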

Other costs

PNG encoding latency on large requests before forward
ASCII/Latin-1 well tested; CJK works conservatively
No dedicated verbatim-risk guard shipped yet (roadmap item)

Workload-dependent economics

Wins on token-dense content (~1 char/token). Can lose money on sparse prose (~3.5 chars/token). Profitability gate calibrated on N=391 production rows.

Library use — no proxy required

typescript

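import { renderTextToImages, transformAnthropicMessages } from "pxpipe-proxy";

// Render one bulky string (for example a tool_result body) into PNG pages.
// toolResultText is assumed to be a string already in scope.
const { pages } = await renderTextToImages(toolResultText);
// pages[i].png: Uint8Array

// Transform a whole /v1/messages request body for an allowlisted model.
// requestBytes is assumed to hold the serialized request body.
const { body, applied, info } = await transformAnthropicMessages({
  body: requestBytes,
  model: "claude-fable-5",
});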

options.keepSharp(block) forces text; options.emitRecoverable returns originals of imaged blocks. Pure JS runtime (Node + edge Workers); @napi-rs/canvas is build-time only.

pxpipe vs `/compact` vs smaller context windows

Approach	Mechanism	Trade-off
`/compact`	Summarize + drop history	Fast, but deletes detail
Smaller model	Cheaper per token	Less capability
pxpipe	Full content as images	Lossy on exact strings; PNG CPU cost
Shorter prompts	Human discipline	Free, but fights agent harness growth

After Meta's 73.7T token month and Tesla's $200/week caps, input compression is no longer a hobby project — but end-to-end measurement matters. pxpipe's events.jsonl counterfactual pattern is the right audit habit.

Pair this with the Claude usage limits timeline so you know when burn is plan-gated vs dollar-gated.

Meta note — agents documenting themselves

From the README:

"Why does the README read like an AI wrote it? Because one did. Most of this repo's commits — the code and the docs — were authored by Opus/Fable agent sessions running behind pxpipe itself, reading their own collapsed history as image pages while they worked."

That is either the best dogfooding story of 2026 or the most recursive disclaimer — possibly both.

Roadmap (hypotheses, not promises)

Sharper glyph rendering (eval/glyph-matrix/)
Whether imaged bulk stretches effective context (~2× content in same 1M window)
Whether smaller active context improves long-task accuracy

Each ships with a measured sample size (an n) or gets cut, per the maintainers.

FAQ — quick answers

Does pxpipe work with Cursor / other Claude clients?

Anything that respects ANTHROPIC_BASE_URL can route through it — Claude Code is the documented path.
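For a client that does not read ANTHROPIC_BASE_URL, the same routing can be done by building the request against the proxy address directly. This is a minimal sketch: the header names are the standard Anthropic Messages API headers, while the helper itself and the assumption that the proxy forwards the API key unchanged are illustrative.

```typescript
interface ProxiedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

/** Build a /v1/messages request aimed at a local proxy instead of the API host. */
function buildMessagesRequest(
  baseUrl: string,
  apiKey: string,
  body: object
): ProxiedRequest {
  const url = baseUrl.replace(/\/+$/, "") + "/v1/messages"; // tolerate trailing slashes
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed straight to `fetch(req.url, req.init)` with `baseUrl` set to `http://127.0.0.1:47821`.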

Is this cheating the prompt cache?

The repo claims a cache-friendly splice: the static prefix is preserved so Anthropic prompt caching keeps working. Verify on your own traffic via events.jsonl.
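What a cacheable image page might look like on the wire is sketched below. The block shape (base64 image source, optional `cache_control` of type `ephemeral`) is the Anthropic Messages API format; whether pxpipe places its cache marker on the image block itself is an assumption, and the helper names are hypothetical.

```typescript
interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: "image/png"; data: string };
  cache_control?: { type: "ephemeral" };
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/** Dependency-free base64 encoder so the sketch runs in Node and edge runtimes. */
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64.charAt(b0 >> 2);
    out += B64.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += i + 1 < bytes.length ? B64.charAt(((b1 & 15) << 2) | (b2 >> 6)) : "=";
    out += i + 2 < bytes.length ? B64.charAt(b2 & 63) : "=";
  }
  return out;
}

/** Wrap PNG bytes as a content block, optionally marked as a cache breakpoint. */
function pngToImageBlock(png: Uint8Array, markCacheable: boolean): ImageBlock {
  const block: ImageBlock = {
    type: "image",
    source: { type: "base64", media_type: "image/png", data: toBase64(png) },
  };
  if (markCacheable) block.cache_control = { type: "ephemeral" };
  return block;
}
```

Caching only helps if the rendered bytes are identical from request to request, which is why a stable static prefix matters more than the marker itself.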

Should enterprises run this?

Pilot on non-production repos first. Lossy compression + silent hex misses is a governance conversation — especially for regulated IDs and audit trails. See enterprise token governance.

Stars / releases?

~2.2k stars, 135 forks, v0.8.0 (July 2026), MIT license, teamchong/pxpipe.
