
Caveman skill: token economics, API pricing, and cutting verbose LLM output in agents

Caveman agent skill for terse Claude and GPT replies: 2026 OpenAI and Anthropic pricing, why output tokens dominate agent bills, and how the JuliusBrussee/caveman skill pairs with caching and routing.

7 min read · ExplainX Team
Caveman skill · LLM Optimization · Token Economics · Developer Tooling · Prompting · AI Agents

What is the Caveman skill?

The Caveman skill is an open agent skill (JuliusBrussee/caveman) that constrains how much filler prose assistants emit—lite, full, and ultra modes—while keeping code blocks and technical payloads intact. Install it like any other skill from the registry; it complements prompt caching, batch APIs, and model routing rather than replacing them.

Bottom line (April 2026): public API rate cards from OpenAI and Anthropic still charge more per token for output than for input on flagship coding models, and agent pipelines multiply every wasted completion across later turns. The Caveman skill targets low-value prose, not semantics. Recent preprint work (MD Azizul Hakim, arXiv:2604.00025, 11 Mar 2026) ties scale-dependent verbosity to benchmark errors and shows brevity constraints can recover large-model advantages.

Caveman skill — token economics and brevity for agents

Why this post exists

Most writing about LLM cost optimization stops at:

  1. "Use a cheaper model."
  2. "Make prompts shorter."

Both help, but they skip the systems view:

  • how per-token economics evolved from early GPT-4-class APIs to 2026 frontier listings
  • why output and carried conversation state dominate many coding and agent bills
  • when shorter answers are a reliability lever, not only a budget lever

Caveman is the concrete example; the through-line is token economics and measurement.

First principles: what you are actually paying for

For commercial APIs, cost is usually a weighted sum of:

  • Input tokens (full-price vs cached input where the provider supports reuse)
  • Output tokens (often 1.25–6× input price on comparable tiers—exact ratio depends on model and vendor)
  • Tool charges (hosted search, code execution, retrieval)
  • Tier modifiers (batch/async discounts, flex vs “priority,” data residency uplifts)

The expensive mistake is treating “tokens” as one scalar. Buckets and multipliers differ; optimizations that trim output help most when output is priced highest.
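The weighted-sum view above can be sketched as a small cost function. This is a minimal sketch, not any vendor's billing logic; the rate numbers below are placeholders.

```python
from dataclasses import dataclass

# Hypothetical rate card; the dollar figures are placeholders, not a
# real vendor's prices.
@dataclass
class RateCard:
    input_per_m: float         # $ per 1M full-price input tokens
    cached_input_per_m: float  # $ per 1M cached input tokens
    output_per_m: float        # $ per 1M output tokens

def call_cost(card: RateCard, fresh_in: int, cached_in: int,
              out_tokens: int, tool_fees: float = 0.0) -> float:
    """Cost of one API call as a weighted sum over token buckets."""
    return (fresh_in / 1e6 * card.input_per_m
            + cached_in / 1e6 * card.cached_input_per_m
            + out_tokens / 1e6 * card.output_per_m
            + tool_fees)

card = RateCard(input_per_m=2.50, cached_input_per_m=0.25, output_per_m=15.00)
# 40K fresh input + 160K cached input + 8K output under this sample card:
print(round(call_cost(card, fresh_in=40_000, cached_in=160_000, out_tokens=8_000), 4))
```

Note how the same 200K tokens of context costs a fraction of full price once most of it lands in the cached bucket; that is why the buckets, not the raw token count, drive the bill.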

Token cost history: anchors that still matter

Public archives and current listings tell a three-act story:

  1. Early frontier (2023): strongest general models shipped at tens of dollars per million input tokens—OpenAI’s archived tables include gpt-4-0613 at $30 / 1M input and $60 / 1M output.
  2. Efficiency wave (2024–2025): multimodal and “mini” classes pushed routine work toward sub-$5 / 1M input territory—e.g. gpt-4o-2024-05-13 at $5 / $15 per 1M and gpt-4o-mini announced at $0.15 / $0.60 (OpenAI, July 18, 2024).
  3. 2026 frontier snapshot: flagship SKUs remain output-heavy even as quality improves. As of April 2026, OpenAI’s published API pricing shows GPT-5.4 at $2.50 / 1M input, $0.25 / 1M cached input, $15.00 / 1M output (standard rates under 270K context); GPT-5.4 mini at $0.75 / $0.075 / $4.50; GPT-5.4 nano at $0.20 / $0.02 / $1.25. The same page notes Batch API saves 50% on eligible input and output, web search at $10.00 per 1,000 calls (search content tokens listed as free), and a +10% uplift for certain data-residency / regional endpoints on models released after March 5, 2026.

On Anthropic’s side, the April 2026 model pricing table lists, for example, Claude Sonnet 4.6 at $3 / 1M input and $15 / 1M output, Claude Opus 4.6 at $5 / $25, and Claude Haiku 4.5 at $1 / $5, with cache reads at 0.1× base input after a cache write and Batch API at 50% off both input and output for supported workloads—so the same “shrink repeated context + cut output” playbook applies across vendors.
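To make the cross-vendor comparison concrete, here is a small calculator using the April 2026 snapshot rates quoted above. Verify against the vendors' live pricing pages before relying on them; cached-input, batch, and tool fees are omitted for simplicity.

```python
# Rates are the April 2026 snapshot quoted in this post: (input $/1M, output $/1M).
RATES = {
    "gpt-5.4":           (2.50, 15.00),
    "gpt-5.4-mini":      (0.75, 4.50),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-haiku-4.5":  (1.00, 5.00),
}

def monthly_cost(model: str, in_tok_m: float, out_tok_m: float) -> float:
    """Monthly spend for a workload measured in millions of tokens."""
    inp, outp = RATES[model]
    return in_tok_m * inp + out_tok_m * outp

# Same workload (500M input, 200M output) across tiers:
for model in RATES:
    print(model, monthly_cost(model, 500, 200))
```

Even at these rates, output carries most of the flagship-tier bill for output-heavy workloads, which is why trimming completions moves the needle more than trimming prompts.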

Net: unit costs fell, but output-token weight and tooled workflows keep waste material.

Why verbosity still hurts after price drops

Agentic coding stacks often chain:

  • planner call
  • router / tool-selection call
  • patch generation
  • explanation or review
  • retry loops

If each hop adds 20–40% conversational padding, you pay repeatedly in:

  • downstream input: prior verbose turns become context on the next call
  • latency and review drag
  • error surface: filler correlates with contradiction and “helpful” hedging
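The re-ingestion effect above is easy to simulate. The toy model below (an illustration, not a billing formula) assumes every hop's completion is carried as input into all later hops:

```python
def pipeline_tokens(base_output: int, padding: float, hops: int) -> int:
    """Total billed tokens when each hop's (padded) completion is re-read
    as input by every later hop in the chain."""
    total = 0
    carried = 0  # context accumulated from earlier turns
    for _ in range(hops):
        completion = int(base_output * (1 + padding))
        total += carried + completion  # pay for carried input + new output
        carried += completion          # next hop re-ingests this completion
    return total

lean = pipeline_tokens(base_output=500, padding=0.0, hops=5)
padded = pipeline_tokens(base_output=500, padding=0.3, hops=5)
print(lean, padded, round(padded / lean, 2))
```

Padding inflates not just each completion but every later hop's input, so the overhead is paid once as output and then repeatedly as context.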

Tokenization: why “word count” misleads

OpenAI’s consumer docs still use handy English heuristics: ~4 characters ≈ 1 token and ~75 words ≈ 100 tokens (see What are tokens?). Production caveats:

  1. Language and script change token efficiency.
  2. JSON, markdown fences, stack traces, and tool envelopes inflate tokens versus what humans “see.”

So a “short” visible answer can still be a large billed payload.
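For planning-stage budgeting, the ~4-characters-per-token heuristic from the docs can be wrapped in a helper. This is only a rough English-prose guess, not a tokenizer; JSON and code typically tokenize worse than this estimate suggests:

```python
def estimate_tokens(text: str) -> int:
    """Rough English-text estimate using the ~4 chars per token heuristic.
    A planning-stage guess only; use the provider's tokenizer for billing."""
    return max(1, round(len(text) / 4))

prose = "Sure! Here is a short answer."
payload = '{"status": "ok", "items": [1, 2, 3], "trace": null}'
print(estimate_tokens(prose), estimate_tokens(payload))
```

The structural characters in the JSON payload count toward the estimate even though a human "sees" very little content, which is the word-count trap in miniature.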

Research note: brevity as an intervention, not just aesthetics

In “Brevity Constraints Reverse Performance Hierarchies in Language Models” (MD Azizul Hakim; submitted 11 Mar 2026; arXiv:2604.00025), the author evaluates 31 models (~0.5B–405B) on 1,485 problems and reports that on 7.7% of items across five datasets, larger models trail smaller ones by 28.4 percentage points, a pattern attributed in part to verbosity-induced overelaboration. Causal interventions with brevity constraints raise large-model accuracy by ~26 percentage points and invert prior hierarchies on math and science subsets, with 7.7–15.9 point swings—supporting the deployment idea that prompt shape is a first-class control, not cosmetic.

Where Caveman fits

Caveman (see the Caveman skill on ExplainX and the project site) is a response-style constraint layer for agentic CLIs: modes like lite, full, ultra, add-ons for terse commits/reviews, and caveman-compress for shrinking session memory-style inputs. Architecturally it targets communication overhead, not reasoning capability: compress surface language, preserve semantic payload, measure quality.

On ExplainX: explore the full skills registry; for content-facing agent playbooks see SEO + GEO agent skills; to list your own skill, register and use the submission flow.

Cost math: a sanity model

Use:

monthly_cost ≈ Σ (input_tokens × input_rate + output_tokens × output_rate + tool_fees)

If style changes cut output tokens by fraction r without harming task success:

output_savings ≈ monthly_output_tokens × output_rate × r

Example: 200M output tokens/month at $10 / 1M output with r = 0.35 yields about $700/month saved on that slice alone—before counting downstream input shrinkage.
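The sanity model above translates directly into code; the figures below are the worked example from the text, not measured data:

```python
def output_savings(monthly_output_tokens: float, output_rate_per_m: float,
                   r: float) -> float:
    """Savings from cutting output tokens by fraction r, output side only.

    monthly_output_tokens: total output tokens per month
    output_rate_per_m:     $ per 1M output tokens
    r:                     fraction of output tokens eliminated (0..1)
    """
    return monthly_output_tokens / 1e6 * output_rate_per_m * r

# Worked example from the text: 200M output tokens at $10 / 1M, r = 0.35.
print(round(output_savings(200e6, 10.0, 0.35), 2))  # 700.0
```

Plug in your own output-token rate and measured r from an A/B run before trusting any projection; r varies a lot by task family.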

Platform mechanics teams overlook

Three levers compound with terse defaults:

  1. Cached / repeated system or document context (OpenAI cached-input rows; Anthropic cache hits at 0.1× base input after writes).
  2. Batch / async lanes (both vendors advertise 50% token discounts for eligible batch workloads in their public pricing docs as of April 2026).
  3. Model routing: frontier models only on high-ambiguity steps; mini / nano / Haiku-class for transforms and scaffolding.
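The routing lever can be sketched as a simple policy function. The model names reuse the tiers discussed above; the thresholds and task categories are illustrative assumptions, not a real router's API:

```python
# Minimal routing sketch; tier names and thresholds are illustrative.
ROUTES = {
    "frontier": "gpt-5.4",        # high-ambiguity planning / architecture
    "workhorse": "gpt-5.4-mini",  # patch generation, review
    "bulk": "gpt-5.4-nano",       # transforms, scaffolding, reformatting
}

def pick_model(task_kind: str, ambiguity: float) -> str:
    """Route by ambiguity first, then by task family."""
    if ambiguity > 0.7:
        return ROUTES["frontier"]
    if task_kind in ("transform", "scaffold", "format"):
        return ROUTES["bulk"]
    return ROUTES["workhorse"]

print(pick_model("architecture", 0.9))  # frontier tier
print(pick_model("transform", 0.2))     # bulk tier
```

The point is not this particular policy but that routing, caching, and terse output multiply: a nano-class call with a cached prompt and a Caveman-style completion is cheap on all three axes at once.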

Deployment playbook

  1. Baseline three regimes: default prompting; manual “be concise”; Caveman-style (or equivalent system policy).
  2. Jointly track cost, latency, and task success—not cost alone.
  3. Slice metrics by task family (debug, refactor, architecture, review).
  4. Keep an “expand” escape hatch (explain more, verbose sub-agent).
  5. Default terse where safe; escalate detail when confidence is low or stakeholders require auditability.
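Step 2 of the playbook (track cost, latency, and success jointly) can be captured in a toy scorecard; the regime names mirror step 1 and the sample numbers are made up:

```python
# Toy three-regime baseline; the metric values are illustrative, not measured.
regimes = {
    "default":      {"cost_usd": 120.0, "p50_latency_s": 9.1, "success": 0.86},
    "be_concise":   {"cost_usd": 95.0,  "p50_latency_s": 7.4, "success": 0.85},
    "caveman_lite": {"cost_usd": 71.0,  "p50_latency_s": 5.8, "success": 0.86},
}

def acceptable(r: dict, baseline: dict, max_success_drop: float = 0.02) -> bool:
    """A cheaper regime only wins if task success stays within tolerance."""
    return baseline["success"] - r["success"] <= max_success_drop

# Cheapest regime whose success rate holds up against the default:
winner = min(
    (name for name, r in regimes.items() if acceptable(r, regimes["default"])),
    key=lambda name: regimes[name]["cost_usd"],
)
print(winner)
```

Guarding the cost comparison with a success-rate tolerance is the whole point of step 2: a regime that wins on cost but quietly drops task success should never be promoted to the default.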

Failure modes

Brevity-first defaults fail when legal / compliance language must be explicit, learners need expository depth, or traceability belongs inside the reply. Apply verbosity selectively, not universally.

Caveman as pattern, not meme

Narrowly: “funny terse mode.” Properly: measured surface compression paired with routing, caching, and evaluation—now part of cost-aware agent design.

FAQ

These answers mirror the FAQ block in this page’s metadata (for search and AI overviews). Caveman skill install: explainx.ai/skills/JuliusBrussee/caveman/caveman.

  • Cost drivers (2026): input + output tokens (output often higher per token), cached input discounts, batch/flex tiers, tools (e.g. OpenAI web search $10 / 1,000 calls), regional uplifts (+10% on some OpenAI endpoints for post–Mar 5, 2026 models).
  • Why verbosity still hurts: chained agents re-ingest prior completions as context; Hakim (arXiv:2604.00025) shows brevity constraints can raise large-model accuracy on part of the benchmark set by ~26 points.
  • When not to default terse: compliance narrative, training depth, or audit text that must live in the reply—use route- or audience-specific verbosity.
