When OpenAI launched Codex, they did not just ship an AI coding tool. They shipped an agent harness that can navigate web browsers, operate desktop apps, connect to GitHub, deploy to Vercel, and schedule its own recurring tasks.
When Anthropic shipped Claude Code, they did not just give you a terminal assistant. They gave you /loop, /goal, multi-day autonomous sessions, and a pattern they call loop engineering — where Claude plans a task, executes it, evaluates the output, self-corrects, and runs again, without you in the loop.
When every AI company builds agent frameworks, harness tooling, and computer-use capabilities, you might ask: why are they all going the same direction so fast?
The answer has two parts. The first part is genuinely true: agents complete tasks that simple chat cannot. Alex Finn launching a live landing page in 5 minutes is real. That is value a chat interface could not deliver.
The second part is also genuinely true, and gets discussed much less: agents burn tokens at a rate that transforms the business economics of AI companies.
Let's look at both parts honestly.
How AI Companies Actually Make Money
At the core, every major AI company has the same revenue model: they charge for compute, measured in tokens.
The delivery mechanism varies:
- Direct API pricing: $X per million input tokens, $Y per million output tokens
- Subscription tiers: Higher plans = more tokens per month before limits kick in
- Usage overages: Pay per token above your plan limit
- Enterprise contracts: Bulk token commitments
The abstraction layer (subscriptions, tiers, "unlimited" plans) obscures the underlying unit economics. But the unit is always tokens. More token consumption per user = more revenue per user.
This is not a secret. It is in every earnings call, every pricing page, every infrastructure investment announcement. The question is what it means for the products these companies build.
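To make the token unit concrete, here is a minimal sketch of how a per-token API bill adds up. The prices are placeholders chosen for illustration, not any vendor's actual rates, and `api_cost_usd` is a hypothetical helper rather than part of any SDK.

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Return the bill in USD for one request or session."""
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)

# Placeholder prices (assumptions for illustration only).
PRICE_IN = 3.00    # USD per million input tokens
PRICE_OUT = 15.00  # USD per million output tokens

# Token counts are also illustrative: one short chat turn vs. one agent session.
chat_cost = api_cost_usd(1_000, 500, PRICE_IN, PRICE_OUT)
agent_cost = api_cost_usd(400_000, 100_000, PRICE_IN, PRICE_OUT)
print(f"chat: ${chat_cost:.4f}, agent session: ${agent_cost:.2f}")
```

Whatever the real prices are, the shape of the formula is the same: the bill scales linearly with tokens consumed.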
The Token Gap Between Chat and Agents
Here is the gap that makes agent products so attractive to build:
| Interaction type | Approximate token consumption |
|---|---|
| Simple chat question | 500 – 2,000 tokens |
| Complex research question | 2,000 – 10,000 tokens |
| Claude Code: fix a bug in one file | 5,000 – 20,000 tokens |
| Claude Code: multi-file refactor | 20,000 – 100,000 tokens |
| Claude Code: /loop session, 30 minutes | 100,000 – 500,000 tokens |
| Codex: full computer-use workflow (code → GitHub → Vercel → live) | 200,000 – 1,000,000 tokens |
| Loop engineering: overnight autonomous coding session | 2,000,000 – 10,000,000 tokens |
The range is enormous. A power user running loop engineering overnight consumes roughly the same tokens as 5,000 people asking a simple chat question. From a revenue perspective, that is an extraordinary concentration of value in high-engagement agentic users.
This is why every company is building toward agents.
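The comparison between an overnight session and simple chat questions can be checked directly from the table's ranges. This is a small sketch; `ratio_range` is a hypothetical helper, and the only inputs are the approximate figures from the table.

```python
# Token ranges copied from the table above as (low, high).
SIMPLE_CHAT = (500, 2_000)
OVERNIGHT_LOOP = (2_000_000, 10_000_000)

def ratio_range(heavy, light):
    """Return (smallest, largest) possible ratio of heavy to light usage."""
    smallest = heavy[0] / light[1]  # cheapest heavy session vs. priciest chat
    largest = heavy[1] / light[0]   # priciest heavy session vs. cheapest chat
    return smallest, largest

low, high = ratio_range(OVERNIGHT_LOOP, SIMPLE_CHAT)
print(f"One overnight session = {low:,.0f} to {high:,.0f} simple chat questions")
```

Because both rows are wide ranges, the ratio is a range too, spanning more than an order of magnitude.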
The Products Pushing You Toward More Tokens
Claude Code + Loop Engineering
Anthropic built /loop, /goal, and multi-day autonomous sessions specifically for the highest-token patterns possible. Loop engineering — the practice of designing agent workflows as self-correcting feedback loops — is, from a pure token-consumption perspective, the most expensive interaction pattern available.
The ExplainX blog on loop engineering for coding agents describes sessions that run for hours, burning through enormous context windows as Claude plans, executes, evaluates, and retries. It is genuinely useful. It also happens to be the thing that transforms a casual $20/month subscriber into a heavy API user with meaningful overage billing.
Codex + Computer Use
OpenAI's Codex wraps every computer-use interaction in observation loops: the model sees the screen, generates an action, observes the result, generates the next action. Each observation-action pair costs tokens. A workflow that navigates five web apps burns those tokens for every page render, every state observation, every decision.
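A rough sketch of why observation loops add up so quickly. It assumes each step re-sends the earlier observations and actions as input, which is how stateless chat-style APIs typically work; real products may cache or truncate context. The function name and all numbers are illustrative assumptions, not measurements of Codex.

```python
def workflow_tokens(steps: int, screenshot_tokens: int, action_tokens: int,
                    resend_history: bool = True) -> int:
    """Total tokens billed across an observe-act loop.

    Assumption: when resend_history is True, every step re-sends all earlier
    observations and actions as input, so input grows with each step.
    """
    total = 0
    history = 0  # tokens accumulated from earlier steps
    for _ in range(steps):
        input_tokens = screenshot_tokens + (history if resend_history else 0)
        total += input_tokens + action_tokens
        history += screenshot_tokens + action_tokens
    return total

# Illustrative numbers: 40 steps, 1,500 tokens per screenshot, 100 per action.
flat = workflow_tokens(40, 1_500, 100, resend_history=False)
growing = workflow_tokens(40, 1_500, 100, resend_history=True)
print(f"without history: {flat:,} tokens, with history: {growing:,} tokens")
```

Under these assumptions the cost grows roughly with the square of the step count, not linearly, which is why long workflows are disproportionately expensive.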
Alex Finn's five-minute landing page launch was described as effortless. From a token consumption standpoint, "effortlessly" navigating GitHub, Vercel, and a domain registry through a vision model observing browser screenshots is extremely expensive. The pricing plan absorbed it. If it had happened via the API, the bill would have been substantial.
Open Source Harnesses
The open-source ecosystem (LangChain, AutoGPT, CrewAI, various "open claw" and agent harness frameworks) replicates and amplifies the same patterns. Multi-agent loops where agents spawn sub-agents, evaluate each other's work, and iterate — all against paid API endpoints — can consume tokens at a rate that makes a single Claude Code session look modest.
These frameworks are mostly not built by AI companies. But they run on AI company APIs. Every multi-agent harness session flows through API billing.
Why This Incentive Alignment Matters
The companies building these products are not being cynical. Agents genuinely deliver more value than chat for complex tasks. The incentive alignment — "more value = more tokens = more revenue" — is cleaner than it looks on the surface.
But it has a few downstream effects worth understanding:
1. Token efficiency is not a priority. When fewer tokens burned means less revenue, the vendor has little pressure to optimize prompt efficiency. You will notice that agent frameworks tend to be verbose: large system prompts, long observation strings, extensive context inclusion. Some of this is necessary. Some of it is the absence of incentive to trim it.
2. "Try the agent first" is not neutral advice. When the company building the agent also earns more when you use the agent, recommendations to "use Codex for every task you do on your computer" or "run this in a /loop" are not disinterested. The advice may be correct. But its source has aligned financial interests.
3. The subscription pricing blunts awareness. A chat message and a 6-hour loop session are both "using Claude Code." The subscription price is the same. The actual compute consumed differs by 1,000x. Flat-rate pricing makes heavy agentic usage feel free — until you hit limits, at which point you upgrade to a higher plan or pay overages. This is the intended ratchet.
4. Scheduling and automation are the ceiling. Alex Finn's step 6 — "for repeating tasks, schedule them in an automation" — is where the economics become most interesting. Scheduled recurring agent jobs that run daily or hourly, entirely unattended, are the equivalent of a server process that burns tokens continuously. This is not hypothetical. It is already being built by power users, and it represents the most predictable recurring revenue a token-pricing model can generate.
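The effect of scheduling in point 4 is simple multiplication, sketched here. The per-run token count is a made-up figure and `monthly_tokens` is a hypothetical helper.

```python
def monthly_tokens(tokens_per_run: int, runs_per_day: int, days: int = 30) -> int:
    """Tokens consumed per month by an unattended scheduled agent job."""
    return tokens_per_run * runs_per_day * days

# Illustrative assumption: each run consumes 50,000 tokens.
daily_job = monthly_tokens(50_000, runs_per_day=1)
hourly_job = monthly_tokens(50_000, runs_per_day=24)
print(f"daily job: {daily_job:,} tokens/month, hourly job: {hourly_job:,} tokens/month")
```

Moving a job from daily to hourly multiplies its monthly consumption by 24 without anyone sitting at a keyboard.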
Is Agentic Work Actually Worth the Token Cost?
Yes, often. The math on "Alex Finn launched a landing page in 5 minutes" is favorable if his time is worth more than the API cost. The math on "loop engineering ran overnight and completed a migration that would have taken three days" is almost always favorable.
The question is not whether agents are worth using. It is whether every task you do on your computer is worth routing through an agent harness first — which is what the most aggressive advice suggests.
The answer for most tasks: probably not. Agent workflows consume roughly 10x to 100x more tokens per task than chat, and sometimes far more. For tasks where that cost is justified (complex multi-step workflows, repeated tasks, computer-use automations), the ROI is clear. For tasks where a chat response would do (quick questions, simple lookups, single-file edits), routing through an agent harness is burning money on overhead.
The heuristic: Use agents for tasks with multiple steps across multiple contexts. Use chat for tasks with a single clear output. Do not let "agents are more powerful" become "agents for everything" — that is expensive in both money and latency, and the company whose pricing you're on benefits whether or not the agent was the right tool.
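The heuristic can be written down as a tiny decision function. This is one possible reading, not a rule from any vendor: `pick_tool` and its thresholds are assumptions, with "multiple" taken to mean more than one.

```python
def pick_tool(steps: int, contexts: int, recurring: bool = False) -> str:
    """Return "chat", "agent", or "scheduled agent" for a task.

    Assumption: a task qualifies for an agent only when it has more than
    one step AND touches more than one context (files, apps, or services).
    """
    if steps > 1 and contexts > 1:
        return "scheduled agent" if recurring else "agent"
    return "chat"

print(pick_tool(steps=1, contexts=1))                  # quick question
print(pick_tool(steps=6, contexts=3))                  # code, GitHub, deploy
print(pick_tool(steps=6, contexts=3, recurring=True))  # same job, every day
```

The value of writing it down is mostly that the default is "chat", and the agent has to earn its place.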
What Loop Engineering Actually Is (And When It Makes Sense)
Loop engineering is the design practice of building agent workflows as self-correcting feedback loops:
- Plan: The agent generates a plan for a task
- Execute: The agent runs the plan step by step
- Evaluate: The agent checks its own output (tests pass? page renders? API returns 200?)
- Correct: If evaluation fails, the agent generates a fix and re-executes
- Repeat: Until the evaluation passes or a human intervention threshold is reached
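The steps above can be sketched as a small control loop. This is a minimal illustration and not how Claude Code or any specific harness implements it: `run_loop` and its four callables are hypothetical, and the iteration cap stands in for the human intervention threshold.

```python
from typing import Callable, Optional

def run_loop(plan: Callable[[], str],
             execute: Callable[[str], str],
             evaluate: Callable[[str], bool],
             correct: Callable[[str, str], str],
             max_iterations: int = 5) -> Optional[str]:
    """Plan once, then execute/evaluate/correct until success or the cap.

    Returns the passing output, or None when the cap is reached and a
    human should step in.
    """
    current_plan = plan()
    for _ in range(max_iterations):
        output = execute(current_plan)
        if evaluate(output):
            return output
        current_plan = correct(current_plan, output)
    return None

# Toy stand-ins: the "plan" is a number, and evaluation passes at 3 or more.
result = run_loop(
    plan=lambda: "1",
    execute=lambda p: p,
    evaluate=lambda out: int(out) >= 3,
    correct=lambda p, out: str(int(p) + 1),
)
print(result)  # 3
```

In a real harness each callable would be a model call or a test run, so every extra iteration is billed. The cap is what keeps a loop on a wrong approach from running all night.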
This is genuinely powerful for tasks like:
- Automated test-driven development where the loop runs until all tests pass
- Migration scripts where the loop runs until the old and new data match
- Multi-file refactors where the loop runs until type-checking is clean
- Deployment pipelines where the loop runs until the service is healthy
It is overkill for:
- Writing a function when you know what the function should do
- Answering a question about your codebase
- Editing a config file
- Single-step tasks with clear, verifiable completion criteria
The AI companies love loop engineering because it is high-value and high-token. Use it when it earns its cost. Measure that cost.
A Practical Guide to Token-Aware AI Use
1. Size the task before picking the tool. Simple question → chat. Multi-step workflow → agent. Recurring automation → scheduled agent with careful prompt optimization to minimize per-run token cost.
2. Watch your usage dashboard. Every major AI platform shows token consumption. Know what your heavy-usage sessions cost, not just whether you're on a subscription.
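3. Optimize prompts for agent loops. In a loop that runs 50 iterations, every token trimmed from the prompt is saved 50 times. A 10% prompt reduction is still a 10% saving on prompt tokens, but across a long session that is a large absolute number. This matters for overnight runs.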
4. Treat "use the agent for everything" advice with appropriate skepticism. It may be correct for your workflow. It is also what the company giving the advice earns more from.
5. Verify agentic outputs. Agents burn tokens producing work you should check. A loop that self-corrects 20 times on a wrong approach burns 20x the tokens of a loop that gets it right in one pass. Providing good initial context is as valuable as any loop design.
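The arithmetic behind point 3 can be sketched as follows. The prompt sizes and iteration count are made-up numbers for illustration, `tokens_saved` is a hypothetical helper, and the sketch assumes the full prompt is re-sent on every iteration.

```python
def tokens_saved(old_prompt_tokens: int, new_prompt_tokens: int,
                 iterations: int) -> int:
    """Prompt tokens saved over a session when the prompt is re-sent
    on every iteration (an assumption; caching would change this)."""
    return (old_prompt_tokens - new_prompt_tokens) * iterations

# Illustrative numbers only: trim a 2,000-token prompt by 200 tokens.
one_shot = tokens_saved(2_000, 1_800, iterations=1)
long_loop = tokens_saved(2_000, 1_800, iterations=50)
print(f"saved in one call: {one_shot}, saved over 50 iterations: {long_loop}")
```

A trim that is barely noticeable in a single chat turn becomes a meaningful line item once the same prompt rides along on every pass of a long loop.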
The Bigger Picture
AI companies are building toward a world where software tasks — all of them — route through AI agents. That world is probably coming. The productivity gains from having an agent that can operate your entire computer on a prompt are real.
But in that world, every task you delegate becomes a billable event. The economics of knowledge work shift from "how long did this take me?" to "how many tokens did this consume?" That is a new cost structure, and it has a new set of beneficiaries.
The companies building the agents are also the companies collecting the token fees. That is not a reason to avoid agents. It is a reason to use them deliberately — knowing what they cost, when they earn their cost, and when a simpler interaction would do the job just as well.