
Loop Engineering: How to Design Coding Agent Loops That Run While You Sleep (2026 Guide)

Loop engineering: write programs that prompt coding agents on cron, with guardrails and skills inside. From ReAct and ralph to /goal and /loop in Claude Code.

15 min read · Yash Thakker
Tags: Loop Engineering, Claude Code, AI Coding, Agent Harness, Developer Productivity

Loop engineering is the shift from typing prompts into a coding agent to writing the program that prompts the agent for you. This guide exists because of one tweet—and the thousand replies asking what it actually meant.


The tweet that triggered it

On June 8, 2026 at 12:28 AM, Peter Steinberger (@steipete)—creator of OpenClaw, now at OpenAI—posted two sentences that hit 6.5 million views on X:

Here's your monthly reminder that you shouldn't be prompting coding agents anymore.

You should be designing loops that prompt your agents.

That was it. No diagram. No repo link. The entire AI-coding timeline spent the next week arguing about six words.

What the replies actually asked

The thread surfaced three recurring questions:

| Reply | Who | What they wanted |
| --- | --- | --- |
| "Can you explain your workflow in detail? Would love a blog post about it" | @MatthewBerman | A concrete how-to |
| "how do we do that though?" | @InderosD | The on-ramp |
| "wtf is a loop?" | @MatthewBerman (video explainer) | A definition |

Berman's early reply captured the mood before the explainers landed: "nobody knows but him and boris."

The most useful reply in the thread came from @mosyaseen:

"designing the loop is half of it. the other half is putting something in the loop that can say no: a test, a type check, a real error. a loop with nothing to push back is the agent agreeing with itself on repeat."

Steinberger agreed—and pointed to VISION.md, a file he uses at the project level to anchor what agents should build toward (alongside agent rules in AGENTS.md and his broader agentic engineering workflow).

Skeptics pushed back too: @SaidAitmbarek flagged token efficiency; @jxnlco noted Steinberger's outsized reach on the topic. Fair—but the question underneath was real: if prompting is the old job, what is the new one?

Matt Van Horn (@mvanhorn), who runs loops that open PRs across ~30 open-source repos overnight, ran /last30days research across Reddit, X, YouTube, and Hacker News and published a synthesis thread. Boris Cherny had already named the answer on stage on June 2, before the tweet went out.

This post is the technical answer to "how do we do that though?" — what a loop is, where it came from, how to build one, and what production teams worry about once the tweet wears off.

TL;DR

| Question | Answer |
| --- | --- |
| What triggered this? | @steipete's June 8 tweet: 6.5M views, two sentences, one argument. |
| What is a loop? | A program that prompts an agent, reads output, checks if done, and repeats or stops. |
| Who defines it cleanly? | Boris Cherny, Claude Code creator, at WorkOS Acquired Unplugged (June 2, 2026). |
| Fastest on-ramp? | Claude Code /loop, one slash command. |
| Old hat vs new? | Single-agent ralph loops (2025) are baseline; multi-agent orchestration loops (2026) are the new layer. |
| What makes loops trustworthy? | Self-verification: write → run → read result → correct. |
| What makes loops expensive? | Not tokens per call, but loop management and runaway iterations. |

The job moved up one altitude

Boris Cherny created Claude Code as a side project in September 2024. It now sits behind close to 4% of all public commits on GitHub, per industry reporting cited in the June 2026 discourse.

At WorkOS Acquired Unplugged on June 2, 2026, Boris gave the cleanest definition of what practitioners mean by "loop":

"Now it's actually leveled up, I think, again, to the next wave of abstraction where I don't prompt Claude anymore. I have loops that are running. They're the ones that are prompting Claude and figuring out what to do. My job is to write loops."

Plain version:

  1. You write a small program (or configure /loop).
  2. Each tick, it prompts the coding agent.
  3. It reads what the agent produced (files, test output, PR state).
  4. It decides whether the task is done.
  5. If not, it prompts again—with fresh or anchored context.

You stop being the thing inside the loop typing prompts. You become the author of the loop. The model becomes a subroutine.
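
The five steps above can be sketched as a small shell function. This is a minimal sketch, not Boris's implementation: `run_loop`, `AGENT_CMD`, `DONE_CHECK`, and the default tick limit are hypothetical names chosen for illustration. Only `claude -p` (the non-interactive print mode) comes from the Claude Code CLI, and `npm test` stands in for whatever check your project uses.

```shell
#!/bin/sh
# Minimal loop skeleton: prompt the agent, inspect state, decide, repeat.
# AGENT_CMD and DONE_CHECK are hypothetical knobs, not Claude Code settings.
AGENT_CMD="${AGENT_CMD:-claude -p}"     # command that reads a prompt on stdin
DONE_CHECK="${DONE_CHECK:-npm test}"    # exits 0 when the task is finished

# usage: run_loop <prompt_file> [max_ticks]
run_loop() {
  prompt_file="$1"
  max_ticks="${2:-5}"
  tick=0
  while [ "$tick" -lt "$max_ticks" ]; do
    tick=$((tick + 1))
    # Step 2: prompt the agent with the anchored prompt file.
    $AGENT_CMD < "$prompt_file" > "agent-output.$tick.log" 2>&1 \
      || echo "agent exited non-zero on tick $tick" >&2
    # Steps 3-4: read the resulting state and decide whether we are done.
    if $DONE_CHECK > /dev/null 2>&1; then
      echo "done after $tick tick(s)"
      return 0
    fi
    # Step 5: not done, so go around again with the same anchored context.
  done
  echo "gave up after $max_ticks tick(s)"
  return 1
}
```

Calling `run_loop PROMPT.md 10` would give the agent at most ten attempts. The done-check is deliberately an external command, so the decision does not rest on the agent's own claim of success.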

Boris describes three stages on his ladder:

| Stage | What Boris did | Your role |
| --- | --- | --- |
| 1. Autocomplete | Wrote code by hand with AI suggestions | Typist |
| 2. Parallel sessions | Ran 5–10 Claude sessions, prompted each | Prompt operator |
| 3. Loops | Writes loops; agents read GitHub, Slack, Twitter and decide what to build | Loop engineer |

The receipt: in the 30 days before December 27, 2025, Boris reported that 100% of his contributions to Claude Code were written by Claude Code, with 259 PRs landed. He deleted his IDE in November 2025 and has not opened it since.

The nuance the "prompt engineering is dead" crowd skips: Boris is not saying engineers are obsolete. Someone still decides what to build, talks to customers, and coordinates teams. The job did not vanish—it moved from writing code to writing the thing that writes the code.

For the broader Anthropic framing, see our harness engineering deep dive. This post focuses on loop engineering as a buildable discipline.


Why "loop" started five different arguments

The June 2026 replies were a mess because "loop" hides at least five different things. Here is the ladder—oldest to newest—so you can stop talking past people.

Stage 1: The academic while-loop (ReAct, 2022)

The ReAct paper (2022) formalized the pattern: the model reasons, calls a tool, reads the result, repeats until done. One model, one loop, a human watching.

Stage 2: Goal-driven self-prompting (AutoGPT, 2023)

AutoGPT gave an agent a goal and let it prompt itself. It became famous for spinning forever doing nothing—which seeded years of "agents are a toy" skepticism.

Stage 3: The ralph loop (July 2025)

Geoffrey Huntley published the ralph loop in July 2025. In Huntley's words: "Ralph is a bash loop."

while :; do cat PROMPT.md | claude ; done

The innovation was not clever orchestration—it was discipline:

  • Every iteration resets context to a fixed set of anchor files (PROMPT.md, specs, AGENTS.md).
  • Progress lives on disk and in git, not in a growing conversation.
  • The agent does one discrete unit of work per iteration, validates, exits.

Huntley used ralph to build Cursed, an esoteric programming language, for roughly $297 in API costs—a number widely cited in practitioner threads. Matt Pocock's workshop coverage walks through a production-oriented ralph variant with test gates and commit-on-green logic.

@trashpandaemoji had the sharpest reply under Steinberger's tweet:

"It's not ralph/goal loops, that's old hat by now. It's probably some kind of continuous orchestration loop that oversees other threads/agents."

That reply is the closest correct answer in the public thread. Hold onto it.

Stage 4: Productized ralph (/goal, spring 2026)

In spring 2026, both Codex and Claude Code shipped a /goal command that runs until a validator confirms the task is done. See our Claude Code /goal guide and Goal Mode complete guide.

Stage 5: Orchestration loops (2026)

What Boris and Steinberger actually mean in June 2026 is genuinely new—not just renamed ralph. Four things changed:

| Shift | Ralph (2025) | Orchestration loop (2026) |
| --- | --- | --- |
| Unit of work | One task, one agent | Loop supervises many tasks/agents |
| Concurrency | Sequential bash pipe | Parallel sub-agents (worktrees, /batch) |
| Scheduling | Human starts terminal | Cron, /loop, infrastructure time |
| Durability | Terminal must stay open | Git-backed state, crash recovery |

Steve Yegge's Gas Town (launched January 2026) coordinates 20–30 Claude Code instances via a "Mayor" agent, with patrol agents running continuous loops and state stored in git so work survives a crash. That is the continuous orchestration loop Trash Panda was reaching for—shipped and open source.
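
Durability comes down to keeping loop state somewhere that outlives the process. Here is a minimal sketch, assuming a checklist file in the Markdown task-list style (`- [ ]` / `- [x]`). The function names and the file layout are hypothetical, and this is not how Gas Town stores its state.

```shell
#!/bin/sh
# Checkpointed task list: progress lives in a file, so a restarted loop
# resumes at the first unchecked item instead of starting over.
# STATE_FILE and both function names are illustrative, not a real tool's API.
STATE_FILE="${STATE_FILE:-specs/TODO.md}"

# Print the text of the first unchecked item; return 1 if none remain.
next_task() {
  task=$(grep '^- \[ \] ' "$STATE_FILE" | head -n 1 | sed 's/^- \[ \] //')
  [ -n "$task" ] || return 1
  printf '%s\n' "$task"
}

# Flip the first unchecked item to checked, replacing the file via a temp copy.
mark_first_done() {
  awk '!flipped && /^- \[ \] / { sub(/^- \[ \] /, "- [x] "); flipped = 1 } { print }' \
    "$STATE_FILE" > "$STATE_FILE.tmp" && mv "$STATE_FILE.tmp" "$STATE_FILE"
}
```

After a crash, rerunning the loop calls `next_task` again and picks up at the first unchecked item. Committing the file after each `mark_first_done` is what would make the state git-backed.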


"It's just a cron job with a hat on"

The best skeptic line in the June 2026 corpus was four words:

"Cronjobs have funny re-branding rn."

Half right. Yes—the scheduling layer is cron. Boris literally runs loops on cron. Claude Code's /loop command uses scheduling under the hood.

If your whole definition is "a thing that runs on a timer," we invented that in 1975 and you can go home.

What cron never had is the part in the middle:

| Cron job | Agent loop |
| --- | --- |
| Runs a fixed script | Runs a model that reads current state |
| Same branches every tick | Decides the next action each tick |
| No self-correction | Can verify, fail, and retry |
| One process | Can dispatch and supervise other agents |

@rohit_jsfreaky deflated the mythology cleanly:

"Every ai agent i shipped this year is a for-loop, an llm call, and a try/catch around the json parsing. The only thing agentic about it is the anthropic bill at the end of the month."

Honest framing: loops are cron plus a decision-maker in the body. The interesting engineering is everything you wrap around that decision so it does not run off a cliff.
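
One piece of that wrapping is making sure a slow tick does not overlap the next scheduled one, which plain cron does not prevent. Here is a minimal sketch using an atomic `mkdir` lock. The lock path and the function name `run_tick_exclusive` are hypothetical, and if the script is killed mid-tick the lock directory is left behind, which this sketch does not clean up.

```shell
#!/bin/sh
# Run one loop tick only if no other tick holds the lock.
# mkdir is atomic: exactly one caller succeeds in creating the directory.
LOCK_DIR="${LOCK_DIR:-/tmp/agent-loop.lock}"

# usage: run_tick_exclusive <command> [args...]
run_tick_exclusive() {
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "previous tick still running, skipping" >&2
    return 75               # temporary failure: try again on the next schedule
  fi
  status=0
  "$@" || status=$?         # the tick itself, e.g. ./loop.sh
  rmdir "$LOCK_DIR"         # release the lock once the tick finishes
  return "$status"
}
```

A crontab entry such as `*/15 * * * * /path/to/tick.sh` (path hypothetical) could then call `run_tick_exclusive ./loop.sh` every 15 minutes without ticks piling up behind a slow agent run.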


What loop engineering looks like in practice

Enough lineage. Here is the on-ramp.

One line: Claude Code /loop

Claude Code ships /loop as a bundled skill. Boris's canonical starter:

/loop babysit all my PRs. Auto-fix build issues, and when comments come in, use a worktree agent to fix them.

Read that twice. He is not asking Claude to fix one PR. He is asking Claude to maintain all of them indefinitely, dispatching worktree-isolated sub-agents as comments arrive.

More examples from Boris's public posts:

| Command | What it does |
| --- | --- |
| /loop 5m /babysit | Auto-address code review, rebase PRs every 5 minutes |
| /loop 30m /slack-feedback | Put up PRs for Slack feedback on a 30-minute cadence |
| /loop 5m check the deploy | Poll deploy status on a fixed interval |
| /loop check the deploy | Same prompt, interval chosen dynamically by Claude |

Dynamic intervals: when you omit the interval, Claude picks a delay between one minute and one hour based on what it observed—short waits while a build is finishing, longer waits when nothing is pending.
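
The idea of a dynamic interval can be illustrated with a tiny backoff function. This is only a sketch of the general pattern, not Claude Code's actual heuristic, which is not specified here. `next_delay` and its doubling rule are assumptions; only the one-minute and one-hour bounds come from the description above.

```shell
#!/bin/sh
# Pick the next sleep in seconds: short while work is pending,
# doubling toward one hour while nothing is happening.
MIN_DELAY=60      # one minute
MAX_DELAY=3600    # one hour

# usage: next_delay <previous_delay_seconds> <pending: yes|no>
next_delay() {
  prev="$1"
  pending="$2"
  if [ "$pending" = "yes" ]; then
    delay="$MIN_DELAY"            # something is in flight: check back soon
  else
    delay=$((prev * 2))           # idle: back off
  fi
  if [ "$delay" -lt "$MIN_DELAY" ]; then delay="$MIN_DELAY"; fi
  if [ "$delay" -gt "$MAX_DELAY" ]; then delay="$MAX_DELAY"; fi
  echo "$delay"
}
```

A hand-rolled loop would call `sleep "$(next_delay "$last" "$pending")"` between ticks, where `pending` reflects whatever it observed, such as a build still running.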

Custom default: a loop.md file in your project replaces the built-in maintenance prompt for bare /loop.

Stop a loop: press Esc while it is waiting for the next iteration.

Sources: Claude Code scheduled tasks docs, Vibe Coder breakdown of /loop.

Boris's five tips for hours-long autonomy

In June 2026, Boris posted five tips for running Opus autonomously for hours or days:

  1. Auto mode for permissions — Claude does not stop to ask for approval on every file write.
  2. Dynamic workflows — orchestrate hundreds or thousands of sub-agents for large tasks. See Claude Code dynamic workflows.
  3. /goal or /loop — nudge Claude to keep going until done.
  4. Claude Code in the cloud — close your laptop; the loop keeps running.
  5. Self-verify end to end — a loop is only as trustworthy as its ability to check its own work.

Tip 5 is what practitioners obsess over and what the hype skips.

The loop contract

Developers Digest names the pieces that turn an agent from a clever assistant into a useful background process:

TRIGGER  → every 15m, on PR comment, on CI failure
SCOPE    → open PRs authored by me, repo X only
ACTION   → run tests, fix lint, respond to review
BUDGET   → max 3 sub-agents per tick, 50k tokens
STOP     → all PRs green, or 10 iterations, or $5 spent
REPORT   → post summary to Slack #eng-bots

That is not "task, repeated." That is loop engineering.
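
The STOP line of that contract translates directly into a check that runs before every tick. A minimal sketch: the variable names and `should_stop` are hypothetical, the limits mirror the example above, and spend is passed in whole cents to keep the arithmetic in integers.

```shell
#!/bin/sh
# Stop conditions from the contract: all green, iteration cap, or dollar cap.
MAX_ITERATIONS=10
MAX_SPEND_CENTS=500      # $5.00

# usage: should_stop <iterations_so_far> <cents_spent> <failing_pr_count>
# Prints the reason and returns 0 when the loop must stop; returns 1 otherwise.
should_stop() {
  if [ "$3" -eq 0 ]; then
    echo "all PRs green"
    return 0
  fi
  if [ "$1" -ge "$MAX_ITERATIONS" ]; then
    echo "iteration cap reached"
    return 0
  fi
  if [ "$2" -ge "$MAX_SPEND_CENTS" ]; then
    echo "budget exhausted"
    return 0
  fi
  return 1
}
```

Printing the reason matters as much as stopping: it is the line the REPORT step would post, so a morning reader can tell a finished loop from one that ran out of budget.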


Verification: the feedback inside the loop

The fastest-growing sub-theme after Steinberger's tweet was not orchestration—it was verification. @mosyaseen said it directly in the thread: half of loop engineering is design; the other half is something that can say no.

@DanKornas, shipping roborev, put it plainly:

"Your coding agent can move fast, but bad commits compound fast too."

An open loop that writes code with no feedback is a machine for generating confident mistakes. A loop that writes → runs → reads the result → corrects is what actually works in production.

| Loop type | Behavior | Production fit |
| --- | --- | --- |
| Open loop | Agent writes until it says "done" | Demo only |
| Closed loop | Agent runs tests/lint/review after each write | Ship with guardrails |
| Review loop | Background reviewer feeds findings back while context is fresh | Best for long sessions |

The loop is not the magic. The feedback inside it is.
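
A closed loop can be sketched as a verify step whose failure output becomes part of the next prompt. This is an illustration under assumptions: `VERIFY_CMD`, `verify_and_feedback`, and the file names are hypothetical, and the check command would be whatever your project actually runs.

```shell
#!/bin/sh
# Closed-loop gate: run the checks; on failure, write a follow-up prompt
# that includes the real error output so the agent has something to push on.
VERIFY_CMD="${VERIFY_CMD:-npm test}"

# usage: verify_and_feedback <base_prompt_file> <next_prompt_file>
# Returns 0 if checks pass; otherwise writes the next prompt and returns 1.
verify_and_feedback() {
  if output=$($VERIFY_CMD 2>&1); then
    return 0
  fi
  {
    cat "$1"
    echo
    echo "The previous attempt failed verification. Output (last 40 lines):"
    printf '%s\n' "$output" | tail -n 40
    echo "Fix the cause of this failure before doing anything else."
  } > "$2"
  return 1
}
```

The agent never grades itself here. The exit code of the check decides, and the raw error text, not a summary, is what flows back into the next iteration.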

Anchor files: VISION.md, CLAUDE.md, AGENTS.md

Steinberger's reply in the thread pointed at VISION.md—a project-level file that states what you are building and why, so each loop tick does not re-derive intent from scratch. That sits alongside:

| File | Role in loops |
| --- | --- |
| VISION.md | North star: product direction, constraints, what "done" looks like |
| CLAUDE.md / AGENTS.md | Operating rules: stack, commands, guardrails per tick |
| PROMPT.md / loop.md | The prompt the loop pipes in each iteration |
| Tests & type checks | The thing that says no when the agent is wrong |

Pair loops with persistent project memory: What is CLAUDE.md? and MEMORY.md patterns. Steinberger's shared agent rules live in steipete/agent-scripts.
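
One way to make each tick start from the same anchors is to assemble the prompt from those files on every iteration. A minimal sketch, assuming the files sit in one directory: `build_prompt` and the section headers it prints are hypothetical, and missing files are simply skipped. Claude Code already reads CLAUDE.md on its own, so the explicit concatenation mainly helps agents or setups that do not.

```shell
#!/bin/sh
# Assemble a fresh prompt from anchor files so every tick starts from
# the same intent and rules rather than a growing conversation.
# usage: build_prompt [dir]   (prints the combined prompt on stdout)
build_prompt() {
  dir="${1:-.}"
  for name in VISION.md AGENTS.md CLAUDE.md PROMPT.md; do
    if [ -f "$dir/$name" ]; then
      printf '===== %s =====\n' "$name"
      cat "$dir/$name"
      printf '\n'
    fi
  done
}
```

`build_prompt . | claude -p` would pipe the result into a non-interactive run. Because the prompt is rebuilt from disk each time, editing VISION.md changes the very next tick.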


Guardrails: the expensive part is managing the loop

Once the model writes code for almost nothing, cost moves to the loop running it.

@runes_leo:

"The costliest thing in AI coding is no longer writing code, it's managing the agent loop."

The receipt: Uber capped engineers at $1,500 per person per tool per month for Claude Code and Cursor after burning its annual AI budget in four months, per June 2026 reporting in the discourse.

@cv_usk on the failure mode everyone in production fears:

"Without guardrails, you get infinite loops and billing surprises orders of magnitude over budget."

Every serious 2026 write-up converges on three hard stops:

1. Maximum iteration count

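MAX_ITER=20   # hard ceiling on iterations
iter=0
while [ "$iter" -lt "$MAX_ITER" ]; do
  # One non-interactive agent run per iteration, prompt read from stdin
  claude -p < prompt.md
  iter=$((iter + 1))
done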

Claude Code's /goal tracks turns natively. Bare ralph loops have no ceiling unless you add one.

2. No-progress detection

Stop when the same error message, empty diff, or failing test appears N times in a row. Huntley tunes ralph prompts "like a guitar" based on failure patterns—loop engineering includes prompt iteration, not just bash.
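
No-progress detection can be sketched by fingerprinting each tick's result and counting consecutive repeats. The names `note_result` and `STALL_LIMIT` are hypothetical, `cksum` is used only as a cheap fingerprint, and the function must run in the loop's own shell (not inside `$(...)`) so the counters persist.

```shell
#!/bin/sh
# No-progress detector: fingerprint each tick's result and stop once the
# same fingerprint has been seen STALL_LIMIT times in a row.
STALL_LIMIT="${STALL_LIMIT:-3}"
last_fingerprint=""
repeat_count=0

# usage: note_result <text>   (returns 0 while progressing, 1 when stalled)
note_result() {
  fingerprint=$(printf '%s' "$1" | cksum)
  if [ "$fingerprint" = "$last_fingerprint" ]; then
    repeat_count=$((repeat_count + 1))
  else
    last_fingerprint="$fingerprint"
    repeat_count=1
  fi
  [ "$repeat_count" -lt "$STALL_LIMIT" ]
}
```

The same function works for the empty-diff case by passing it the output of `git diff --stat`. Error text that embeds timestamps or temp paths would need normalizing first, or every tick will look new.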

3. Token or dollar budget ceiling

Set a per-loop budget before you sleep. AI token cost governance covers enterprise-scale patterns.
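
A dollar ceiling needs a running total that survives between ticks. Here is a minimal sketch that keeps a ledger file of per-tick costs in whole cents. `record_spend`, `total_spent`, `within_budget`, and the ledger path are hypothetical, and how you obtain each tick's cost (for example from your provider's usage reporting) is left as an assumption.

```shell
#!/bin/sh
# Per-loop budget ledger: one line per tick, amounts in whole cents.
LEDGER="${LEDGER:-.loop-spend.log}"
BUDGET_CENTS="${BUDGET_CENTS:-500}"   # $5.00 ceiling

record_spend() {            # usage: record_spend <cents>
  printf '%s\n' "$1" >> "$LEDGER"
}

total_spent() {             # prints the sum of all recorded amounts
  if [ -f "$LEDGER" ]; then
    awk '{ sum += $1 } END { print sum + 0 }' "$LEDGER"
  else
    echo 0
  fi
}

within_budget() {           # returns 0 while the total is under the ceiling
  [ "$(total_spent)" -lt "$BUDGET_CENTS" ]
}
```

A loop would call `within_budget || break` at the top of each iteration. Because the total lives in a file, restarting the script does not reset the budget.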

Gartner puts agentic AI at the peak of inflated expectations, with only about 17% of organizations actually deploying agents, per citations in Van Horn's research. The gap between the timeline and the receipts is the real state of play.


Skills are the asset inside the loop

Steinberger's other recurring point pairs with the loops thesis—and it is the more durable half:

If you do something more than once, turn it into an automated skill. If you do something hard, turn it into a skill afterward so next time is free.

A loop with no reusable skills inside it is a while true around a stranger. A loop that calls a library of sharp, tested, named skills is a system that compounds.

Boris's public advice: experiment with turning workflows into skills + loops.

/loop 30m /code-review
/loop 15m /fix-ci
/loop 1h /dependency-audit

Each skill is a named recipe—prompt + tool policy + verification steps. The loop is plumbing that invokes those recipes on a schedule.
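
As an illustration of "prompt + tool policy + verification steps", here is a sketch that scaffolds a skill file. It assumes Claude Code's project skill layout of `.claude/skills/<name>/SKILL.md` with `name` and `description` front matter. The skill name `fix-ci`, the function `scaffold_skill`, and the body text are hypothetical, so check the current skills documentation before relying on the format.

```shell
#!/bin/sh
# Scaffold a named skill so the loop can call a tested recipe by name.
# usage: scaffold_skill <project_dir> <skill_name> <one_line_description>
scaffold_skill() {
  skill_dir="$1/.claude/skills/$2"
  mkdir -p "$skill_dir"
  cat > "$skill_dir/SKILL.md" <<EOF
---
name: $2
description: $3
---

## Steps
1. Read the failing CI job log and identify the first real error.
2. Make the smallest change that fixes it.

## Tool policy
Only edit files inside the repository. Do not force-push.

## Verification
Run the test command and confirm it exits 0 before committing.
EOF
  echo "$skill_dir/SKILL.md"
}
```

Running `scaffold_skill . fix-ci "Fix failing CI on the current branch"` writes a starting point that you would then sharpen by hand. The verification section is the part that lets a loop trust the skill unattended.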

Browse reusable skills at /skills. For security before you automate: Agent skills threat model.


Build your first loop this week

Level 0: Babysit one PR (15 minutes)

/loop 10m Review PR #123. If CI is failing, fix it. If there are unresolved review comments, address them in a worktree and push. If everything is green and approved, stop.

Watch two ticks. Confirm it reads state before acting.

Level 1: Ralph with guardrails (1 hour)

Create PROMPT.md:

You are an autonomous coding agent.

1. Read specs/TODO.md for the next unchecked item.
2. Implement exactly that item.
3. Run `npm test`.
4. If tests pass, commit with a descriptive message and mark the item done.
5. If tests fail twice with the same error, write BLOCKED to specs/TODO.md and exit.
6. Exit after one item either way.

Wrap it:

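#!/bin/bash
MAX=10   # hard ceiling on iterations
for i in $(seq 1 "$MAX"); do
  # One item per run; this flag disables permission prompts, so sandbox it
  claude -p --dangerously-skip-permissions < PROMPT.md
  # Stop if the agent reported that it is stuck
  grep -q "BLOCKED" specs/TODO.md && break
  # Stop when no unchecked "[ ]" items remain
  grep -q "\[ \]" specs/TODO.md || break
  sleep 10
done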

Run in an isolated worktree or container—not on your main machine without sandboxing.

Level 2: Orchestration (ongoing)

Combine /loop + skills + cloud sessions + /goal for multi-hour work. Read agent harness engineering for the seven-plane framework: loop policy, tool surface, context, sandbox, multi-agent routing, observability, model routing.


Key patterns from the research

Steinberger's tweet was the spark; the /last30days corpus (compiled June 7–8, 2026) distilled five durable patterns:

  1. A loop is cron plus a decision-maker — the model picks the next action each tick, not a hardcoded branch.
  2. Lineage is real — ReAct (2022) → AutoGPT (2023) → ralph (2025) → /goal (spring 2026) → orchestration loops (now). Single-agent ralph is old hat; multi-agent supervision is the new layer.
  3. Feedback makes loops trustworthy — tests, type checks, review gates; a loop with nothing to push back is the agent agreeing with itself.
  4. Cost shifted to loop management — cap iterations, detect no-progress, set a dollar budget.
  5. Skills compound; prompts burn — loops that call sharp named skills get cheaper over time; loops that re-derive everything do not.

Top voices in the corpus: @steipete (trigger), @bcherny (definition), @MatthewBerman (explainer video), @mvanhorn (research synthesis), @mosyaseen (verification framing).



Summary

Loop engineering is not a hot take about prompt engineering dying. It started as two sentences from @steipete and became a buildable discipline:

  1. Stop being the thing in the loop — write the loop once.
  2. Anchor intent: VISION.md, CLAUDE.md, or AGENTS.md so each tick knows where it's going.
  3. Give it something that says no — tests, type checks, review gates.
  4. Give it skills worth calling — named recipes, not one-off prompts.
  5. Cap it so it halts — iterations, no-progress detection, dollar budget.
  6. Let it run on cron while you decide what to build next.

Steinberger named the shift; Boris shipped the primitives. The on-ramp, as of June 2026, is a single slash command:

/loop babysit all my PRs. Auto-fix build issues, and when comments come in, use a worktree agent to fix them.

The only people who truly know what a loop feels like are the ones who have already built one. @InderosD asked "how do we do that though?" in the original thread—the sections above are the answer. The good news: the tooling to start is already in your terminal.


Published June 9, 2026. Steinberger tweet view count and thread citations from June 8–9, 2026—verify against upstream before citing in production decisions.
