Anthropic Engineer: Stop Prompting Claude, Build Loops That Prompt Themselves (Harness Engineering Explained)

Boris Cherny from Anthropic reveals how engineers ship 8x more code by building iterative loops instead of single prompts. Learn harness engineering, the approach behind Claude authoring 80%+ of production code at Anthropic.

8 min read · Yash Thakker
Tags: AI Coding, Harness Engineering, Claude Code, Developer Productivity, Anthropic

TL;DR: Single prompts are obsolete for serious software engineering. Anthropic's Boris Cherny explains that production AI coding requires harness engineering—building systems that run iterative loops where Claude observes, plans, acts, and reflects over hours or days. This approach helped Anthropic engineers ship 8x more code daily, with Claude authoring over 80% of merged production code by May 2026.


The Paradigm Shift: From Prompts to Loops

"You're not supposed to prompt Claude. You're supposed to build a system that prompts itself."

This statement from Boris Cherny, engineer at Anthropic, has sparked a fundamental rethinking of how developers should use AI coding assistants.

The problem with single prompts:

  • Limited scope: Can't handle multi-file, multi-day projects
  • No iteration: AI can't learn from mistakes or refine approach
  • Context loss: Each prompt starts fresh without accumulated knowledge
  • Human bottleneck: Developer must manually orchestrate every step

The solution: harness engineering—systems that autonomously prompt AI agents in iterative loops.


What is Harness Engineering?

Harness engineering is the practice of building frameworks that orchestrate AI agents through repeated observe-plan-act-reflect cycles.

The Core Loop

1. OBSERVE → Analyze current codebase, test results, error logs
2. PLAN    → Determine next action based on observations
3. ACT     → Execute code changes, run tests, make commits
4. REFLECT → Evaluate results, identify gaps, adjust strategy
5. REPEAT  → Loop until task complete or timeout

This isn't a single prompt like "Add user authentication." It's a system that:

  • Breaks down complex tasks into sub-tasks
  • Executes each step autonomously
  • Validates results before proceeding
  • Adapts strategy based on outcomes
  • Runs for hours or days without human intervention

Example: Adding Authentication (Traditional vs Loop-Based)

Traditional Single Prompt:

User: "Add JWT authentication to the API"
Claude: [Generates auth code in one file]
User: [Realizes it needs database migration, middleware, tests, docs]
User: "Now add the migration"
Claude: [Generates migration]
User: "Add middleware"
... 15 more manual prompts ...

Loop-Based Harness Engineering:

# Simplified harness pseudocode
task = "Add JWT authentication to the API"
max_turns = 50
context = CodebaseContext()

for turn in range(max_turns):
    # OBSERVE
    status = context.analyze_codebase()
    test_results = context.run_tests()

    # PLAN
    plan = claude.plan_next_action(task, status, test_results)

    # ACT
    if plan.action == "modify_file":
        context.edit_file(plan.file_path, plan.changes)
    elif plan.action == "run_migration":
        context.execute_migration(plan.migration_file)
    elif plan.action == "write_tests":
        context.create_test_file(plan.test_code)

    # Commit this turn's changes first, so the final turn's work is not lost
    context.commit_changes(plan.commit_message)

    # REFLECT
    if plan.task_complete:
        break

Result: Claude works through the following steps on its own:

  1. Adds JWT library dependencies
  2. Creates auth middleware
  3. Writes database migration for user tokens
  4. Updates API routes to use auth
  5. Writes integration tests
  6. Updates documentation
  7. Runs tests and fixes failing cases
  8. Commits with proper messages

All without human intervention beyond the initial task specification.


The Anthropic Results: 8x Productivity, 80% AI-Authored Code

By May 2026, Anthropic engineers using harness engineering had reached:

  • 8x daily code output compared to traditional development
  • 80%+ of merged production code authored by Claude
  • Hours to days of autonomous execution per task
  • 76% success rate on open-ended software tasks

What Changed?

Before (Single Prompts - Q1 2025):

  • Engineer writes detailed spec
  • Claude generates code
  • Engineer manually integrates, tests, debugs
  • Repeat 10-20 times per feature
  • Result: 60% AI-generated code, 40% human

After (Harness Engineering - Q2 2026):

  • Engineer specifies high-level goal
  • Harness loop runs autonomously
  • Claude observes, plans, acts, reflects
  • Human reviews and approves final PR
  • Result: 80% AI-generated code, 20% human (architecture + review)

How to Build Your Own Harness (Practical Guide)

Level 1: Simple Loop (1 hour implementation)

Start with a basic observe-act loop for repetitive tasks:

// Example: Auto-fix linting errors
async function lintFixLoop(maxIterations = 5) {
  for (let i = 0; i < maxIterations; i++) {
    // OBSERVE
    const lintResults = await runLinter();
    if (lintResults.errors.length === 0) break;

    // ACT
    const fixes = await claude.generateFixes(lintResults);
    await applyFixes(fixes);

    // REFLECT
    console.log(`Iteration ${i+1}: Fixed ${fixes.length} issues`);
  }
}

Use cases: Linting fixes, test debugging, dependency updates

Level 2: Multi-Step Task Decomposition (1 day implementation)

Add planning and task breakdown:

async function featureLoop(featureSpec: string, maxTasks = 25) {
  // PLAN
  const queue = await claude.breakdownFeature(featureSpec);

  // Cap total tasks so follow-up "gap" tasks cannot keep the loop alive forever
  for (let done = 0; queue.length > 0 && done < maxTasks; done++) {
    const task = queue.shift()!;

    // OBSERVE
    const context = await analyzeCodebase();

    // PLAN SUB-ACTIONS
    const actions = await claude.planImplementation(task, context);

    // ACT
    for (const action of actions) {
      await executeAction(action);
      await runTests();
    }

    // REFLECT
    const taskComplete = await claude.validateCompletion(task);
    if (!taskComplete) {
      // Re-queue the missing work instead of mutating the list being iterated
      queue.push(await claude.identifyGaps(task));
    }
  }
}

Use cases: Feature implementation, refactoring, migration tasks

Level 3: Autonomous Multi-Day Projects (1 week implementation)

Full harness with error recovery, checkpoints, and human approval gates:

interface HarnessConfig {
  task: string;
  maxTurns: number;
  checkpointInterval: number;
  humanApprovalRequired: string[]; // e.g., ["database_migration", "api_breaking_change"]
}

async function autonomousHarness(config: HarnessConfig) {
  const context = new ProjectContext();

  // A for loop, so `continue` below still advances the turn counter
  for (let turn = 0; turn < config.maxTurns; turn++) {
    // OBSERVE
    const status = await context.fullAnalysis();

    // PLAN
    const plan = await claude.strategicPlan(config.task, status, turn);

    // HUMAN CHECKPOINT
    if (config.humanApprovalRequired.includes(plan.actionType)) {
      const approved = await requestHumanApproval(plan);
      if (!approved) continue; // a rejected plan costs a turn, then we replan
    }

    // ACT
    try {
      await executeActionSafely(plan.action);
    } catch (error) {
      // ERROR RECOVERY (one retry; a second failure propagates to the caller)
      const recovery = await claude.recoverFromError(error, context);
      await executeActionSafely(recovery.action);
    }

    // VALIDATE
    const testResults = await context.runFullTestSuite();

    // REFLECT
    const reflection = await claude.evaluateProgress(
      config.task,
      status,
      testResults,
      turn
    );

    if (reflection.taskComplete) break;
    if (reflection.stuck) {
      await requestHumanIntervention(reflection.issue);
    }

    // CHECKPOINT after every `checkpointInterval` completed turns
    if ((turn + 1) % config.checkpointInterval === 0) {
      await context.createCheckpoint();
    }
  }

  return context.generatePullRequest();
}

Use cases: Full feature development, complex refactors, multi-service changes


The 14% CLAUDE.md Tax and How to Fix It

Boris Cherny highlighted a critical insight: 14% of developer productivity is lost to poorly structured CLAUDE.md files (or equivalent project context files).

The Problem

Bad CLAUDE.md:

# My Project
This is a web app for users.

## Stack
- React
- Node
- Postgres

## Instructions
Be helpful!

Result: Claude wastes turns asking basic questions about:

  • Project structure
  • Code style preferences
  • Testing approach
  • Deployment process
  • Business logic context

The Solution: Structured Context

Good CLAUDE.md for harness engineering:

# Project Context for AI Agents

## Architecture Map
- `/app/*` - Next.js App Router (React Server Components)
- `/lib/db/*` - Prisma ORM, PostgreSQL schemas
- `/lib/api/*` - tRPC API routes
- `/components/*` - React components (shadcn/ui + Tailwind)

## Code Style (CRITICAL - Follow Exactly)
- Server components by default; 'use client' only when needed
- Prefer server actions over API routes for mutations
- Database queries only in server components or server actions
- All async functions must handle errors with try-catch
- Use Zod for all input validation

## Testing Strategy
- Unit tests: Vitest for pure functions
- Integration tests: Playwright for user flows
- Run `pnpm test` before any commit
- Coverage requirement: 70%+

## Common Patterns
### Adding a new API endpoint
1. Define Zod schema in `/lib/schemas`
2. Create tRPC procedure in `/lib/api/routers`
3. Write integration test in `__tests__/api`
4. Update OpenAPI docs if public endpoint

### Database changes
1. Modify schema in `prisma/schema.prisma`
2. Run `pnpm db:migrate:dev` to create migration
3. Update seed data if needed
4. Test migration rollback works

## Deployment
- Production: Vercel (auto-deploy on main branch)
- Staging: Railway (auto-deploy on develop branch)
- Never commit secrets - use `.env.local` and Vercel env vars

## Business Context
- Users are B2B SaaS companies (SMB to mid-market)
- Average deal size: $50K-200K/year
- Security/compliance critical: SOC2, GDPR
- Performance target: p95 page load < 2s

Impact: Cuts wasted turns by 60% and lets Claude make informed decisions without asking.
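One way to keep a context file from drifting back toward the "bad" version is to lint it. The sketch below is an illustration, not part of any Claude tooling: the `REQUIRED_SECTIONS` list and the `missing_sections` helper are hypothetical names, and the section titles are simply copied from the example file above.

```python
import re

# Hypothetical checklist, copied from the headings in the example CLAUDE.md above.
REQUIRED_SECTIONS = [
    "Architecture Map",
    "Code Style",
    "Testing Strategy",
    "Common Patterns",
    "Deployment",
    "Business Context",
]


def missing_sections(markdown_text, required=REQUIRED_SECTIONS):
    """Return the required section names that have no matching '## ' heading."""
    # Collect level-2 headings only; '### ...' sub-headings do not count.
    headings = re.findall(r"^##\s+(.+?)\s*$", markdown_text, flags=re.MULTILINE)
    return [
        name
        for name in required
        # Prefix match, so "Code Style (CRITICAL - Follow Exactly)" still counts.
        if not any(h.lower().startswith(name.lower()) for h in headings)
    ]


# The "bad" file from earlier has none of the required sections.
bad = "# My Project\n\n## Stack\n- React\n\n## Instructions\nBe helpful!\n"
print(missing_sections(bad))
```

Running a check like this before a long harness run is cheap, and it fails fast when someone trims the file.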


Real-World Success Stories

1. Developer Reports 76% Success Rate

Early adopters of harness engineering on Twitter report:

  • 76% task completion on open-ended software projects
  • 3-5x faster than manual development for complex features
  • Reduced context-switching: Set task, review final PR hours later

2. Tutorials Going Viral

The community has created extensive guides:

  • 24-minute workshop on harness engineering fundamentals
  • Step-by-step loop design tutorials
  • Open-source harness frameworks (LangGraph, AutoGPT-based)

3. Anthropic's Internal Adoption

By May 2026:

  • Every Anthropic engineer uses harness-based workflows
  • 80%+ production code written by Claude
  • Human role shifted: Architecture, review, strategy—not implementation

The Criticism: Bugs, Waste, and Expertise Gaps

Not everyone is convinced. Critics raise valid concerns:

1. Loop Bugs Can Waste Hours

Poorly designed loops fail in several ways:

  • Infinite loops: Claude keeps "fixing" the same issue differently
  • Premature termination: Stops before task actually complete
  • Wasted compute: Runs expensive API calls on low-value iterations

Mitigation:

  • Set max turns (20-50 for most tasks)
  • Add explicit termination conditions
  • Monitor token usage and costs
  • Implement circuit breakers for repeated failures
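The four mitigations above can live in one small object that the loop consults every turn. This is a minimal sketch under stated assumptions: `LoopGuard` and its fields are hypothetical names, the token budget is an arbitrary placeholder, and "repeated failure" is approximated as the same failure string showing up several turns in a row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopGuard:
    """Tracks turns, token spend, and repeated failures for one harness run."""

    max_turns: int = 30          # inside the 20-50 range suggested above
    max_tokens: int = 500_000    # placeholder budget (assumption); tune to your costs
    max_repeats: int = 3         # identical failures tolerated in a row
    turns: int = 0
    tokens_used: int = 0
    last_failure: Optional[str] = None
    repeat_count: int = 0

    def record_turn(self, tokens: int, failure: Optional[str] = None) -> None:
        """Record one finished turn, its token cost, and its failure (if any)."""
        self.turns += 1
        self.tokens_used += tokens
        if failure is not None and failure == self.last_failure:
            self.repeat_count += 1
        else:
            # A new failure starts a streak of one; a success resets it to zero.
            self.repeat_count = 1 if failure is not None else 0
        self.last_failure = failure

    def stop_reason(self) -> Optional[str]:
        """Return why the loop should stop, or None if it may continue."""
        if self.repeat_count >= self.max_repeats:
            return "circuit_breaker"
        if self.tokens_used >= self.max_tokens:
            return "token_budget"
        if self.turns >= self.max_turns:
            return "max_turns"
        return None


# Simulate a loop that keeps hitting the same failing test.
guard = LoopGuard(max_turns=10, max_repeats=3)
for _ in range(10):
    guard.record_turn(tokens=1_000, failure="test_login fails")
    if guard.stop_reason():
        break
print(guard.turns, guard.stop_reason())  # prints: 3 circuit_breaker
```

The breaker trips on turn 3 instead of burning all 10 turns, which is the point: the cheapest iteration is the one you never run.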

2. Requires Prompt Engineering Expertise

Designing effective harnesses isn't beginner-friendly:

  • Writing observation prompts that extract relevant context
  • Structuring action spaces (which operations are allowed?)
  • Calibrating reflection prompts to avoid hallucinated "success"
  • Handling edge cases and error recovery

Reality: This is a new skill set. Teams need training and iteration.
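"Structuring action spaces" sounds abstract, but it can start as an allowlist. The sketch below assumes the model returns a proposed action with a `kind` and a `target`; the action names echo the pseudocode earlier in this post, and `ProposedAction`, `classify`, and the example targets are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical action space; the names mirror the pseudocode earlier in the post.
ALLOWED_ACTIONS = {"modify_file", "write_tests", "run_tests"}
NEEDS_APPROVAL = {"run_migration"}


@dataclass(frozen=True)
class ProposedAction:
    kind: str    # what the model wants to do
    target: str  # the file or resource it wants to touch


def classify(action: ProposedAction) -> str:
    """Return 'allow', 'ask_human', or 'reject' for a model-proposed action."""
    if action.kind in NEEDS_APPROVAL:
        return "ask_human"
    if action.kind in ALLOWED_ACTIONS:
        return "allow"
    # Anything the harness does not recognise is refused rather than guessed at.
    return "reject"


print(classify(ProposedAction("modify_file", "src/auth.ts")))       # allow
print(classify(ProposedAction("run_migration", "0007_tokens.sql")))  # ask_human
print(classify(ProposedAction("delete_branch", "main")))             # reject
```

Defaulting to "reject" is the design choice that matters here: the model can propose anything, but the harness only executes what it was built to execute.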

3. Not All Tasks Suit Autonomous Loops

Bad fit for harness engineering:

  • High-risk changes: Database migrations on production
  • Creative/strategic work: Product vision, UX design philosophy
  • Highly ambiguous tasks: "Make the app better"

Good fit:

  • Well-defined scope: "Add email verification to signup flow"
  • Testable outcomes: "All tests pass + no type errors"
  • Repetitive patterns: "Update all API routes to use new auth middleware"
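"Testable outcomes" is what lets a loop decide it is done without trusting the model's own claim of success. A rough sketch, assuming completion means every check command exits with status 0; `checks_pass` is a hypothetical helper, and the commands here are stand-ins so the example runs anywhere.

```python
import subprocess
import sys


def checks_pass(commands, timeout_s=600):
    """Return True only if every command exits with status 0."""
    for cmd in commands:
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
        except (subprocess.TimeoutExpired, FileNotFoundError):
            # A hung or missing tool counts as a failed check, not a pass.
            return False
        if result.returncode != 0:
            return False
    return True


# Stand-in commands. A real project would list its own test runner and
# type checker here (for example the `pnpm test` command mentioned earlier).
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
print(checks_pass([passing]), checks_pass([passing, failing]))  # prints: True False
```

If a task cannot be expressed as a list of commands like this, that is a hint it belongs in the "bad fit" column.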

Tools That Support Harness Engineering

Native Support

  1. Claude Code CLI (Anthropic)

    • Built-in loop orchestration with /goal command
    • Persistent context across turns
    • Tool use (file ops, bash, testing)
    • Reflection and planning prompts
  2. Cursor AI (Anysphere)

    • Agent mode with multi-turn execution
    • Composer for complex refactors
    • Integrated testing and linting loops
  3. Aider (Open Source)

    • Git-integrated AI pair programmer
    • Automatic commit loops
    • Context-aware file selection

Frameworks for Custom Harnesses

  1. LangGraph (LangChain)

    • State machine for agent workflows
    • Built-in checkpointing and recovery
    • Conditional loops and branching
  2. AutoGPT

    • Autonomous task execution
    • Memory and learning across runs
    • Plugin ecosystem for tool use
  3. CrewAI

    • Multi-agent orchestration
    • Role-based agent specialization
    • Shared context management

How to Get Started (This Week)

Day 1: Learn Loop Fundamentals

  • Watch the 24-minute Anthropic workshop (search "Anthropic harness engineering")
  • Read Boris Cherny's thread on loops vs prompts
  • Analyze successful loop examples on GitHub

Day 2: Implement Simple Loop

  • Choose a repetitive task (linting, test fixing, docs generation)
  • Write a 10-turn observe-act loop using Claude API
  • Test on small codebase
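For the Day 2 exercise, it helps to write the loop so the model call is just a function you pass in. The sketch below is a toy: `observe_act_loop` is a hypothetical name, and `ask_model` is a stub that echoes the first problem. In a real harness it would wrap a call to the Claude API, and `act` would apply an edit to the working tree.

```python
def observe_act_loop(observe, ask_model, act, max_turns=10):
    """Run observe -> act until no problems remain; return the turns used."""
    for turn in range(max_turns):
        problems = observe()          # OBSERVE: e.g. linter output
        if not problems:
            return turn               # nothing left to fix
        prompt = "Fix the first problem:\n" + "\n".join(problems)
        act(ask_model(prompt))        # ACT: apply whatever the model proposes
    return max_turns                  # gave up: turn budget exhausted


# Toy stand-ins so the sketch runs offline.
todo = ["unused import os", "missing semicolon"]
turns = observe_act_loop(
    observe=lambda: list(todo),
    ask_model=lambda prompt: prompt.splitlines()[1],  # stub "model" echoes problem 1
    act=lambda fix: todo.remove(fix),
)
print(turns, todo)  # prints: 2 []
```

Keeping the model behind a plain function also means the loop logic can be tested without spending a single token.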

Day 3: Add Planning Layer

  • Implement task decomposition (break feature into sub-tasks)
  • Add reflection step (validate each sub-task completion)
  • Test on medium-complexity feature

Day 4: Production Hardening

  • Add error recovery loops
  • Implement human approval gates for high-risk actions
  • Set up monitoring and cost tracking
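An error recovery loop can be as small as a bounded retry that feeds the previous error back into the next attempt. This is a sketch, not a prescription: `run_with_recovery` and the `flaky` action are hypothetical, and in practice the error text would go into the next prompt to the model.

```python
def run_with_recovery(attempt, max_attempts=3):
    """Call attempt(previous_error) until it succeeds or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        try:
            attempt(last_error)
            return True
        except Exception as exc:  # broad on purpose: any failure is feedback
            last_error = f"{type(exc).__name__}: {exc}"
    return False


# A made-up action that fails once, then succeeds when told what went wrong.
calls = []


def flaky(previous_error):
    calls.append(previous_error)
    if previous_error is None:
        raise ValueError("migration file not found")


print(run_with_recovery(flaky), calls)
# prints: True [None, 'ValueError: migration file not found']
```

The hard cap on attempts matters as much as the retry itself; without it, recovery becomes another way to loop forever.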

Day 5: Optimize Context (Fix the 14% Tax)

  • Write comprehensive CLAUDE.md with architecture, patterns, business context
  • Test that loops waste fewer turns asking basic questions
  • Measure turn reduction
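"Measure turn reduction" only works if you define it up front. One simple definition, assumed here, is the percentage drop in mean turns per task between runs with the old and new context file. The numbers below are made up purely to show the arithmetic.

```python
def turn_reduction(baseline_turns, improved_turns):
    """Percentage drop in mean turns per task, from baseline to improved runs."""
    if not baseline_turns or not improved_turns:
        raise ValueError("need at least one run in each group")
    baseline = sum(baseline_turns) / len(baseline_turns)
    improved = sum(improved_turns) / len(improved_turns)
    return 100.0 * (baseline - improved) / baseline


# Invented turn counts for the same three tasks, before and after the rewrite.
print(round(turn_reduction([20, 30, 25], [15, 20, 15]), 1))  # prints: 33.3
```

Run the same tasks in both conditions; comparing different tasks mostly measures how hard the tasks were.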

Weekend: Scale to Real Projects

  • Run harness on actual feature from backlog
  • Review final output, measure time savings
  • Iterate based on gaps and failures

The Future: Loops Everywhere

The shift from single prompts to iterative loops isn't limited to coding.

Emerging patterns across domains:

  • Marketing: Content loop generates blog, gets SEO analysis, refines, publishes
  • Data science: Model training loop evaluates performance, adjusts hyperparameters, reruns
  • Customer support: Ticket resolution loop analyzes issue, drafts response, validates with knowledge base
  • Design: UI generation loop creates mockups, validates accessibility, iterates on feedback

By 2027, every serious AI application will be loop-based, not prompt-based.

Single prompts will remain for:

  • Quick one-off questions
  • Creative brainstorming
  • Simple content generation

But complex work—software, analysis, research, strategy—will all use harness engineering.


Conclusion: Stop Prompting, Start Building

The developers shipping 8x more code aren't writing better prompts. They're building better systems.

The harness engineering mindset:

  • Don't ask AI to solve problems. Build systems that solve problems using AI.
  • Don't write prompts. Write loops that write prompts.
  • Don't generate code. Orchestrate agents that generate, test, refine, and ship code.

If you're still manually prompting Claude for every change, you're using a sports car as a bicycle.

The playbook:

  1. Identify a complex, multi-step task in your workflow
  2. Design an observe-plan-act-reflect loop to automate it
  3. Implement with checkpoints, validation, and human gates
  4. Iterate based on failures and edge cases
  5. Scale to more tasks as you learn loop design

By the end of 2026, harness engineering will be a core skill for every software engineer, as fundamental as Git, testing, or code review.

The question isn't whether you'll adopt it. It's whether you'll be an early mover or a late adopter.


Last updated: June 8, 2026 | Research sources: Boris Cherny (Anthropic), developer community reports, harness engineering tutorials, production deployment data