What is harness engineering?

Harness engineering is the practice of building systems that prompt AI agents iteratively instead of relying on single prompts. Engineers design loops in which Claude observes the codebase, plans changes, executes them, and reflects on the results, repeating this cycle over hours or days to complete complex software tasks autonomously.

How much code do Anthropic engineers ship using Claude?

By May 2026, Claude authored over 80% of merged production code at Anthropic. Engineers using harness engineering ship 8x more code daily compared to traditional development approaches, with Claude handling implementation while humans focus on architecture and review.

What's the difference between prompting and loop-based AI coding?

Single prompts ask AI to generate code once and stop. Loop-based approaches create iterative cycles where AI observes context, plans next steps, executes changes, validates results, and repeats. This enables handling complex multi-file projects that single prompts can't complete.

What tools support harness engineering workflows?

Claude Code CLI, Cursor AI, and custom frameworks built on LangGraph or AutoGPT support iterative loops. Key features include persistent context, tool use (file operations, testing, git), reflection mechanisms, and multi-turn orchestration that runs for extended periods.

What are the success rates for loop-based AI coding?

Early adopters report 76% success rates on open-ended software tasks using iterative loops, compared to 20-30% with single-prompt approaches. However, success requires skilled prompt design, error handling, and knowing when to stop loops to avoid wasted compute.

What are the downsides of harness engineering?

Bugs in loop logic can waste hours of compute on unproductive cycles. Poor termination conditions lead to infinite loops or premature exits. Teams need expertise in designing observation-action cycles, validation steps, and human checkpoints for high-risk changes.
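
One way to guard against runaway loops is to give every harness an explicit budget. The sketch below is illustrative only: `LoopBudget`, `run`, and the `step` callable are hypothetical names, not part of any Anthropic tool, and the limits are arbitrary. It caps a loop by both turn count and wall-clock time and reports why it stopped.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopBudget:
    """Hypothetical guard that caps a harness loop by turns and elapsed time."""
    max_turns: int
    max_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    turns_used: int = 0
    started_at: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def stop_reason(self) -> Optional[str]:
        """Return why the loop must stop, or None if it may continue."""
        if self.turns_used >= self.max_turns:
            return "turn limit reached"
        if self.clock() - self.started_at >= self.max_seconds:
            return "time limit reached"
        return None

    def spend_turn(self) -> None:
        self.turns_used += 1


def run(step: Callable[[], bool], budget: LoopBudget) -> str:
    """Call step() until it returns True (task done) or the budget runs out."""
    while (reason := budget.stop_reason()) is None:
        budget.spend_turn()
        if step():
            return "task complete"
    return reason
```

Returning the stop reason, rather than silently exiting, lets the caller tell a finished task apart from a loop that simply ran out of budget.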

Anthropic Engineer: Build Loops That Prompt AI, Not Single Prompts | explainx.ai Blog

snippet

1. OBSERVE → Analyze current codebase, test results, error logs
2. PLAN    → Determine next action based on observations
3. ACT     → Execute code changes, run tests, make commits
4. REFLECT → Evaluate results, identify gaps, adjust strategy
5. REPEAT  → Loop until task complete or timeout

snippet

User: "Add JWT authentication to the API"
Claude: [Generates auth code in one file]
User: [Realizes it needs database migration, middleware, tests, docs]
User: "Now add the migration"
Claude: [Generates migration]
User: "Add middleware"
... 15 more manual prompts ...

python

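# Simplified harness sketch. `claude` and `context` are placeholder objects
# supplied by the caller; their methods are illustrative, not a real API.
def run_harness(task, claude, context, max_turns=50):
    for turn in range(max_turns):
        # OBSERVE
        status = context.analyze_codebase()
        test_results = context.run_tests()

        # PLAN
        plan = claude.plan_next_action(task, status, test_results)

        # ACT
        if plan.action == "modify_file":
            context.edit_file(plan.file_path, plan.changes)
        elif plan.action == "run_migration":
            context.execute_migration(plan.migration_file)
        elif plan.action == "write_tests":
            context.create_test_file(plan.test_code)

        # Commit before checking completion so the final change is not lost
        context.commit_changes(plan.commit_message)

        # REFLECT
        if plan.task_complete:
            return True

    # Turn budget exhausted without the task being reported complete
    return False

# Example call, assuming the caller provides both objects:
# run_harness("Add JWT authentication to the API", claude, CodebaseContext())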

typescript

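// Example: Auto-fix linting errors
// runLinter, applyFixes, and claude.generateFixes are placeholders for your own tooling.
async function lintFixLoop(maxIterations = 5): Promise<boolean> {
  for (let i = 0; i < maxIterations; i++) {
    // OBSERVE
    const lintResults = await runLinter();
    if (lintResults.errors.length === 0) return true;

    // ACT
    const fixes = await claude.generateFixes(lintResults);
    if (fixes.length === 0) break; // nothing left to try; avoid spinning
    await applyFixes(fixes);

    // REFLECT
    console.log(`Iteration ${i + 1}: applied ${fixes.length} fixes`);
  }
  // Re-check, since the last round of fixes may have cleared the remaining errors
  return (await runLinter()).errors.length === 0;
}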
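
A fixed iteration cap bounds the cost of a lint loop, but it does not notice when the fixes have stopped helping. The following sketch, assuming hypothetical `run_linter` and `apply_fixes` callables supplied by the caller (the second standing in for the model call plus the file edits), stops early when the error count fails to drop between rounds.

```python
from typing import Callable, List


def fix_until_clean(
    run_linter: Callable[[], List[str]],
    apply_fixes: Callable[[List[str]], None],
    max_iterations: int = 5,
) -> str:
    """Repeat lint -> fix until clean, stalled, or out of iterations."""
    previous_count = None
    for _ in range(max_iterations):
        errors = run_linter()  # OBSERVE
        if not errors:
            return "clean"
        # REFLECT: the previous round of fixes did not reduce the error count
        if previous_count is not None and len(errors) >= previous_count:
            return "stalled"
        previous_count = len(errors)
        apply_fixes(errors)  # ACT
    # Final check, since the last round may have cleared the remaining errors
    return "clean" if not run_linter() else "iteration limit"
```

Distinguishing "stalled" from "iteration limit" tells the operator whether to raise the cap or to change the prompt.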

typescript

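// claude.*, analyzeCodebase, executeAction, and runTests are placeholders.
async function featureLoop(featureSpec: string, maxTasks = 25) {
  // PLAN
  const tasks = await claude.breakdownFeature(featureSpec);

  // Index-based loop: follow-up tasks are appended while iterating, and
  // maxTasks caps the total so gap-finding cannot grow the queue forever.
  for (let i = 0; i < tasks.length && i < maxTasks; i++) {
    const task = tasks[i];

    // OBSERVE
    const context = await analyzeCodebase();

    // PLAN SUB-ACTIONS
    const actions = await claude.planImplementation(task, context);

    // ACT
    for (const action of actions) {
      await executeAction(action);
      await runTests();
    }

    // REFLECT
    const taskComplete = await claude.validateCompletion(task);
    if (!taskComplete) {
      tasks.push(await claude.identifyGaps(task));
    }
  }
}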
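
Because the feature loop appends follow-up work whenever a task is judged incomplete, the task list can keep growing. A minimal sketch of a bounded work queue is shown below; `run_task_queue` and the `attempt` callable are hypothetical, with `attempt` standing in for the plan, act, and validate steps and returning either a follow-up task or `None`.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


def run_task_queue(
    initial_tasks: Iterable[str],
    attempt: Callable[[str], Optional[str]],
    max_tasks: int = 20,
) -> List[str]:
    """Process tasks in FIFO order, capped at max_tasks attempts.

    Returns the tasks still queued when the cap is hit (empty on success).
    """
    queue = deque(initial_tasks)
    processed = 0
    while queue and processed < max_tasks:
        task = queue.popleft()
        follow_up = attempt(task)
        processed += 1
        if follow_up is not None:
            queue.append(follow_up)  # gap found: schedule more work
    return list(queue)
```

A non-empty return value is a natural point to hand the remaining tasks to a human instead of spending more compute.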

typescript

interface HarnessConfig {
  task: string;
  maxTurns: number;
  checkpointInterval: number;
  humanApprovalRequired: string[]; // e.g., ["database_migration", "api_breaking_change"]
}

// ProjectContext, claude.*, and the request*/execute* helpers are placeholders.
async function autonomousHarness(config: HarnessConfig) {
  const context = new ProjectContext();

  // A for loop guarantees the turn counter advances on every path,
  // including `continue`, so a rejected plan cannot spin forever.
  for (let turn = 0; turn < config.maxTurns; turn++) {
    // OBSERVE
    const status = await context.fullAnalysis();

    // PLAN
    const plan = await claude.strategicPlan(config.task, status, turn);

    // HUMAN CHECKPOINT
    if (config.humanApprovalRequired.includes(plan.actionType)) {
      const approved = await requestHumanApproval(plan);
      if (!approved) continue;
    }

    // ACT
    try {
      await executeActionSafely(plan.action);
    } catch (error) {
      // ERROR RECOVERY: one retry; escalate if the recovery action also fails
      try {
        const recovery = await claude.recoverFromError(error, context);
        await executeActionSafely(recovery.action);
      } catch (recoveryError) {
        await requestHumanIntervention(String(recoveryError));
        continue;
      }
    }

    // VALIDATE
    const testResults = await context.runFullTestSuite();

    // REFLECT
    const reflection = await claude.evaluateProgress(
      config.task,
      status,
      testResults,
      turn
    );

    if (reflection.taskComplete) break;
    if (reflection.stuck) {
      await requestHumanIntervention(reflection.issue);
    }

    // CHECKPOINT
    if (turn % config.checkpointInterval === 0) {
      await context.createCheckpoint();
    }
  }

  return context.generatePullRequest();
}
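
The error-recovery step in the harness above is itself a small loop, and it needs its own bound. The sketch below shows one possible shape, assuming a hypothetical `recover` callable that stands in for asking the model to propose a replacement action; none of these names come from a real library.

```python
from typing import Callable, Optional

Action = Callable[[], None]


def execute_with_recovery(
    action: Action,
    recover: Callable[[Exception], Optional[Action]],
    max_attempts: int = 3,
) -> bool:
    """Run action; on failure, try replacements from recover().

    Returns True on success, False when attempts run out or when
    recover() returns None to signal that it has nothing to propose.
    """
    current = action
    for _ in range(max_attempts):
        try:
            current()
            return True
        except Exception as error:
            replacement = recover(error)
            if replacement is None:
                return False
            current = replacement
    return False
```

A `False` result is the signal to escalate to a human rather than let the harness keep retrying.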

markdown

# Project Context for AI Agents

## Architecture Map
- `/app/*` - Next.js App Router (React Server Components)
- `/lib/db/*` - Prisma ORM, PostgreSQL schemas
- `/lib/api/*` - tRPC API routes
- `/components/*` - React components (shadcn/ui + Tailwind)

## Code Style (CRITICAL - Follow Exactly)
- Server components by default; 'use client' only when needed
- Prefer server actions over API routes for mutations
- Database queries only in server components or server actions
- All async functions must handle errors with try-catch
- Use Zod for all input validation

## Testing Strategy
- Unit tests: Vitest for pure functions
- Integration tests: Playwright for user flows
- Run `pnpm test` before any commit
- Coverage requirement: 70%+

## Common Patterns
### Adding a new API endpoint
1. Define Zod schema in `/lib/schemas`
2. Create tRPC procedure in `/lib/api/routers`
3. Write integration test in `__tests__/api`
4. Update OpenAPI docs if public endpoint

### Database changes
1. Modify schema in `prisma/schema.prisma`
2. Run `pnpm db:migrate:dev` to create migration
3. Update seed data if needed
4. Test migration rollback works

## Deployment
- Production: Vercel (auto-deploy on main branch)
- Staging: Railway (auto-deploy on develop branch)
- Never commit secrets - use `.env.local` and Vercel env vars

## Business Context
- Users are B2B SaaS companies (SMB to mid-market)
- Average deal size: $50K-200K/year
- Security/compliance critical: SOC2, GDPR
- Performance target: p95 page load < 2s
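
A context file organized under level-2 headings like the one above can be loaded selectively. The source does not say how a harness should consume the file, so the following is only one possible approach, with hypothetical helper names: split the file on `## ` headings and pass the agent just the sections relevant to the current task.

```python
from typing import Dict, List


def split_sections(markdown: str) -> Dict[str, str]:
    """Split a context file into {heading: body} on level-2 ('## ') headings.

    Deeper headings ('### ...') stay inside the body of their parent section.
    """
    sections: Dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def select_context(markdown: str, wanted: List[str]) -> str:
    """Rebuild a smaller context containing only the requested sections."""
    sections = split_sections(markdown)
    parts = [f"## {name}\n{sections[name]}" for name in wanted if name in sections]
    return "\n".join(parts)
```

For a database task, for example, a harness might call `select_context(text, ["Architecture Map", "Common Patterns"])` and leave out the deployment and business sections.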

Anthropic Engineer: Stop Prompting Claude, Build Loops That Prompt Themselves (Harness Engineering Explained)

The Paradigm Shift: From Prompts to Loops

What is Harness Engineering?

The Core Loop

Example: Adding Authentication (Traditional vs Loop-Based)

The Anthropic Results: 8x Productivity, 80% AI-Authored Code

What Changed?

How to Build Your Own Harness (Practical Guide)

Level 1: Simple Loop (1 hour implementation)

Level 2: Multi-Step Task Decomposition (1 day implementation)

Level 3: Autonomous Multi-Day Projects (1 week implementation)

The 14% Claude.md Tax and How to Fix It

The Problem

The Solution: Structured Context

Real-World Success Stories

1. Developer Reports 76% Success Rate

2. Tutorials Going Viral

3. Anthropic's Internal Adoption

The Criticism: Bugs, Waste, and Expertise Gaps

1. Loop Bugs Can Waste Hours

2. Requires Prompt Engineering Expertise

3. Not All Tasks Suit Autonomous Loops

Tools That Support Harness Engineering

Native Support

Frameworks for Custom Harnesses

How to Get Started (This Week)

Day 1: Learn Loop Fundamentals

Day 2: Implement Simple Loop

Day 3: Add Planning Layer

Day 4: Production Hardening

Day 5: Optimize Context (Fix the 14% Tax)

Weekend: Scale to Real Projects

The Future: Loops Everywhere

Conclusion: Stop Prompting, Start Building

Related Resources