
Matt Pocock's AI Coding Workshop for Real Engineers: Smart Zones, Tracer Bullets, and Autonomous Agent Workflows

Matt Pocock's full AI coding workshop covers practical engineering workflows: smart zone vs dumb zone, grill-me alignment, PRD-to-vertical-slices, TDD with AI, AFK agents, Sand Castle parallelization, and production code review.

26 min read · Yash Thakker
Tags: ai-coding, claude-code, engineering, matt-pocock, tdd, software-architecture



On April 24, 2026, TypeScript educator and AI coding pioneer Matt Pocock released a comprehensive 2-hour workshop titled "AI Coding for Real Engineers" that fundamentally challenges the notion that AI makes software engineering fundamentals obsolete.

While many treat AI coding as a paradigm shift that renders 20 years of software engineering wisdom irrelevant, Pocock makes the opposite argument: engineering fundamentals matter more than ever when working with AI agents. The workshop demonstrates that code quality, architecture decisions, TDD discipline, and design patterns become critical leverage points that determine whether AI agents amplify your capabilities or multiply technical debt.

This article provides a comprehensive breakdown of the workshop's core concepts, practical workflows, and production-ready patterns for integrating AI agents into professional software development.


TL;DR

| Topic | Key insight |
| --- | --- |
| Workshop focus | Complete AI-assisted development lifecycle from ambiguous requirements to autonomous agent deployment |
| Core thesis | Software engineering fundamentals are not obsolete; they're the leverage layer that makes AI agents useful |
| Smart vs dumb zone | LLMs degrade beyond ~100k tokens (40% context); keep work in the smart zone through task decomposition |
| Grill-me skill | Force agents to ask 40-80 alignment questions before coding to reach shared understanding |
| PRD workflow | Synthesize conversations into structured PRDs, then decompose into vertical-slice GitHub issues |
| Tracer bullets | Build thin vertical slices through all layers (schema/API/UI/tests), not horizontal layers |
| TDD with AI | Write failing tests first, let agents implement to green, review and refactor |
| AFK agents | Classify issues as HITL (human decision) or AFK (autonomous implementation) |
| Sand Castle | TypeScript framework for parallelizing sandboxed agents with worktrees and Docker isolation |
| Code review | Use Sonnet for speed, Opus for review quality, always in fresh context |
| Deep modules | Favor high-leverage interfaces that hide complexity over shallow modules that leak it |

The fundamental problem: everyone thinks AI changes everything

The dominant narrative in 2026 is that AI coding assistants represent a paradigm shift so profound that traditional software engineering practices—design patterns, TDD, architecture reviews, code review processes—are suddenly obsolete boomer baggage.

Matt Pocock's workshop opens by rejecting this framing entirely.

His counter-thesis: AI agents are powerful, but they're only as good as the engineering constraints, workflows, and quality standards you give them. If you feed an agent vague requirements, skip tests, ignore architecture, and treat code as disposable, the agent will amplify those bad habits at machine speed.

The workshop demonstrates the opposite approach: treating AI as a force multiplier for disciplined engineering. When you combine strong fundamentals with agent capabilities, you get compound leverage. When you abandon fundamentals, you get compound dysfunction.

This is why Pocock built Claude Code for Real Engineers, a 2-week cohort teaching AI coding from first principles, and open-sourced his skills repository containing 29+ agent skills that encode professional engineering workflows.


Smart zone vs dumb zone: the context window cliff

One of the workshop's most actionable insights is the concept of the smart zone and dumb zone in LLM context windows.

The threshold

At approximately 100,000 tokens (roughly 40% of the total context window), LLMs begin entering what Pocock calls the "dumb zone," where reasoning quality degrades sharply. The exact boundary varies by model and task complexity, but the drop-off is widely reported by people working with long-context models.

The explanation given is attention scaling: as context grows, the cost of attending to every token grows quadratically (not exponentially), and output quality tends to degrade even when the model technically has capacity for more tokens.
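As a back-of-the-envelope illustration of quadratic growth (not a figure from the workshop): standard self-attention over n tokens compares on the order of n² token pairs. Assuming a hypothetical 250k-token window, which is the size implied by 100k tokens being about 40%, filling the window instead of stopping at 100k multiplies that pairwise work by:

```latex
\frac{C(250{,}000)}{C(100{,}000)}
  = \left(\frac{250{,}000}{100{,}000}\right)^{2}
  = 2.5^{2}
  = 6.25,
\qquad \text{where } C(n) \propto n^{2}
```

That is steep polynomial growth, but it says nothing by itself about where quality drops off; the ~100k threshold is an empirical observation.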

Why this matters for AI coding

Most developers instinctively believe: more context = better results. Just paste the entire codebase, all previous conversations, every GitHub issue, and let the model figure it out.

Pocock's demonstrations show the opposite: bloated context actively harms output quality. Once you cross into the dumb zone, the agent starts missing critical details, hallucinating solutions that contradict earlier context, and producing generic code that ignores project-specific constraints.

The practical strategy

The workshop emphasizes a multi-phase decomposition approach:

| Phase | Goal | Context size |
| --- | --- | --- |
| Alignment | Understand requirements deeply through grill-me questioning | Minimal: just the initial request |
| Planning | Produce PRD and break into vertical slices | Medium: PRD + architecture docs |
| Implementation | Each agent works on one isolated issue | Small: issue description + relevant code only |
| Review | Fresh context evaluation of changes | Clean: just the diff + quality standards |

By keeping each phase within the smart zone, you maintain reasoning quality throughout the development lifecycle.
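One rough way to check whether a phase stays in the smart zone is to estimate its token footprint before handing it to an agent. The sketch below is illustrative only: the 4-characters-per-token heuristic, the `SMART_ZONE_FRACTION` constant, and both function names are assumptions made for this article, not part of Pocock's tooling, and a real tokenizer will produce different counts.

```typescript
// Hypothetical helper: the names and the chars/4 heuristic are assumptions.
const SMART_ZONE_FRACTION = 0.4; // roughly 40% of the window, per the workshop

type Zone = "smart" | "dumb";

// Very rough estimate: about 4 characters per token for English text and code.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sum the estimated tokens of every context piece and compare to the budget.
function classifyContext(
  pieces: string[],
  contextWindowTokens: number,
): { tokens: number; zone: Zone } {
  const tokens = pieces.reduce((sum, piece) => sum + estimateTokens(piece), 0);
  const limit = contextWindowTokens * SMART_ZONE_FRACTION;
  return { tokens, zone: tokens <= limit ? "smart" : "dumb" };
}

// Usage: an implementation-phase context of one issue plus one module.
const lean = classifyContext(["issue description", "relevant module source"], 250_000);
console.log(lean.zone);
```

A check like this is most useful as a tripwire in tooling: if a phase's inputs exceed the budget, that is a signal to split the task rather than to trim at random.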

This is not theoretical optimization. Pocock demonstrates real examples where the same task produces dramatically different results depending on whether context is kept lean or allowed to bloat.


The grill-me skill: forcing alignment before code

The second major pattern from the workshop is what Pocock calls the "grill-me" skill—an alignment phase that happens before any code is written.

The problem it solves

The default workflow with AI agents is:

  1. Describe what you want
  2. Agent starts coding immediately
  3. Discover 20 minutes later the agent misunderstood half your requirements
  4. Throw away the work and start over

This is expensive, demoralizing, and creates an adversarial relationship with the tool.

How grill-me works

The grill-me skill flips this dynamic. Instead of letting the agent immediately implement, you force it to interrogate you first:

Before writing any code, ask me at least 40 questions to ensure you
fully understand the requirements, edge cases, architecture constraints,
performance requirements, testing strategy, and integration points.

Do not stop asking questions until you can articulate the complete
solution back to me and I confirm we have shared understanding.

What happens in practice

Pocock's examples show agents asking 40 to 80 targeted questions covering:

  • Requirements clarity: "You mentioned user roles. Which roles? What can each role do?"
  • Edge cases: "What happens if two users update the same record simultaneously?"
  • Architecture constraints: "Should this be a new service or extend the existing API?"
  • Performance requirements: "What's the expected request volume? What latency is acceptable?"
  • Testing strategy: "Do you want integration tests, unit tests, or both?"
  • Integration points: "Which external services does this touch? Are there rate limits?"
  • Data model: "Does the schema already exist or are we creating it?"
  • Error handling: "What should happen on failure? Retry? Alert? Rollback?"

This process produces what Pocock calls a shared design concept: you and the agent are genuinely on the same wavelength about what's being built.
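To make the alignment phase auditable, the question categories above can be tracked as data. This is a minimal sketch under stated assumptions: the category names, `AnsweredQuestion`, and `alignmentStatus` are hypothetical and do not come from the grill-me skill itself.

```typescript
// Hypothetical tracker for a grill-me session; categories mirror the list above.
const CATEGORIES = [
  "requirements", "edge-cases", "architecture", "performance",
  "testing", "integration", "data-model", "error-handling",
] as const;
type Category = (typeof CATEGORIES)[number];

interface AnsweredQuestion {
  category: Category;
  question: string;
  answer: string;
}

const MIN_QUESTIONS = 40; // lower bound used in the grill-me prompt

// Report which categories are still uncovered and whether the minimum is met.
function alignmentStatus(log: AnsweredQuestion[]) {
  const covered = new Set(log.map((q) => q.category));
  const missing = CATEGORIES.filter((c) => !covered.has(c));
  return {
    asked: log.length,
    missing,
    readyToCode: log.length >= MIN_QUESTIONS && missing.length === 0,
  };
}
```

Note that `readyToCode` only captures the mechanical part. The human confirmation that the agent's restated solution is correct still has to happen in the conversation.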

Why this saves time

Investing 10-15 minutes in alignment questions prevents hours of wasted autonomous work. More importantly, it surfaces misunderstandings, missing requirements, and architectural conflicts before they become code problems.

Pocock documents this pattern in his grill-me skill along with common mistakes teams make when using it.


PRD creation: turning conversation into structure

Once alignment is complete, the workshop demonstrates converting that shared understanding into a Product Requirements Document (PRD) that becomes the source of truth.

The to-prd skill

Pocock's to-prd skill synthesizes conversations and grill-me sessions into structured PRDs that typically include:

| PRD section | Purpose |
| --- | --- |
| Problem statement | What user/business need are we solving? |
| Proposed solution | High-level approach and architecture |
| Success criteria | Measurable outcomes that indicate done |
| User stories | Specific scenarios and use cases |
| Technical constraints | Performance, security, compatibility requirements |
| Out of scope | Explicitly document what we're NOT building |
| Dependencies | Other systems, teams, or features this requires |
| Testing strategy | How will we validate this works? |
| Rollout plan | Feature flags, gradual rollout, rollback strategy |
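If the PRD is kept as structured data rather than free text, gaps can be caught before decomposition starts. The shape below is a hypothetical sketch whose field names simply follow the sections listed above; it is not the output format of the to-prd skill.

```typescript
// Hypothetical PRD shape; field names follow the section list above.
interface Prd {
  problemStatement: string;
  proposedSolution: string;
  successCriteria: string[];
  userStories: string[];
  technicalConstraints: string[];
  outOfScope: string[];
  dependencies: string[];
  testingStrategy: string;
  rolloutPlan: string;
}

// Return the names of empty sections so gaps surface before issues are filed.
function missingSections(prd: Prd): (keyof Prd)[] {
  return (Object.keys(prd) as (keyof Prd)[]).filter((key) => {
    const value = prd[key];
    return typeof value === "string" ? value.trim() === "" : value.length === 0;
  });
}
```

An empty "out of scope" section is worth flagging in particular, since it is the one most often skipped and the one that most directly limits scope creep.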

Why PRDs matter for AI agents

A PRD serves as the destination document that agents can reference throughout implementation. When an agent gets confused or needs to make a design decision, it can return to the PRD for authoritative guidance.

This prevents scope creep, ensures consistency across multiple agents working in parallel, and provides a clear definition of "done" that isn't subject to interpretation.

Filing as GitHub issues

Pocock's workflow has the PRD automatically filed as a GitHub issue. This creates:

  • Single source of truth for the entire feature
  • Discussion thread where team members can comment, question, or suggest changes
  • Audit trail showing how requirements evolved
  • Link target that implementation PRs can reference

This is not bureaucracy—it's infrastructure that makes parallel autonomous work possible.


Vertical slices: tracer bullets through the stack

With a PRD in place, the next phase is decomposition. This is where Pocock's approach diverges sharply from typical project planning.

Horizontal layering (what most teams do)

The default instinct is to break work into horizontal layers:

  1. Design database schema for all entities
  2. Build all API endpoints
  3. Create all UI components
  4. Write tests at the end

This creates long-running branches, late integration, and makes it impossible to validate whether the feature works until everything is done.

Vertical slicing (Pocock's approach)

Instead, Pocock advocates tracer bullets—thin vertical slices that cut through every layer to deliver a complete, working feature:

Issue 1: User can create a basic todo
- Schema: Add `todos` table with id, text, completed, user_id
- API: POST /todos endpoint with validation
- UI: Simple form to create todo
- Tests: End-to-end test covering create flow

Issue 2: User can mark todo as complete
- Schema: Use existing completed field
- API: PATCH /todos/:id endpoint
- UI: Checkbox component
- Tests: End-to-end test covering complete flow

Issue 3: User can delete todo
- Schema: No changes needed
- API: DELETE /todos/:id endpoint
- UI: Delete button with confirmation
- Tests: End-to-end test covering delete flow

Each issue is independently shippable. Each can be picked up by a different agent. Each can be tested and validated immediately.
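A quick structural check can tell a vertical slice from a horizontal one: a slice should say something about every layer, even if that is "no changes needed." The types and function below are hypothetical, written only to illustrate the idea.

```typescript
// Hypothetical issue shape: one note per layer the slice touches.
type Layer = "schema" | "api" | "ui" | "tests";

interface SliceIssue {
  title: string;
  layers: Partial<Record<Layer, string>>; // e.g. "No changes needed"
}

const REQUIRED_LAYERS: Layer[] = ["schema", "api", "ui", "tests"];

// A slice is vertical if it addresses every layer; return the ones it skips.
function missingLayers(issue: SliceIssue): Layer[] {
  return REQUIRED_LAYERS.filter((layer) => !issue.layers[layer]);
}

const deleteTodo: SliceIssue = {
  title: "User can delete todo",
  layers: {
    schema: "No changes needed",
    api: "DELETE /todos/:id endpoint",
    ui: "Delete button with confirmation",
    tests: "End-to-end test covering delete flow",
  },
};
console.log(missingLayers(deleteTodo)); // []
```

An issue such as "Design database schema for all entities" would come back with `api`, `ui`, and `tests` missing, which is the signature of a horizontal layer.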

The to-issues skill

Pocock's to-issues skill takes a PRD and automatically decomposes it into vertical-slice GitHub issues with:

  • Clear scope: Exactly what this slice includes
  • Acceptance criteria: What "done" looks like
  • Testing requirements: How to validate
  • Dependencies: Which other issues must complete first
  • HITL vs AFK classification: Does this need human judgment or can an agent handle it autonomously?

This decomposition is not busywork—it's the infrastructure that enables parallelization.


TDD with AI agents: write tests first, let agents go green

One of the workshop's most controversial claims is that Test-Driven Development (TDD) becomes more important with AI, not less.

Why developers skip TDD with AI

The common argument: "AI can write both code and tests together, why slow down with TDD?"

Pocock's response: because AI agents are optimizers. Give them a vague problem and they'll produce code that superficially works but doesn't actually solve your requirements. Give them a failing test and they'll produce code that verifiably satisfies the behavior you specified.

The TDD workflow with agents

1. Red: Human writes a failing test describing desired behavior
2. Green: Agent writes minimum code to make test pass
3. Refactor: Agent or human cleans up implementation
4. Repeat: Move to next test

The key insight: tests are specifications that agents can verify against. When you write the test first, you're forcing yourself to think clearly about:

  • What inputs should produce what outputs?
  • What edge cases must be handled?
  • What error conditions should be caught?
  • What side effects are acceptable?

Example from the workshop

Instead of:

"Build a function that validates email addresses"

Write a failing test:

describe('validateEmail', () => {
  it('accepts valid emails', () => {
    expect(validateEmail('user@example.com')).toBe(true);
  });

  it('rejects emails without @', () => {
    expect(validateEmail('userexample.com')).toBe(false);
  });

  it('rejects emails without domain', () => {
    expect(validateEmail('user@')).toBe(false);
  });

  it('accepts emails with subdomains', () => {
    expect(validateEmail('user@mail.example.com')).toBe(true);
  });
});

Then tell the agent: "Make these tests pass."

The agent now has an unambiguous success criterion. No guessing about requirements. No debating whether the implementation is "correct." The tests define correct.
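For illustration, here is one implementation an agent might arrive at for a spec like this. It is a deliberately simple sketch written for this article, not code from the workshop, and it falls well short of full RFC 5322 validation.

```typescript
// One possible implementation: exactly one "@", a non-empty local part,
// and a domain made of at least two non-empty dot-separated labels.
function validateEmail(input: string): boolean {
  const parts = input.split("@");
  if (parts.length !== 2) return false; // zero or several "@" signs
  const [local, domain] = parts;
  if (local.length === 0 || domain.length === 0) return false;
  const labels = domain.split(".");
  return labels.length >= 2 && labels.every((label) => label.length > 0);
}
```

Anything the tests do not mention (whitespace, quoted local parts, internationalized domains) is left undefined here, which is the point: if that behavior matters, it belongs in a new failing test.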

Why this works with autonomous agents

When you send an agent off to work AFK (away from keyboard), you can't supervise every decision. But you can give it a test suite that defines acceptable behavior.

Pocock shows examples where:

  • Agent tries 3 different implementation approaches before finding one that passes all tests
  • Agent discovers edge cases you didn't specify and adds defensive checks
  • Agent refactors implementation while keeping tests green

This is specification-driven autonomous work. The tests are your quality enforcement layer when you're not watching.


Deep modules vs shallow modules: hiding complexity

The workshop spends significant time on software architecture principles, particularly the concept of deep modules from John Ousterhout's A Philosophy of Software Design.

The core distinction

| Deep modules | Shallow modules |
| --- | --- |
| Small, simple interface | Complex, intricate interface |
| Hide significant complexity | Expose implementation details |
| High leverage: lots of functionality behind minimal API | Low leverage: interface nearly as complex as implementation |
| Example: fs.readFile(path) hides buffering, streaming, error handling | Example: new XMLHttpRequest() with 12 setup methods before use |

Why this matters for AI agents

Pocock argues that AI agents write better code when working with deep modules, because:

  1. Less context required: Agent only needs to understand the interface, not the internals
  2. Fewer integration points: Less surface area for bugs and misunderstandings
  3. Better encapsulation: Changes don't cascade across the codebase
  4. Clearer contracts: Deep modules have obvious input/output semantics

Conversely, shallow modules force agents to hold more implementation details in context, increasing the chance of crossing into the dumb zone.

Practical application

When reviewing agent-generated code, Pocock recommends asking:

  • Does this module expose unnecessary complexity?
  • Could the interface be simpler while hiding the same functionality?
  • Are we leaking implementation details that will create coupling?
  • Is the cognitive load at the call site higher than it needs to be?

The workshop includes his improve-codebase-architecture skill that agents can use to identify shallow modules and propose deeper alternatives.
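As a small illustration of depth, consider a cache whose entire interface is `get` and `set`, with expiry handled internally. `createTtlCache` and its parameters are hypothetical names chosen for this sketch; the example is not taken from the workshop or from Ousterhout's book.

```typescript
// Illustrative deep module: two methods, with expiry handling hidden inside.
// The optional `now` parameter exists only so time can be controlled in tests.
function createTtlCache<V>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, { value: V; expiresAt: number }>();
  return {
    set(key: string, value: V): void {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    get(key: string): V | undefined {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // lazy eviction: callers never see stale data
        return undefined;
      }
      return entry.value;
    },
  };
}

const cache = createTtlCache<string>(60_000);
cache.set("user:1", "Ada");
console.log(cache.get("user:1")); // prints Ada
```

A shallow version of the same idea would hand callers the raw `Map` and the timestamps and expect each call site to check expiry itself. An agent working against that interface has to carry the eviction rules in context every time it touches the cache.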


AFK agents: away from keyboard implementation

One of the workshop's most production-ready patterns is the AFK (away from keyboard) classification for autonomous agent work.

HITL vs AFK

When decomposing PRDs into issues, each issue gets classified:

| Classification | Meaning | Example |
| --- | --- | --- |
| HITL | Human in the loop: requires judgment, architecture decisions, or business tradeoffs | "Should we cache this data or recompute on every request?" |
| AFK | Away from keyboard: well-defined implementation with clear acceptance criteria | "Add validation to email field per PRD specification" |

What makes good AFK work

Issues are AFK-ready when they have:

  • Clear specification: Failing test or detailed acceptance criteria
  • No architectural decisions: The design is already determined
  • Isolated scope: Changes are contained to one module or feature
  • Obvious validation: Tests or demo steps that prove it works
  • Low risk: Failure doesn't compromise security, data integrity, or production stability
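The readiness criteria above translate naturally into a checklist that defaults to HITL whenever any criterion fails. The field and function names in this sketch are hypothetical, and how each flag gets set is still a human judgment.

```typescript
// Hypothetical checklist mirroring the AFK-readiness criteria above.
interface IssueChecklist {
  hasFailingTestOrAcceptanceCriteria: boolean;
  needsArchitecturalDecision: boolean;
  scopedToOneModule: boolean;
  hasObviousValidation: boolean;
  touchesHighRiskArea: boolean; // security, data integrity, production stability
}

type Classification = { kind: "AFK" } | { kind: "HITL"; reasons: string[] };

// Any failed criterion keeps the issue with a human, and the reasons say why.
function classifyIssue(c: IssueChecklist): Classification {
  const reasons: string[] = [];
  if (!c.hasFailingTestOrAcceptanceCriteria) reasons.push("no clear specification");
  if (c.needsArchitecturalDecision) reasons.push("architectural decision pending");
  if (!c.scopedToOneModule) reasons.push("scope is not isolated");
  if (!c.hasObviousValidation) reasons.push("no obvious validation");
  if (c.touchesHighRiskArea) reasons.push("high-risk area");
  return reasons.length === 0 ? { kind: "AFK" } : { kind: "HITL", reasons };
}
```

Returning the reasons rather than a bare label makes it clear what would have to change for an issue to become AFK-ready.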

The AFK workflow

  1. Planner agent examines backlog and identifies non-blocked AFK issues
  2. Implementation agents pick issues in parallel, each in an isolated environment
  3. Agents commit when tests pass and acceptance criteria are met
  4. Reviewer agents check output against quality standards
  5. Merger agent integrates approved changes
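The planner step in this workflow boils down to a filter over the backlog: AFK issues whose blockers have all been merged. The sketch below shows that selection logic only. `BacklogIssue` and `readyAfkIssues` are hypothetical names, and this is not how Sand Castle's planner agent is implemented.

```typescript
// Hypothetical backlog entry; `blockedBy` lists issue ids that must merge first.
interface BacklogIssue {
  id: number;
  kind: "AFK" | "HITL";
  blockedBy: number[];
}

// Issues an agent could start right now: AFK, not done, all blockers done.
function readyAfkIssues(backlog: BacklogIssue[], done: Set<number>): BacklogIssue[] {
  return backlog.filter(
    (issue) =>
      issue.kind === "AFK" &&
      !done.has(issue.id) &&
      issue.blockedBy.every((dep) => done.has(dep)),
  );
}

const backlog: BacklogIssue[] = [
  { id: 1, kind: "AFK", blockedBy: [] },   // create todo
  { id: 2, kind: "AFK", blockedBy: [1] },  // complete todo
  { id: 3, kind: "AFK", blockedBy: [1] },  // delete todo
  { id: 4, kind: "HITL", blockedBy: [] },  // caching strategy decision
];
```

With nothing merged, only issue 1 is ready. Once it lands, issues 2 and 3 become available together and can go to two agents in parallel, while issue 4 stays with a human throughout.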

Pocock demonstrates a real example where 3 agents work in parallel on separate vertical slices, implementing and testing autonomously while he's in meetings.

Why this is not "fully autonomous coding"

Critically, AFK does not mean "let the agent do whatever it wants." It means:

  • Human-defined scope: You decided this issue is safe for autonomous work
  • Human-written tests: The agent must satisfy specifications you created
  • Human review gates: Output is reviewed before merge
  • Human architecture: The design decisions are already made

The autonomous part is the mechanical implementation of well-specified work. The judgment, design, and quality standards remain human responsibilities.


Sand Castle: parallelizing autonomous agents safely

To operationalize AFK workflows at production scale, Pocock built Sand Castle, a TypeScript framework for orchestrating sandboxed coding agents.

The core problem

Running multiple agents in parallel on the same codebase creates:

  • Git conflicts from agents editing the same files
  • Test interference when agents run tests simultaneously
  • Resource contention for databases, ports, and file locks
  • Environment pollution from package installations and config changes
  • Security risk if agents have unbounded system access

How Sand Castle solves this

import { run } from 'sandcastle';

await run({
  issue: 'Add user profile endpoint',
  branch: 'feature/user-profile',
  worktree: true,  // Isolate in git worktree
  docker: true,    // Sandboxed execution
  timeout: 30 * 60 * 1000,  // 30 minute max
});

For each agent, Sand Castle:

  1. Creates a git worktree: Isolated working directory on separate branch
  2. Spins up Docker container: Sandboxed environment with resource limits
  3. Passes issue context: Agent only sees relevant code and requirements
  4. Monitors execution: Logs, timeout enforcement, error capture
  5. Collects output: Committed changes, test results, agent logs
  6. Cleans up: Removes worktree and container when done
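To make the isolation steps concrete, here are the standard `git` and `docker` commands they roughly correspond to. This is explicitly not Sand Castle's implementation or API: `planIsolation`, the 2g memory cap, and the `/workspace` mount point are assumptions for illustration, and the function only builds the command lists rather than running them.

```typescript
// Sketch only: NOT Sand Castle's code. It lists plain git/docker commands
// that match the worktree, container, and cleanup steps described above.
interface IsolationPlan {
  setup: string[][];
  teardown: string[][];
}

// `worktreeDir` should be an absolute path so docker treats it as a bind mount.
function planIsolation(
  branch: string,
  worktreeDir: string,
  image: string,
  agentCmd: string[],
): IsolationPlan {
  return {
    setup: [
      // Separate working directory checked out on a new branch.
      ["git", "worktree", "add", "-b", branch, worktreeDir],
      // Throwaway container with the worktree mounted and a memory cap.
      ["docker", "run", "--rm", "--memory", "2g",
        "-v", `${worktreeDir}:/workspace`, "-w", "/workspace", image, ...agentCmd],
    ],
    // Remove the working directory once output has been collected.
    teardown: [["git", "worktree", "remove", worktreeDir]],
  };
}

const plan = planIsolation(
  "feature/user-profile", "/tmp/agents/user-profile", "node:22", ["npm", "test"],
);
console.log(plan.setup.map((cmd) => cmd.join(" ")).join("\n"));
```

Because each agent gets its own directory and container, two agents running the test suite at once do not share `node_modules`, ports inside the container, or uncommitted files.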

The four agent types

Sand Castle coordinates:

| Agent type | Responsibility |
| --- | --- |
| Planner | Examines full backlog, identifies non-blocked AFK issues, determines parallelization opportunities |
| Implementation | Works on single issue in isolated environment, writes code, runs tests, commits when green |
| Reviewer | Reviews implementation agent output against quality standards, fresh context |
| Merger | Integrates approved changes, resolves conflicts if needed, validates integration tests |

Production considerations

The workshop emphasizes that Sand Castle is not a "set it and forget it" solution. Production use requires:

  • Clear issue specifications: Garbage in, garbage out
  • Comprehensive test suites: Agents validate against your tests
  • Review gates: Human or Opus agent reviews before merge
  • Monitoring: Track success rates, common failures, quality trends
  • Cost awareness: Parallel agents consume API quota quickly
  • Rollback procedures: Be prepared to revert failed deployments

Pocock shows real examples where Sand Castle enables 3-5x faster feature delivery for well-specified work, while acknowledging it's inappropriate for exploratory or high-judgment tasks.


Code review strategies: Sonnet for speed, Opus for quality

The workshop dedicates substantial time to code review as a quality gate for agent-generated code.

The anti-pattern

The default approach many teams adopt:

  1. Agent writes code
  2. Glance at the diff
  3. Merge if it looks fine
  4. Discover problems in production

This treats code review as a formality rather than a quality enforcement mechanism.

Pocock's recommended strategy

| Review aspect | Approach | Tool |
| --- | --- | --- |
| Implementation | Use Sonnet for speed and cost | Claude Sonnet 4.5 |
| Review | Use Opus for quality and judgment | Claude Opus 4.5 |
| Context | Reviewer always works in fresh context, not implementation thread | New conversation |
| Standards | Explicit quality checklist provided to reviewer | Documented criteria |
| Focus | List issues first, not praise | Critical evaluation |

Why use different models for implementation vs review

  • Sonnet: Fast, cost-effective, good at following specifications
  • Opus: Superior reasoning, catches subtle bugs, better architectural judgment

By using Sonnet for the mechanical implementation work and Opus for critical review, you optimize for both velocity and quality.

The fresh context principle

This is critical: the reviewer should not have access to the implementation conversation history.

Why? Because seeing the implementation context creates anchoring bias. The reviewer starts evaluating whether the agent solved the problem it thought it was solving, rather than whether the solution is objectively correct.

Fresh context forces the reviewer to evaluate:

  • Does this code do what the issue specification claims?
  • Are there edge cases the implementation missed?
  • Are there security, performance, or maintainability problems?
  • Does this align with project architecture and standards?

What good review output looks like

Pocock shows examples where reviewer agents:

  1. List issues first, not praise
  2. Provide specific examples of problems found
  3. Suggest fixes with code snippets
  4. Assess merge-readiness: Can this ship? What blocks it?
  5. Highlight architectural concerns that might affect future work

Poor review: "This looks good, I don't see any issues."

Good review: "This implementation has three problems: 1) Missing null check on line 47 could throw in production. 2) No rate limiting on the endpoint. 3) Error messages leak internal implementation details. Recommend fixing before merge."
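One way to push reviewer agents toward the "good review" shape is to have them emit structured findings and render the summary from those. The types and `renderReview` below are hypothetical, and the severity levels are an assumption of this sketch rather than something the workshop prescribes.

```typescript
// Hypothetical structure for reviewer output; severity levels are assumed.
type Severity = "blocker" | "major" | "minor";

interface Finding {
  severity: Severity;
  location: string; // e.g. "src/api/users.ts:47"
  problem: string;
  suggestedFix: string;
}

// Issues first, ordered by severity, followed by an explicit merge verdict.
function renderReview(findings: Finding[]): string {
  const rank: Record<Severity, number> = { blocker: 0, major: 1, minor: 2 };
  const sorted = [...findings].sort((a, b) => rank[a.severity] - rank[b.severity]);
  const lines = sorted.map(
    (f, i) => `${i + 1}) [${f.severity}] ${f.location}: ${f.problem} Fix: ${f.suggestedFix}`,
  );
  // Anything above "minor" blocks the merge in this sketch.
  const blocked = findings.some((f) => f.severity !== "minor");
  lines.push(blocked ? "Verdict: fix before merge." : "Verdict: ready to merge.");
  return lines.join("\n");
}
```

Forcing a location and a suggested fix on every finding makes "this looks good" impossible to express as anything other than an empty list, which a human can then treat with suspicion.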


The ralph loop: bash-driven autonomous agents

A brief but powerful pattern from the workshop is what the community calls the "ralph loop"—popularized by Pocock's demonstrations of bash-driven agent automation.

The basic pattern

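# Assumes the Claude Code CLI is installed as `claude`; -p runs one
# non-interactive session with the given prompt and then exits.
while true; do
  if claude -p "$(cat prompt.md)"; then
    git add .
    # Commit only if the agent left staged changes behind
    git diff --cached --quiet || git commit -m "Agent work completed"
  else
    echo "Agent failed, stopping loop"
    break
  fi

  sleep 5
done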

The prompt.md file tells the agent:

  1. Read your state from disk (issue descriptions, tests, last commit message)
  2. Pick the next AFK issue from backlog
  3. Implement it following project standards
  4. Run tests
  5. Commit if green
  6. Exit with success code

Why this works

The loop enables continuous autonomous work without human intervention. The agent:

  • Reads its own progress from git history
  • Picks work from a prioritized backlog
  • Self-validates through tests
  • Commits only when successful
  • Stops when it encounters ambiguity or failure
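The stop conditions listed here are worth making explicit, along with a cap on iterations so a misbehaving loop cannot run indefinitely. The sketch below models only the control flow: `runAgent` is a hypothetical callback standing in for one agent invocation, and the result values are assumptions of this example.

```typescript
// Control-flow sketch with explicit stop conditions and an iteration cap.
// `runAgent` is a hypothetical stand-in for one complete agent session.
type AgentResult = "committed" | "backlog-empty" | "failed";
type StopReason = Exclude<AgentResult, "committed"> | "max-iterations";

function runLoop(
  runAgent: () => AgentResult,
  maxIterations: number,
): { iterations: number; stoppedBecause: StopReason } {
  for (let i = 1; i <= maxIterations; i++) {
    const result = runAgent();
    // Keep going only after a successful commit; anything else ends the loop.
    if (result !== "committed") {
      return { iterations: i, stoppedBecause: result };
    }
  }
  return { iterations: maxIterations, stoppedBecause: "max-iterations" };
}
```

Recording why the loop stopped matters as much as stopping: an empty backlog is a success, a failure needs a human, and hitting the cap usually means the issues were larger than expected.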

This is the operational pattern behind Pocock's demonstrations of "I started 3 agents before lunch and they shipped 8 features by end of day."

When to use the ralph loop

Good use cases:

  • Backlog of well-specified, test-defined issues
  • Stable test suite that reliably catches regressions
  • Low-risk changes (feature flags protect production)
  • Cost-insensitive context (burning API quota is acceptable)

Bad use cases:

  • Exploratory work requiring judgment
  • Codebase without comprehensive tests
  • Changes touching security, payments, or data integrity
  • Situations where mistakes are expensive to fix

The ralph loop is a production pattern for mechanical work, not a replacement for thoughtful engineering.


Practical workflow: end-to-end example

The workshop culminates in a complete walkthrough demonstrating all concepts together:

Phase 1: Alignment (10-15 minutes, human + AI)

  1. Developer describes high-level feature request
  2. Run /grill-me skill
  3. Agent asks 40-80 clarifying questions
  4. Developer answers, agent captures edge cases, constraints, dependencies
  5. Reach shared understanding

Phase 2: Planning (15-20 minutes, human + AI)

  1. Run /to-prd to generate structured PRD
  2. File PRD as GitHub issue
  3. Review and refine PRD with team
  4. Run /prd-to-issues to decompose into vertical slices
  5. Classify each issue as HITL or AFK
  6. Sequence issues based on dependencies

Phase 3: Implementation (autonomous, parallel agents)

For each AFK issue:

  1. Sand Castle creates worktree and Docker environment
  2. Implementation agent loads issue + relevant context (stays in smart zone)
  3. Agent writes failing test (if not already present)
  4. Agent implements code to pass tests
  5. Agent commits on success
  6. Reviewer agent evaluates in fresh context
  7. Merger agent integrates if approved

Developer focuses on HITL issues requiring architectural decisions.

Phase 4: Review (human + AI)

  1. Developer reviews merged PRs
  2. Manual testing on staging environment
  3. Final quality check before production deployment
  4. Feature flag rollout if appropriate

Phase 5: Retrospective (human)

  • What went well? What didn't?
  • Were issue specifications clear enough?
  • Did agents produce production-quality code?
  • What patterns should we encode into skills?
  • What quality problems should we add to review checklist?

This retrospective feedback becomes the input for improving skills, specifications, and workflows for the next feature.


Tools and resources from the workshop

Key skills from Matt Pocock

| Skill | Purpose |
| --- | --- |
| /grill-me | Force alignment through deep questioning |
| /to-prd | Convert conversations into structured PRD |
| /prd-to-issues | Decompose PRD into vertical-slice issues |
| /tdd | Guide test-driven development workflow |
| /improve-codebase-architecture | Identify shallow modules and suggest deep alternatives |

Foundational reading

  • A Philosophy of Software Design by John Ousterhout: Deep vs shallow modules, complexity management
  • The Pragmatic Programmer by Andrew Hunt and David Thomas: Tracer bullets, vertical slicing, quality-driven development

Critical skills for AI-augmented engineering

The workshop ultimately argues these traditional engineering skills become more important, not less, when working with AI:

1. Requirements clarity

AI agents are terrible at operating with ambiguous requirements. The better you can articulate constraints, edge cases, and success criteria, the more useful the agent becomes.

Skill to develop: Practice writing precise acceptance criteria and failing tests that capture requirements unambiguously.

2. Architectural judgment

Agents implement solutions within the architecture you provide. They won't spontaneously refactor to a better design pattern or question a flawed system boundary.

Skill to develop: Study design patterns, system design, and module boundaries. Make architectural decisions deliberately before delegating implementation.

3. Test design

Tests define "correct" for autonomous agents. Poor tests mean agents optimize for passing bad tests rather than solving real problems.

Skill to develop: Master TDD, property-based testing, integration testing, and test design. Learn to write tests that actually validate desired behavior.

4. Code review

Agent-generated code requires scrutiny for security, performance, edge cases, and alignment with project standards. Rubber-stamping merges creates technical debt at machine speed.

Skill to develop: Systematic code review, security awareness, performance analysis, and architectural consistency checking.

5. Task decomposition

Keeping work in the smart zone requires breaking complex problems into smaller, well-scoped phases. This is the core skill that enables effective agent work.

Skill to develop: Practice vertical slicing, tracer bullet development, and incremental delivery. Learn to see the smallest complete slice that delivers value.


Common mistakes teams make

The workshop identifies several anti-patterns:

1. Treating AI as a senior developer

Mistake: "The AI will figure out the architecture."

Reality: Agents implement within constraints you provide. Architecture is your job.

2. Skipping the alignment phase

Mistake: Immediately start coding from vague requirements.

Reality: 15 minutes of grill-me saves hours of rework.

3. Working in the dumb zone

Mistake: Pasting entire codebase, all conversation history, every GitHub issue into context.

Reality: Bloated context degrades reasoning quality. Stay lean.

4. Skipping tests

Mistake: "The agent can write tests later."

Reality: Tests are the specification that makes autonomous work possible. Write them first.

5. Blind trust in agent output

Mistake: Merging agent PRs without review.

Reality: Agents produce plausible-looking code that may be wrong, insecure, or inefficient.

6. Not classifying work appropriately

Mistake: Treating all work as AFK-able.

Reality: Judgment-heavy decisions require human involvement. Only mechanical implementation should be delegated to autonomous agents.


How to apply this in your workflow

Week 1: Learn the patterns

  1. Watch the full workshop video
  2. Read through Matt Pocock's skills repository
  3. Try the grill-me skill on a real feature request
  4. Practice writing PRDs with to-prd
  5. Decompose one PRD into vertical slices

Week 2: Adopt TDD

  1. Start writing failing tests before asking agents for implementation
  2. Give agents one failing test at a time
  3. Review implementations to confirm they pass the tests without sacrificing code quality
  4. Build comfort with the Red-Green-Refactor cycle

Week 3: Experiment with AFK classification

  1. Look at your backlog and classify issues as HITL or AFK
  2. Try letting an agent work autonomously on one AFK issue
  3. Review the output critically
  4. Iterate on how you write specifications to improve autonomous results

Week 4: Introduce review rigor

  1. Stop rubber-stamping agent PRs
  2. Use Opus to review Sonnet implementations
  3. Maintain explicit quality checklists
  4. Track common problems and add them to review criteria
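One way to keep a checklist explicit is to store it as data and refuse approval until every item has been answered. This is a rough sketch under assumed names: `ChecklistItem`, `addCriterion`, `canApprove`, and the sample questions are illustrative, not part of the workshop or of any review tool.

```typescript
// Hypothetical checklist structure; ids and questions are illustrative only.
type ChecklistItem = { id: string; question: string };

const baseChecklist: ChecklistItem[] = [
  { id: "tests", question: "Do the tests cover the behavior in the issue?" },
  { id: "security", question: "Is user input validated before use?" },
];

// Add a recurring problem to the checklist unless it is already tracked.
// Returns a new array so earlier versions of the checklist stay unchanged.
function addCriterion(
  items: ChecklistItem[],
  item: ChecklistItem
): ChecklistItem[] {
  const alreadyTracked = items.some((existing) => existing.id === item.id);
  return alreadyTracked ? items : [...items, item];
}

// A PR is approvable only when every checklist item was explicitly passed.
function canApprove(items: ChecklistItem[], passedIds: string[]): boolean {
  return items.every((item) => passedIds.some((id) => id === item.id));
}

// A problem seen repeatedly in agent PRs becomes a permanent review criterion.
const checklist = addCriterion(baseChecklist, {
  id: "dead-code",
  question: "Did the agent leave unused helpers or commented-out code?",
});

console.log(canApprove(checklist, ["tests", "security"])); // prints "false"
console.log(canApprove(checklist, ["tests", "security", "dead-code"])); // prints "true"
```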

Month 2-3: Scale with Sand Castle

  1. Identify opportunities for parallel agent work
  2. Set up Sand Castle for worktree isolation
  3. Run 2-3 agents in parallel on well-specified issues
  4. Monitor results and iterate on specifications
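To see what worktree isolation means before adopting Sand Castle, it helps to look at the underlying git commands. The sketch below is not Sand Castle's API: `planWorktrees`, the `agent/issue-<id>` branch naming, and the directory layout are assumptions. It only builds the `git worktree add -b <branch> <path>` invocations, one per issue, so each agent gets its own branch and working directory.

```typescript
// Hypothetical helper, unrelated to Sand Castle's real interface.
type WorktreePlan = { branch: string; path: string; command: string[] };

// Build one `git worktree add` command per issue. Each command creates a new
// branch and checks it out in a separate directory.
function planWorktrees(issueIds: number[], baseDir: string): WorktreePlan[] {
  return issueIds.map((id) => {
    const branch = `agent/issue-${id}`; // assumed naming convention
    const path = `${baseDir}/issue-${id}`;
    return {
      branch,
      path,
      command: ["git", "worktree", "add", "-b", branch, path],
    };
  });
}

// Print the commands for three well-specified issues instead of running them.
for (const plan of planWorktrees([101, 102, 103], "../worktrees")) {
  console.log(plan.command.join(" "));
}
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere. Running them for real would create the branches and directories in the current repository.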

Who this workshop is for

The workshop explicitly targets professional software engineers who:

  • Already understand software fundamentals (architecture, testing, design patterns)
  • Are skeptical of "AI will replace all developers" hype
  • Want practical patterns for production use
  • Value code quality, maintainability, and engineering discipline
  • Are willing to invest in learning how to work effectively with AI rather than fighting it

This is not a workshop for:

  • Non-technical people hoping to become developers through AI
  • Developers looking for shortcuts to avoid learning fundamentals
  • Teams that don't already have testing, code review, and quality standards
  • Anyone expecting "fully autonomous" development with zero human involvement

The target audience is experienced engineers who want to 10x their leverage while maintaining production quality standards.


The meta-lesson: AI amplifies your system

The deepest insight from the workshop is this: AI agents amplify whatever system you give them.

If your system has:

  • Clear requirements → Agents produce aligned implementations
  • Comprehensive tests → Agents validate their work reliably
  • Deep modules → Agents integrate cleanly
  • Quality standards → Agents meet your bar
  • Review rigor → Problems get caught before production

But if your system has:

  • Vague requirements → Agents guess wrong
  • Missing tests → Agents ship bugs
  • Shallow modules → Agents create brittle coupling
  • No standards → Agents produce inconsistent code
  • No review → Technical debt compounds at machine speed

AI does not fix broken engineering processes. It accelerates whatever process you have, for better or worse.

Pocock's workshop is fundamentally about building systems worthy of amplification.


What's next: Claude Code for Real Engineers

For engineers who want to go deeper, Matt Pocock offers Claude Code for Real Engineers, a 2-week cohort covering:

  • AI coding from first principles
  • Complete workflow implementation
  • Production patterns and anti-patterns
  • Real codebase examples
  • Live skill development
  • Community of serious practitioners

The cohort is designed for working engineers who want production-ready skills, not toy demos.


Bottom line

Matt Pocock's "AI Coding for Real Engineers" workshop makes a definitive argument: software engineering fundamentals are not obsolete in the age of AI. They are the leverage layer that determines whether AI agents amplify your capabilities or multiply your problems.

The workshop provides practical, production-ready patterns for:

  • Keeping work in the smart zone through task decomposition
  • Forcing alignment with grill-me before coding begins
  • Structuring requirements as PRDs and vertical-slice issues
  • Writing tests first to enable autonomous implementation
  • Classifying work as HITL vs AFK to parallelize safely
  • Using Sand Castle to orchestrate parallel sandboxed agents
  • Reviewing code rigorously with fresh context and explicit standards
  • Building deep modules that agents can work with effectively

These are not theoretical concepts. They're workflows that professional engineering teams are using in production to ship features 3-5x faster while maintaining quality standards.

The key insight: your job as an engineer is not to write every line of code yourself. It's to design systems, define specifications, encode quality standards, and maintain architectural integrity, then delegate mechanical implementation to agents that work within those constraints.

Master those skills, and AI becomes genuine leverage. Skip them, and AI becomes a chaos multiplier.

Workshop note: This comprehensive breakdown is based on Matt Pocock's "AI Coding for Real Engineers" full workshop video released April 24, 2026, plus analysis of his open-source skills repository, Sand Castle framework, community discussions, and related educational content from AI Hero. For latest updates, skill releases, and cohort information, visit aihero.dev and the mattpocock/skills GitHub repository.
