Matt Pocock's AI Coding Workshop: Complete Guide for Real Engineers | explainx.ai Blog



On April 24, 2026, TypeScript educator and AI coding pioneer Matt Pocock released a comprehensive 2-hour workshop titled "AI Coding for Real Engineers" that fundamentally challenges the notion that AI makes software engineering fundamentals obsolete.

While many treat AI coding as a paradigm shift that renders 20 years of software engineering wisdom irrelevant, Pocock makes the opposite argument: engineering fundamentals matter more than ever when working with AI agents. The workshop demonstrates that code quality, architecture decisions, TDD discipline, and design patterns become critical leverage points that determine whether AI agents amplify your capabilities or multiply technical debt.

This article provides a comprehensive breakdown of the workshop's core concepts, practical workflows, and production-ready patterns for integrating AI agents into professional software development.

TL;DR

Topic	Key insight
Workshop focus	Complete AI-assisted development lifecycle from ambiguous requirements to autonomous agent deployment
Core thesis	Software engineering fundamentals are not obsolete—they're the leverage layer that makes AI agents useful
Smart vs dumb zone	LLMs degrade beyond ~100k tokens (40% context); keep work in the smart zone through task decomposition
Grill-me skill	Force agents to ask 40-80 alignment questions before coding to reach shared understanding
PRD workflow	Synthesize conversations into structured PRDs, then decompose into vertical-slice GitHub issues
Tracer bullets	Build thin vertical slices through all layers (schema/API/UI/tests), not horizontal layers
TDD with AI	Write failing tests first, let agents implement to green, review and refactor
AFK agents	Classify issues as HITL (human decision) or AFK (autonomous implementation)
Sand Castle	TypeScript framework for parallelizing sandboxed agents with worktrees and Docker isolation
Code review	Use Sonnet for speed, Opus for review quality, always in fresh context
Deep modules	Favor high-leverage interfaces that hide complexity over shallow modules that leak it

The fundamental problem: everyone thinks AI changes everything

The dominant narrative in 2026 is that AI coding assistants represent a paradigm shift so profound that traditional software engineering practices—design patterns, TDD, architecture reviews, code review processes—are suddenly obsolete boomer baggage.

Matt Pocock's workshop opens by rejecting this framing entirely.

His counter-thesis: AI agents are powerful, but they're only as good as the engineering constraints, workflows, and quality standards you give them. If you feed an agent vague requirements, skip tests, ignore architecture, and treat code as disposable, the agent will amplify those bad habits at machine speed.

The workshop demonstrates the opposite approach: treating AI as a force multiplier for disciplined engineering. When you combine strong fundamentals with agent capabilities, you get compound leverage. When you abandon fundamentals, you get compound dysfunction.

This is why Pocock built Claude Code for Real Engineers, a 2-week cohort teaching AI coding from first principles, and open-sourced his skills repository (135,000+ GitHub stars as of June 2026) containing agent skills that encode professional engineering workflows—now at v1.0.1 with progressive disclosure. See our full skills guide and v1.0 breakdown.

Smart zone vs dumb zone: the context window cliff

One of the workshop's most actionable insights is the concept of the smart zone and dumb zone in LLM context windows.

The threshold

At approximately 100,000 tokens (roughly 40% of the total context window), LLMs begin entering what Pocock calls the "dumb zone", where reasoning quality degrades sharply. The exact boundary varies by model and task complexity, but practitioners working with long-context models commonly report a similar drop-off.

The cause cited is quadratic attention scaling: every token attends to every other token, so the computational cost of attention grows with the square of the context length, and quality can degrade even when the model technically has capacity for more tokens.

Why this matters for AI coding

Most developers instinctively believe: more context = better results. Just paste the entire codebase, all previous conversations, every GitHub issue, and let the model figure it out.

Pocock's data shows the opposite: bloated context actively harms output quality. Once you cross into the dumb zone, the agent starts missing critical details, hallucinating solutions that contradict earlier context, and producing generic code that ignores project-specific constraints.

The practical strategy

The workshop emphasizes a multi-phase decomposition approach:

Phase	Goal	Context size
Alignment	Understand requirements deeply through grill-me questioning	Minimal: just the initial request
Planning	Produce PRD and break into vertical slices	Medium: PRD + architecture docs
Implementation	Each agent works on one isolated issue	Small: issue description + relevant code only
Review	Fresh context evaluation of changes	Clean: just the diff + quality standards

By keeping each phase within the smart zone, you maintain reasoning quality throughout the development lifecycle.

This is not theoretical optimization. Pocock demonstrates real examples where the same task produces dramatically different results depending on whether context is kept lean or allowed to bloat.
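One rough way to make the smart-zone budget concrete is to estimate token counts before assembling context. The sketch below is illustrative only: the 4-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, the 40% figure is the threshold quoted above, and all function names are hypothetical.

```typescript
// Rough heuristic: about 4 characters per token for English text and code.
// This is an assumption for illustration, not a real tokenizer.
const CHARS_PER_TOKEN = 4;

// Share of the context window treated as the "smart zone" (the ~40% figure above).
const SMART_ZONE_PERCENT = 40;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function smartZoneLimit(contextWindowTokens: number): number {
  return Math.floor((contextWindowTokens * SMART_ZONE_PERCENT) / 100);
}

// Returns true when the combined context pieces fit inside the smart zone.
function fitsSmartZone(pieces: string[], contextWindowTokens: number): boolean {
  const total = pieces.reduce((sum, piece) => sum + estimateTokens(piece), 0);
  return total <= smartZoneLimit(contextWindowTokens);
}
```

A check like this could run before each phase, so an oversized context is trimmed or split rather than sent as-is.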

The grill-me skill: forcing alignment before code

The second major pattern from the workshop is what Pocock calls the "grill-me" skill—an alignment phase that happens before any code is written.

The problem it solves

The default workflow with AI agents is:

Describe what you want
Agent starts coding immediately
Discover 20 minutes later the agent misunderstood half your requirements
Throw away the work and start over

This is expensive, demoralizing, and creates an adversarial relationship with the tool.

How grill-me works

The grill-me skill flips this dynamic. Instead of letting the agent immediately implement, you force it to interrogate you first:

text

Before writing any code, ask me at least 40 questions to ensure you
fully understand the requirements, edge cases, architecture constraints,
performance requirements, testing strategy, and integration points.

Do not stop asking questions until you can articulate the complete
solution back to me and I confirm we have shared understanding.

What happens in practice

Pocock's examples show agents asking 40 to 80 targeted questions covering:

Requirements clarity: "You mentioned user roles. Which roles? What can each role do?"
Edge cases: "What happens if two users update the same record simultaneously?"
Architecture constraints: "Should this be a new service or extend the existing API?"
Performance requirements: "What's the expected request volume? What latency is acceptable?"
Testing strategy: "Do you want integration tests, unit tests, or both?"
Integration points: "Which external services does this touch? Are there rate limits?"
Data model: "Does the schema already exist or are we creating it?"
Error handling: "What should happen on failure? Retry? Alert? Rollback?"

This process produces what Pocock calls a shared design concept: you and the agent are genuinely on the same wavelength about what's being built.
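To make "do not stop asking until we have shared understanding" checkable, the question categories above can be tracked as a coverage list. This is a hypothetical sketch, not part of the grill-me skill itself; the topic names and the `uncoveredTopics` helper are assumptions introduced for illustration.

```typescript
// Topic areas taken from the question categories listed above.
type Topic =
  | "requirements" | "edge-cases" | "architecture" | "performance"
  | "testing" | "integration" | "data-model" | "error-handling";

const ALL_TOPICS: Topic[] = [
  "requirements", "edge-cases", "architecture", "performance",
  "testing", "integration", "data-model", "error-handling",
];

interface AnsweredQuestion {
  topic: Topic;
  question: string;
  answer: string;
}

// Topics with no answered question yet; alignment is incomplete while any remain.
function uncoveredTopics(log: AnsweredQuestion[]): Topic[] {
  const covered = new Set(log.map((entry) => entry.topic));
  return ALL_TOPICS.filter((topic) => !covered.has(topic));
}
```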

Why this saves time

Investing 10-15 minutes in alignment questions prevents hours of wasted autonomous work. More importantly, it surfaces misunderstandings, missing requirements, and architectural conflicts before they become code problems.

Pocock documents this pattern in his grill-me skill along with common mistakes teams make when using it.

PRD creation: turning conversation into structure

Once alignment is complete, the workshop demonstrates converting that shared understanding into a Product Requirements Document (PRD) that becomes the source of truth.

The to-prd skill

Pocock's to-prd skill synthesizes conversations and grill-me sessions into structured PRDs that typically include:

PRD section	Purpose
Problem statement	What user/business need are we solving?
Proposed solution	High-level approach and architecture
Success criteria	Measurable outcomes that indicate done
User stories	Specific scenarios and use cases
Technical constraints	Performance, security, compatibility requirements
Out of scope	Explicitly document what we're NOT building
Dependencies	Other systems, teams, or features this requires
Testing strategy	How will we validate this works?
Rollout plan	Feature flags, gradual rollout, rollback strategy
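The sections in the table map naturally onto a typed structure. The following is a minimal sketch of what such a shape might look like; the `Prd` interface and `missingSections` helper are hypothetical and are not taken from the to-prd skill.

```typescript
// Hypothetical shape mirroring the PRD sections in the table above.
interface Prd {
  problemStatement: string;
  proposedSolution: string;
  successCriteria: string[];
  userStories: string[];
  technicalConstraints: string[];
  outOfScope: string[];
  dependencies: string[];
  testingStrategy: string;
  rolloutPlan: string;
}

// Names of sections that are still empty, so gaps surface before issues are filed.
function missingSections(prd: Prd): (keyof Prd)[] {
  const keys = Object.keys(prd) as (keyof Prd)[];
  return keys.filter((key) => {
    const value = prd[key];
    return typeof value === "string" ? value.trim() === "" : value.length === 0;
  });
}
```

An empty "out of scope" list is worth flagging in particular, since that section exists to stop agents from expanding the feature on their own.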

Why PRDs matter for AI agents

A PRD serves as the destination document that agents can reference throughout implementation. When an agent gets confused or needs to make a design decision, it can return to the PRD for authoritative guidance.

This prevents scope creep, ensures consistency across multiple agents working in parallel, and provides a clear definition of "done" that isn't subject to interpretation.

Filing as GitHub issues

Pocock's workflow has the PRD automatically filed as a GitHub issue. This creates:

Single source of truth for the entire feature
Discussion thread where team members can comment, question, or suggest changes
Audit trail showing how requirements evolved
Link target that implementation PRs can reference

This is not bureaucracy—it's infrastructure that makes parallel autonomous work possible.

Vertical slices: tracer bullets through the stack

With a PRD in place, the next phase is decomposition. This is where Pocock's approach diverges sharply from typical project planning.

Horizontal layering (what most teams do)

The default instinct is to break work into horizontal layers:

Design database schema for all entities
Build all API endpoints
Create all UI components
Write tests at the end

This creates long-running branches, late integration, and makes it impossible to validate whether the feature works until everything is done.

Vertical slicing (Pocock's approach)

Instead, Pocock advocates tracer bullets—thin vertical slices that cut through every layer to deliver a complete, working feature:

text

Issue 1: User can create a basic todo
- Schema: Add `todos` table with id, text, completed, user_id
- API: POST /todos endpoint with validation
- UI: Simple form to create todo
- Tests: End-to-end test covering create flow

Issue 2: User can mark todo as complete
- Schema: Use existing completed field
- API: PATCH /todos/:id endpoint
- UI: Checkbox component
- Tests: End-to-end test covering complete flow

Issue 3: User can delete todo
- Schema: No changes needed
- API: DELETE /todos/:id endpoint
- UI: Delete button with confirmation
- Tests: End-to-end test covering delete flow

Each issue is independently shippable. Each can be picked up by a different agent. Each can be tested and validated immediately.
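A simple way to tell a vertical slice from a horizontal one is to check that the issue says something about every layer. The sketch below encodes Issue 3 from the example; the `Slice` type and `missingLayers` helper are hypothetical names used for illustration.

```typescript
type Layer = "schema" | "api" | "ui" | "tests";
const LAYERS: Layer[] = ["schema", "api", "ui", "tests"];

// One issue: a short note per layer describing what the slice does there.
// "No changes needed" still counts, because the layer was considered.
interface Slice {
  title: string;
  work: Partial<Record<Layer, string>>;
}

// Layers the slice never mentions; a non-empty result suggests horizontal work.
function missingLayers(slice: Slice): Layer[] {
  return LAYERS.filter((layer) => !slice.work[layer]);
}

const deleteTodo: Slice = {
  title: "User can delete todo",
  work: {
    schema: "No changes needed",
    api: "DELETE /todos/:id endpoint",
    ui: "Delete button with confirmation",
    tests: "End-to-end test covering delete flow",
  },
};
```

By contrast, an issue such as "Build all API endpoints" would report schema, ui, and tests as missing.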

The to-issues skill

Pocock's to-issues skill takes a PRD and automatically decomposes it into vertical-slice GitHub issues with:

Clear scope: Exactly what this slice includes
Acceptance criteria: What "done" looks like
Testing requirements: How to validate
Dependencies: Which other issues must complete first
HITL vs AFK classification: Does this need human judgment or can an agent handle it autonomously?

This decomposition is not busywork—it's the infrastructure that enables parallelization.
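The dependency field is what lets issues be sequenced mechanically. Below is a minimal sketch of dependency ordering using a layered form of topological sorting; the `Issue` shape is an assumption for illustration and does not reflect the actual output format of the to-issues skill.

```typescript
interface Issue {
  id: number;
  title: string;
  dependsOn: number[]; // ids of issues that must complete first
}

// Layered topological ordering: repeatedly emit every issue whose
// dependencies have all been emitted already.
// Throws if the dependencies contain a cycle or reference an unknown id.
function orderIssues(issues: Issue[]): Issue[] {
  const done = new Set<number>();
  const ordered: Issue[] = [];
  let remaining = [...issues];

  while (remaining.length > 0) {
    const ready = remaining.filter((i) => i.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error("Cyclic or unresolvable dependencies");
    }
    for (const issue of ready) {
      done.add(issue.id);
      ordered.push(issue);
    }
    remaining = remaining.filter((i) => !done.has(i.id));
  }
  return ordered;
}
```

Issues emitted in the same pass have no dependencies on each other, which makes each pass a natural batch for parallel agents.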

TDD with AI agents: write tests first, let agents go green

One of the workshop's most controversial claims is that Test-Driven Development (TDD) becomes more important with AI, not less.

Why developers skip TDD with AI

The common argument: "AI can write both code and tests together, why slow down with TDD?"

Pocock's response: because AI agents are optimizers. Give them a vague problem and they'll produce code that superficially works but doesn't actually solve your requirements. Give them a failing test and they'll produce code that provably solves the problem you specified.

The TDD workflow with agents

text

1. Red: Human writes a failing test describing desired behavior
2. Green: Agent writes minimum code to make test pass
3. Refactor: Agent or human cleans up implementation
4. Repeat: Move to next test

The key insight: tests are specifications that agents can verify against. When you write the test first, you're forcing yourself to think clearly about:

What inputs should produce what outputs?
What edge cases must be handled?
What error conditions should be caught?
What side effects are acceptable?

Example from the workshop

Instead of:

text

"Build a function that validates email addresses"

Write a failing test:

typescript

// Assumes a test runner that provides describe/it/expect (e.g. Jest or Vitest)
// and that validateEmail is imported from the module under test.
describe('validateEmail', () => {
  it('accepts valid emails', () => {
    expect(validateEmail('user@example.com')).toBe(true);
  });

  it('rejects emails without @', () => {
    expect(validateEmail('userexample.com')).toBe(false);
  });

  it('rejects emails without domain', () => {
    expect(validateEmail('user@')).toBe(false);
  });

  it('accepts emails with subdomains', () => {
    expect(validateEmail('user@mail.example.com')).toBe(true);
  });
});

Then tell the agent: "Make these tests pass."

The agent now has an unambiguous success criterion. No guessing about requirements. No debating whether the implementation is "correct." The tests define correct.
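For illustration, here is one implementation an agent might plausibly arrive at for the four cases above. It is a sketch written for this article, not output shown in the workshop, and it is deliberately minimal rather than a full RFC-compliant validator.

```typescript
// Deliberately minimal: exactly one "@", a non-empty local part,
// and a domain made of at least two non-empty dot-separated labels.
function validateEmail(input: string): boolean {
  const at = input.indexOf("@");
  // Reject a missing "@", an empty local part, or more than one "@".
  if (at <= 0 || at !== input.lastIndexOf("@")) return false;

  const domain = input.slice(at + 1);
  const labels = domain.split(".");
  return labels.length >= 2 && labels.every((label) => label.length > 0);
}
```

The tests say nothing about, for example, whitespace or quoted local parts, so this implementation is free to ignore them. That is the trade-off of tests-as-specification: anything the tests do not pin down is left to the implementer.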

Why this works with autonomous agents

When you send an agent off to work AFK (away from keyboard), you can't supervise every decision. But you can give it a test suite that defines acceptable behavior.

Pocock shows examples where:

Agent tries 3 different implementation approaches before finding one that passes all tests
Agent discovers edge cases you didn't specify and adds defensive checks
Agent refactors implementation while keeping tests green

This is specification-driven autonomous work. The tests are your quality enforcement layer when you're not watching.

Deep modules vs shallow modules: hiding complexity

The workshop spends significant time on software architecture principles, particularly the concept of deep modules from John Ousterhout's A Philosophy of Software Design.

The core distinction

Deep modules	Shallow modules
Small, simple interface	Complex, intricate interface
Hide significant complexity	Expose implementation details
High leverage: lots of functionality behind minimal API	Low leverage: interface nearly as complex as implementation
Example: `fs.readFile(path)` hides buffering, streaming, error handling	Example: `new XMLHttpRequest()`, which needs `open()`, event handlers, and `send()` before it returns anything

Why this matters for AI agents

Pocock argues that AI agents write better code when working with deep modules, because:

Less context required: Agent only needs to understand the interface, not the internals
Fewer integration points: Less surface area for bugs and misunderstandings
Better encapsulation: Changes don't cascade across the codebase
Clearer contracts: Deep modules have obvious input/output semantics

Conversely, shallow modules force agents to hold more implementation details in context, increasing the chance of crossing into the dumb zone.
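The contrast is easier to see side by side. In the hypothetical sketch below, the shallow version exposes three helpers that a caller must chain in the right order, while the deep version offers a single `parseConfig` entry point over the same steps. The config format and all names are invented for illustration.

```typescript
// Shallow: the caller must know the steps and their order.
// Each function exposes a piece of the implementation.
function splitLines(raw: string): string[] {
  return raw.split("\n");
}
function dropComments(lines: string[]): string[] {
  return lines.filter((line) => !line.trim().startsWith("#"));
}
function toPairs(lines: string[]): [string, string][] {
  return lines
    .filter((line) => line.includes("="))
    .map((line) => {
      const idx = line.indexOf("=");
      return [line.slice(0, idx).trim(), line.slice(idx + 1).trim()] as [string, string];
    });
}

// Deep: one small entry point hides the same steps behind a simple contract.
function parseConfig(raw: string): Map<string, string> {
  return new Map(toPairs(dropComments(splitLines(raw))));
}
```

An agent calling `parseConfig` needs only one signature in context; an agent using the shallow helpers needs all three plus the knowledge of how they compose.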

Practical application

When reviewing agent-generated code, Pocock recommends asking:

Does this module expose unnecessary complexity?
Could the interface be simpler while hiding the same functionality?
Are we leaking implementation details that will create coupling?
Is the cognitive load at the call site higher than it needs to be?

The workshop includes his improve-codebase-architecture skill that agents can use to identify shallow modules and propose deeper alternatives.

AFK agents: away from keyboard implementation

One of the workshop's most production-ready patterns is the AFK (away from keyboard) classification for autonomous agent work.

HITL vs AFK

When decomposing PRDs into issues, each issue gets classified:

Classification	Meaning	Example
HITL	Human in the loop: requires judgment, architecture decisions, or business tradeoffs	"Should we cache this data or recompute on every request?"
AFK	Away from keyboard: well-defined implementation with clear acceptance criteria	"Add validation to email field per PRD specification"

What makes good AFK work

Issues are AFK-ready when they have:

Clear specification: Failing test or detailed acceptance criteria
No architectural decisions: The design is already determined
Isolated scope: Changes are contained to one module or feature
Obvious validation: Tests or demo steps that prove it works
Low risk: Failure doesn't compromise security, data integrity, or production stability
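The five criteria above can be treated as a checklist where any unmet item sends the issue to a human. This is a minimal sketch under that assumption; the field names and the `classify` helper are hypothetical.

```typescript
// Hypothetical checklist mirroring the five criteria above.
interface AfkChecklist {
  hasClearSpec: boolean;   // failing test or detailed acceptance criteria
  designDecided: boolean;  // no open architectural decisions
  isolatedScope: boolean;  // contained to one module or feature
  hasValidation: boolean;  // tests or demo steps prove it works
  lowRisk: boolean;        // no security, data-integrity, or stability exposure
}

type Classification = "AFK" | "HITL";

// Any unmet criterion sends the issue to a human; `unmet` says which ones.
function classify(check: AfkChecklist): { kind: Classification; unmet: string[] } {
  const unmet = (Object.keys(check) as (keyof AfkChecklist)[]).filter((k) => !check[k]);
  return { kind: unmet.length === 0 ? "AFK" : "HITL", unmet };
}
```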

The AFK workflow

Planner agent examines backlog and identifies non-blocked AFK issues
Implementation agents pick issues in parallel, each in isolated environment
Agents commit when tests pass and acceptance criteria are met
Reviewer agents check output against quality standards
Merger agent integrates approved changes

Pocock demonstrates a real example where 3 agents work in parallel on separate vertical slices, implementing and testing autonomously while he's in meetings.
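The planner step in this workflow amounts to a filter over the backlog. The sketch below shows that selection logic in isolation, assuming a simple in-memory backlog; the `BacklogIssue` shape and `pickParallelWork` name are illustrative, not Sand Castle's API.

```typescript
interface BacklogIssue {
  id: number;
  kind: "AFK" | "HITL";
  state: "open" | "done";
  blockedBy: number[]; // ids that must be done first
}

// Open AFK issues whose blockers are all done, capped at the number of agents.
function pickParallelWork(backlog: BacklogIssue[], maxAgents: number): BacklogIssue[] {
  const doneIds = new Set(backlog.filter((i) => i.state === "done").map((i) => i.id));
  return backlog
    .filter((i) => i.state === "open" && i.kind === "AFK")
    .filter((i) => i.blockedBy.every((id) => doneIds.has(id)))
    .slice(0, maxAgents);
}
```

HITL issues never appear in the result, which keeps judgment calls with the developer even when agents are idle.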

Why this is not "fully autonomous coding"

Critically, AFK does not mean "let the agent do whatever it wants." It means:

Human-defined scope: You decided this issue is safe for autonomous work
Human-written tests: The agent must satisfy specifications you created
Human review gates: Output is reviewed before merge
Human architecture: The design decisions are already made

The autonomous part is the mechanical implementation of well-specified work. The judgment, design, and quality standards remain human responsibilities.

Sand Castle: parallelizing autonomous agents safely

To operationalize AFK workflows at production scale, Pocock built Sand Castle, a TypeScript framework for orchestrating sandboxed coding agents.

The core problem

Running multiple agents in parallel on the same codebase creates:

Git conflicts from agents editing the same files
Test interference when agents run tests simultaneously
Resource contention for databases, ports, and file locks
Environment pollution from package installations and config changes
Security risk if agents have unbounded system access

How Sand Castle solves this

typescript

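// Illustrative sketch: the package name and option names shown here may differ
// from the published API; see the mattpocock/sandcastle README for current usage.
import { run } from 'sandcastle';

await run({
  issue: 'Add user profile endpoint',
  branch: 'feature/user-profile',
  worktree: true,  // Isolate in git worktree
  docker: true,    // Sandboxed execution
  timeout: 30 * 60 * 1000,  // 30 minute max
});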

For each agent, Sand Castle:

Creates a git worktree: Isolated working directory on separate branch
Spins up Docker container: Sandboxed environment with resource limits
Passes issue context: Agent only sees relevant code and requirements
Monitors execution: Logs, timeout enforcement, error capture
Collects output: Committed changes, test results, agent logs
Cleans up: Removes worktree and container when done
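The isolation steps above rest on two standard tools: `git worktree` and `docker run`. The sketch below only builds the command lines that such a setup might use and executes nothing. It is not Sand Castle's implementation; the function name, mount point, and resource limits are assumptions chosen for illustration.

```typescript
interface SandboxPlan {
  createWorktree: string[];
  runContainer: string[];
  removeWorktree: string[];
}

// Builds argv arrays only; nothing is executed here.
// `image` and `agentCommand` are placeholders supplied by the caller.
// `worktreePath` should be absolute, since Docker bind mounts expect that.
function planSandbox(
  branch: string,
  worktreePath: string,
  image: string,
  agentCommand: string[],
): SandboxPlan {
  return {
    // New branch checked out in its own working directory.
    createWorktree: ["git", "worktree", "add", "-b", branch, worktreePath],
    // Throwaway container with the worktree mounted and basic resource limits.
    runContainer: [
      "docker", "run", "--rm",
      "--memory", "2g", "--cpus", "2",
      "-v", `${worktreePath}:/workspace`,
      "-w", "/workspace",
      image, ...agentCommand,
    ],
    removeWorktree: ["git", "worktree", "remove", worktreePath],
  };
}
```

One caveat with this arrangement: a worktree's `.git` entry is a pointer back to the main repository, so git commands run inside the container also need that repository's `.git` directory to be reachable.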

The four agent types

Sand Castle coordinates:

Agent type	Responsibility
Planner	Examines full backlog, identifies non-blocked AFK issues, determines parallelization opportunities
Implementation	Works on single issue in isolated environment, writes code, runs tests, commits when green
Reviewer	Reviews implementation agent output against quality standards, fresh context
Merger	Integrates approved changes, resolves conflicts if needed, validates integration tests

Production considerations

The workshop emphasizes Sand Castle is not a "set it and forget it" solution. Production use requires:

Clear issue specifications: Garbage in, garbage out
Comprehensive test suites: Agents validate against your tests
Review gates: Human or Opus agent reviews before merge
Monitoring: Track success rates, common failures, quality trends
Cost awareness: Parallel agents consume API quota quickly
Rollback procedures: Be prepared to revert failed deployments

Pocock shows real examples where Sand Castle enables 3-5x faster feature delivery for well-specified work, while acknowledging it's inappropriate for exploratory or high-judgment tasks.

Code review strategies: Sonnet for speed, Opus for quality

The workshop dedicates substantial time to code review as a quality gate for agent-generated code.

The anti-pattern

The default approach many teams adopt:

Agent writes code
Glance at the diff
Merge if it looks fine
Discover problems in production

This treats code review as a formality rather than a quality enforcement mechanism.

Pocock's recommended strategy

Review aspect	Approach	Tool
Implementation	Use Sonnet for speed and cost	Claude Sonnet 4.5
Review	Use Opus for quality and judgment	Claude Opus 4.5
Context	Reviewer always works in fresh context, not implementation thread	New conversation
Standards	Explicit quality checklist provided to reviewer	Documented criteria
Focus	List issues first, not praise	Critical evaluation

Why use different models for implementation vs review

Sonnet: Fast, cost-effective, good at following specifications
Opus: Superior reasoning, catches subtle bugs, better architectural judgment

By using Sonnet for the mechanical implementation work and Opus for critical review, you optimize for both velocity and quality.

The fresh context principle

This is critical: the reviewer should not have access to the implementation conversation history.

Why? Because seeing the implementation context creates anchoring bias. The reviewer starts evaluating whether the agent solved the problem it thought it was solving, rather than whether the solution is objectively correct.

Fresh context forces the reviewer to evaluate:

Does this code do what the issue specification claims?
Are there edge cases the implementation missed?
Are there security, performance, or maintainability problems?
Does this align with project architecture and standards?
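One way to enforce fresh context structurally is to build the reviewer's prompt from a type that has no field for conversation history. The following is a hypothetical sketch; the `ReviewInput` shape and the prompt wording are assumptions, not Pocock's actual review prompt.

```typescript
interface ReviewInput {
  issueSpec: string;
  diff: string;
  qualityChecklist: string[];
}

// Builds the reviewer prompt from the spec, diff, and standards only.
// The implementation conversation is deliberately not a parameter.
function buildReviewPrompt(input: ReviewInput): string {
  const checklist = input.qualityChecklist.map((item) => `- ${item}`).join("\n");
  return [
    "Review the following change. List problems first; do not open with praise.",
    "## Issue specification",
    input.issueSpec,
    "## Quality checklist",
    checklist,
    "## Diff",
    input.diff,
  ].join("\n\n");
}
```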

What good review output looks like

Pocock shows examples where reviewer agents:

List issues first, not praise
Provide specific examples of problems found
Suggest fixes with code snippets
Assess merge-readiness: Can this ship? What blocks it?
Highlight architectural concerns that might affect future work

Poor review: "This looks good, I don't see any issues."

Good review: "This implementation has three problems: 1) Missing null check on line 47 could throw in production. 2) No rate limiting on the endpoint. 3) Error messages leak internal implementation details. Recommend fixing before merge."
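Review output in this style is easier to act on when it is structured. The sketch below models findings and derives merge-readiness from them, using the three problems in the "good review" example as data; the types and the blocking/non-blocking split are assumptions for illustration.

```typescript
type Severity = "blocking" | "non-blocking";

interface Finding {
  severity: Severity;
  location: string; // e.g. a file and line, or an endpoint
  problem: string;
  suggestedFix?: string;
}

interface Verdict {
  mergeReady: boolean;
  blockers: Finding[];
}

// A change is merge-ready only when no finding is marked blocking.
function assess(findings: Finding[]): Verdict {
  const blockers = findings.filter((f) => f.severity === "blocking");
  return { mergeReady: blockers.length === 0, blockers };
}

const exampleFindings: Finding[] = [
  { severity: "blocking", location: "line 47", problem: "Missing null check could throw in production" },
  { severity: "blocking", location: "endpoint", problem: "No rate limiting" },
  { severity: "blocking", location: "error handler", problem: "Error messages leak internal details" },
];
```

Note that an empty findings list is reported as merge-ready, which is exactly the "looks good" outcome the section warns about; a pipeline might reasonably require the reviewer to state what it checked before accepting that result.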

The ralph loop: bash-driven autonomous agents

A brief but powerful pattern from the workshop is what the community calls the "ralph loop"—popularized by Pocock's demonstrations of bash-driven agent automation.

The basic pattern

bash

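while true; do
  # Run Claude Code non-interactively: -p sends a single prompt and exits.
  # Headless runs need tool permissions configured so the agent can edit files.
  if claude -p "$(cat prompt.md)"; then
    git add -A
    # Commit only if there are staged changes (the agent may have committed already)
    git diff --cached --quiet || git commit -m "Agent work completed"
  else
    echo "Agent failed, stopping loop"
    break
  fi

  sleep 5
done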

The prompt.md file tells the agent:

Read your state from disk (issue descriptions, tests, last commit message)
Pick the next AFK issue from backlog
Implement it following project standards
Run tests
Commit if green
Exit with success code

Why this works

The loop enables continuous autonomous work without human intervention. The agent:

Reads its own progress from git history
Picks work from a prioritized backlog
Self-validates through tests
Commits only when successful
Stops when it encounters ambiguity or failure
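The stopping conditions are the part of the loop most worth getting right. The sketch below restates the control flow with an explicit iteration cap and an empty-backlog exit, both of which are additions not present in the basic pattern. The agent runner is injected and synchronous for simplicity; all names are hypothetical.

```typescript
type AgentResult = "committed" | "nothing-to-do" | "failed";

interface LoopSummary {
  iterations: number;
  commits: number;
  stoppedBecause: "failed" | "backlog-empty" | "max-iterations";
}

// `runAgent` is injected so the loop logic can be exercised without a real agent.
// A real runner would likely be asynchronous and shell out to the agent CLI.
function ralphLoop(runAgent: () => AgentResult, maxIterations: number): LoopSummary {
  let commits = 0;
  for (let i = 1; i <= maxIterations; i++) {
    const result = runAgent();
    if (result === "failed") {
      return { iterations: i, commits, stoppedBecause: "failed" };
    }
    if (result === "nothing-to-do") {
      return { iterations: i, commits, stoppedBecause: "backlog-empty" };
    }
    commits++;
  }
  return { iterations: maxIterations, commits, stoppedBecause: "max-iterations" };
}
```

The cap bounds API spend if the agent keeps finding work, which matters given that cost is one of the stated constraints on this pattern.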

This is the operational pattern behind Pocock's demonstrations of "I started 3 agents before lunch and they shipped 8 features by end of day."

When to use the ralph loop

Good use cases:

Backlog of well-specified, test-defined issues
Stable test suite that reliably catches regressions
Low-risk changes (feature flags protect production)
Cost-insensitive context (burning API quota is acceptable)

Bad use cases:

Exploratory work requiring judgment
Codebase without comprehensive tests
Changes touching security, payments, or data integrity
Situations where mistakes are expensive to fix

The ralph loop is a production pattern for mechanical work, not a replacement for thoughtful engineering.

Practical workflow: end-to-end example

The workshop culminates in a complete walkthrough demonstrating all concepts together:

Phase 1: Alignment (10-15 minutes, human + AI)

Developer describes high-level feature request
Run /grill-me skill
Agent asks 40-60 clarifying questions
Developer answers, agent captures edge cases, constraints, dependencies
Reach shared understanding

Phase 2: Planning (15-20 minutes, human + AI)

Run /to-prd to generate structured PRD
File PRD as GitHub issue
Review and refine PRD with team
Run /to-issues to decompose into vertical slices
Classify each issue as HITL or AFK
Sequence issues based on dependencies

Phase 3: Implementation (autonomous, parallel agents)

For each AFK issue:

Sand Castle creates worktree and Docker environment
Implementation agent loads issue + relevant context (stays in smart zone)
Agent writes failing test (if not already present)
Agent implements code to pass tests
Agent commits on success
Reviewer agent evaluates in fresh context
Merger agent integrates if approved

Developer focuses on HITL issues requiring architectural decisions.

Phase 4: Review (human + AI)

Developer reviews merged PRs
Manual testing on staging environment
Final quality check before production deployment
Feature flag rollout if appropriate

Phase 5: Retrospective (human)

What went well? What didn't?
Were issue specifications clear enough?
Did agents produce production-quality code?
What patterns should we encode into skills?
What quality problems should we add to review checklist?

This retrospective feedback becomes the input for improving skills, specifications, and workflows for the next feature.

Tools and resources from the workshop

Core repositories

mattpocock/skills: 29+ agent skills for professional engineering workflows
mattpocock/sandcastle: TypeScript framework for parallelizing sandboxed agents
mattpocock/dictionary-of-ai-coding: AI coding jargon explained in plain English

Key skills from Matt Pocock

Skill	Purpose
/grill-me	Force alignment through deep questioning
/to-prd	Convert conversations into structured PRD
/to-issues	Decompose PRD into vertical-slice issues
/tdd	Guide test-driven development workflow
/improve-codebase-architecture	Identify shallow modules and suggest deep alternatives

Educational resources

AI Hero: Pocock's platform with workshops, courses, and skill documentation
Claude Code for Real Engineers: 2-week cohort teaching AI coding from first principles
Things People Get Wrong with /grill-me: Common mistakes and how to avoid them

Foundational reading

A Philosophy of Software Design by John Ousterhout: Deep vs shallow modules, complexity management
The Pragmatic Programmer: Tracer bullets, vertical slicing, quality-driven development

Critical skills for AI-augmented engineering

The workshop ultimately argues these traditional engineering skills become more important, not less, when working with AI:

1. Requirements clarity

AI agents are terrible at operating with ambiguous requirements. The better you can articulate constraints, edge cases, and success criteria, the more useful the agent becomes.

Skill to develop: Practice writing precise acceptance criteria and failing tests that capture requirements unambiguously.

2. Architectural judgment

Agents implement solutions within the architecture you provide. They won't spontaneously refactor to a better design pattern or question a flawed system boundary.

Skill to develop: Study design patterns, system design, and module boundaries. Make architectural decisions deliberately before delegating implementation.

3. Test design

Tests define "correct" for autonomous agents. Poor tests mean agents optimize for passing bad tests rather than solving real problems.

Skill to develop: Master TDD, property-based testing, integration testing, and test design. Learn to write tests that actually validate desired behavior.

4. Code review

Agent-generated code requires scrutiny for security, performance, edge cases, and alignment with project standards. Rubber-stamping merges creates technical debt at machine speed.

Skill to develop: Systematic code review, security awareness, performance analysis, and architectural consistency checking.

5. Task decomposition

Keeping work in the smart zone requires breaking complex problems into smaller, well-scoped phases. This is the core skill that enables effective agent work.

Skill to develop: Practice vertical slicing, tracer bullet development, and incremental delivery. Learn to see the smallest complete slice that delivers value.

Common mistakes teams make

The workshop identifies several anti-patterns:

1. Treating AI as a senior developer

Mistake: "The AI will figure out the architecture."

Reality: Agents implement within constraints you provide. Architecture is your job.

2. Skipping the alignment phase

Mistake: Immediately start coding from vague requirements.

Reality: 15 minutes of grill-me saves hours of rework.

3. Working in the dumb zone

Mistake: Pasting entire codebase, all conversation history, every GitHub issue into context.

Reality: Bloated context degrades reasoning quality. Stay lean.

4. Skipping tests

Mistake: "The agent can write tests later."

Reality: Tests are the specification that makes autonomous work possible. Write them first.

5. Rubber-stamping agent PRs

Mistake: Merging agent PRs without review.

Reality: Agents produce plausible-looking code that may be wrong, insecure, or inefficient.

6. Not classifying work appropriately

Mistake: Treating all work as AFK-able.

Reality: Judgment-heavy decisions require human involvement. Only mechanical implementation should be delegated to autonomous agents.

How to apply this in your workflow

Week 1: Learn the patterns

Watch the full workshop video
Read through Matt Pocock's skills repository
Try the grill-me skill on a real feature request
Practice writing PRDs with to-prd
Decompose one PRD into vertical slices

Week 2: Adopt TDD

Start writing failing tests before asking agents for implementation
Give agents one failing test at a time
Review each implementation for code quality, not just for passing tests
Build comfort with the Red-Green-Refactor cycle

Week 3: Experiment with AFK classification

Look at your backlog and classify issues as HITL or AFK
Try letting an agent work autonomously on one AFK issue
Review the output critically
Iterate on how you write specifications to improve autonomous results

Week 4: Introduce review rigor

Stop rubber-stamping agent PRs
Use Opus for review of Sonnet implementations
Maintain explicit quality checklists
Track common problems and add them to review criteria

Month 2-3: Scale with Sand Castle

Identify opportunities for parallel agent work
Set up Sand Castle for worktree isolation
Run 2-3 agents in parallel on well-specified issues
Monitor results and iterate on specifications

Who this workshop is for

The workshop explicitly targets professional software engineers who:

Already understand software fundamentals (architecture, testing, design patterns)
Are skeptical of "AI will replace all developers" hype
Want practical patterns for production use
Value code quality, maintainability, and engineering discipline
Are willing to invest in learning how to work effectively with AI rather than fighting it

This is not a workshop for:

Non-technical people hoping to become developers through AI
Developers looking for shortcuts to avoid learning fundamentals
Teams that don't already have testing, code review, and quality standards
Anyone expecting "fully autonomous" development with zero human involvement

The target audience is experienced engineers who want to 10x their leverage while maintaining production quality standards.

The meta-lesson: AI amplifies your system

The deepest insight from the workshop is this: AI agents amplify whatever system you give them.

If your system has:

Clear requirements → Agents produce aligned implementations
Comprehensive tests → Agents validate their work reliably
Deep modules → Agents integrate cleanly
Quality standards → Agents meet your bar
Review rigor → Problems get caught before production

But if your system has:

Vague requirements → Agents guess wrong
Missing tests → Agents ship bugs
Shallow modules → Agents create brittle coupling
No standards → Agents produce inconsistent code
No review → Technical debt compounds at machine speed

AI does not fix broken engineering processes. It accelerates whatever process you have—for better or worse.

Pocock's workshop is fundamentally about building systems worthy of amplification.

What's next: Claude Code for Real Engineers

For engineers who want to go deeper, Matt Pocock offers Claude Code for Real Engineers, a 2-week cohort covering:

AI coding from first principles
Complete workflow implementation
Production patterns and anti-patterns
Real codebase examples
Live skill development
Community of serious practitioners

The cohort is designed for working engineers who want production-ready skills, not toy demos.

The concepts in this workshop connect to broader patterns in the AI agent ecosystem:

Hiten Shah's AI Skill Library Strategy: Turning top performers' workflows into reusable agent skills
Matt Pocock's Agent Skills for Real Engineers: Deep dive into the skills repository
Agent Skills Security: Threat modeling for agent extensions
What are Agent Skills?: Complete guide to the agent skills ecosystem

Bottom line

Matt Pocock's "AI Coding for Real Engineers" workshop makes a definitive argument: software engineering fundamentals are not obsolete in the age of AI—they're the leverage layer that determines whether AI agents amplify your capabilities or multiply your problems.

The workshop provides practical, production-ready patterns for:

Keeping work in the smart zone through task decomposition
Forcing alignment with grill-me before coding begins
Structuring requirements as PRDs and vertical-slice issues
Writing tests first to enable autonomous implementation
Classifying work as HITL vs AFK to parallelize safely
Using Sand Castle to orchestrate parallel sandboxed agents
Reviewing code rigorously with fresh context and explicit standards
Building deep modules that agents can work with effectively

These are not theoretical concepts. They're workflows that professional engineering teams are using in production to ship features 3-5x faster while maintaining quality standards.

The key insight: your job as an engineer is not to write every line of code yourself. It's to design systems, define specifications, encode quality standards, and maintain architectural integrity—then delegate mechanical implementation to agents that work within those constraints.

Master those skills, and AI becomes genuine leverage. Skip them, and AI becomes a chaos multiplier.

Workshop note: This breakdown is based on Matt Pocock's "AI Coding for Real Engineers" full workshop video released April 24, 2026, plus analysis of his open-source skills repository, the Sand Castle framework, community discussions, and related educational content from AI Hero. For the latest updates, skill releases, and cohort information, visit aihero.dev and the mattpocock/skills GitHub repository.
