On April 24, 2026, TypeScript educator and AI coding pioneer Matt Pocock released a comprehensive 2-hour workshop titled "AI Coding for Real Engineers" that fundamentally challenges the notion that AI makes software engineering fundamentals obsolete.
While many treat AI coding as a paradigm shift that renders 20 years of software engineering wisdom irrelevant, Pocock makes the opposite argument: engineering fundamentals matter more than ever when working with AI agents. The workshop demonstrates that code quality, architecture decisions, TDD discipline, and design patterns become critical leverage points that determine whether AI agents amplify your capabilities or multiply technical debt.
This article provides a comprehensive breakdown of the workshop's core concepts, practical workflows, and production-ready patterns for integrating AI agents into professional software development.
TL;DR
| Topic | Key insight |
|---|---|
| Workshop focus | Complete AI-assisted development lifecycle from ambiguous requirements to autonomous agent deployment |
| Core thesis | Software engineering fundamentals are not obsolete—they're the leverage layer that makes AI agents useful |
| Smart vs dumb zone | LLMs degrade beyond ~100k tokens, well before the context window is full; keep work in the smart zone through task decomposition |
| Grill-me skill | Force agents to ask 40-80 alignment questions before coding to reach shared understanding |
| PRD workflow | Synthesize conversations into structured PRDs, then decompose into vertical-slice GitHub issues |
| Tracer bullets | Build thin vertical slices through all layers (schema/API/UI/tests), not horizontal layers |
| TDD with AI | Write failing tests first, let agents implement to green, review and refactor |
| AFK agents | Classify issues as HITL (human decision) or AFK (autonomous implementation) |
| Sand Castle | TypeScript framework for parallelizing sandboxed agents with worktrees and Docker isolation |
| Code review | Use Sonnet for speed, Opus for review quality, always in fresh context |
| Deep modules | Favor high-leverage interfaces that hide complexity over shallow modules that leak it |
The fundamental problem: everyone thinks AI changes everything
The dominant narrative in 2026 is that AI coding assistants represent a paradigm shift so profound that traditional software engineering practices—design patterns, TDD, architecture reviews, code review processes—are suddenly obsolete boomer baggage.
Matt Pocock's workshop opens by rejecting this framing entirely.
His counter-thesis: AI agents are powerful, but they're only as good as the engineering constraints, workflows, and quality standards you give them. If you feed an agent vague requirements, skip tests, ignore architecture, and treat code as disposable, the agent will amplify those bad habits at machine speed.
The workshop demonstrates the opposite approach: treating AI as a force multiplier for disciplined engineering. When you combine strong fundamentals with agent capabilities, you get compound leverage. When you abandon fundamentals, you get compound dysfunction.
This is why Pocock built Claude Code for Real Engineers, a 2-week cohort teaching AI coding from first principles, and open-sourced his skills repository containing 29+ agent skills that encode professional engineering workflows.
Smart zone vs dumb zone: the context window cliff
One of the workshop's most actionable insights is the concept of the smart zone and dumb zone in LLM context windows.
The threshold
At approximately 100,000 tokens, well before the context window is full, LLMs begin entering what Pocock calls the "dumb zone," where reasoning quality degrades sharply. The exact boundary varies by model and task complexity, but many practitioners who work with long-context models report hitting this cliff.
One commonly cited explanation involves attention itself: each token attends to every other token, so the number of pairwise interactions grows quadratically with context length and the model's attention is spread across far more material. The result is quality degradation even when the model technically has room for more tokens.
Why this matters for AI coding
Most developers instinctively believe: more context = better results. Just paste the entire codebase, all previous conversations, every GitHub issue, and let the model figure it out.
Pocock argues the opposite: bloated context actively harms output quality. Once you cross into the dumb zone, the agent starts missing critical details, hallucinating solutions that contradict earlier context, and producing generic code that ignores project-specific constraints.
The practical strategy
The workshop emphasizes a multi-phase decomposition approach:
| Phase | Goal | Context size |
|---|---|---|
| Alignment | Understand requirements deeply through grill-me questioning | Minimal: just the initial request |
| Planning | Produce PRD and break into vertical slices | Medium: PRD + architecture docs |
| Implementation | Each agent works on one isolated issue | Small: issue description + relevant code only |
| Review | Fresh context evaluation of changes | Clean: just the diff + quality standards |
By keeping each phase within the smart zone, you maintain reasoning quality throughout the development lifecycle.
This is not theoretical optimization. Pocock demonstrates real examples where the same task produces dramatically different results depending on whether context is kept lean or allowed to bloat.
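To make the phase budgets concrete, here is a minimal sketch of a context-budget check. It assumes the rough rule of thumb of about four characters per token for English text (real tokenizers vary by model) and uses the ~100k-token threshold from this section; `estimateTokens`, `checkContextBudget`, and the phase names are hypothetical, not part of any Pocock tool.

```typescript
// Rough heuristic (assumption): about 4 characters per token for English text.
// Real tokenizers vary by model; use a model-specific counter when precision matters.
const CHARS_PER_TOKEN = 4;

// Approximate start of the "dumb zone" as described in the workshop.
const SMART_ZONE_LIMIT = 100_000;

type Phase = "alignment" | "planning" | "implementation" | "review";

interface ContextPiece {
  label: string; // e.g. "PRD", "issue #12", "diff"
  text: string;
}

interface BudgetReport {
  phase: Phase;
  totalTokens: number;
  withinSmartZone: boolean;
  largestPiece: string | undefined; // the first candidate to trim if over budget
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Estimate everything a phase would load into context and flag it
// if the total crosses the smart-zone limit.
function checkContextBudget(phase: Phase, pieces: ContextPiece[]): BudgetReport {
  let totalTokens = 0;
  let largest: { label: string; tokens: number } | undefined;
  for (const piece of pieces) {
    const tokens = estimateTokens(piece.text);
    totalTokens += tokens;
    if (largest === undefined || tokens > largest.tokens) {
      largest = { label: piece.label, tokens };
    }
  }
  return {
    phase,
    totalTokens,
    withinSmartZone: totalTokens < SMART_ZONE_LIMIT,
    largestPiece: largest?.label,
  };
}

// Usage: an implementation phase that loads one issue plus one relevant file.
const report = checkContextBudget("implementation", [
  { label: "issue", text: "Add PATCH /todos/:id to mark a todo complete." },
  { label: "todos.ts", text: "x".repeat(8_000) },
]);
console.log(report.withinSmartZone, report.largestPiece); // → true todos.ts
```

Reporting the largest piece is a deliberate choice: when a phase goes over budget, the biggest input is usually the first thing to summarize or drop.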
The grill-me skill: forcing alignment before code
The second major pattern from the workshop is what Pocock calls the "grill-me" skill—an alignment phase that happens before any code is written.
The problem it solves
The default workflow with AI agents is:
- Describe what you want
- Agent starts coding immediately
- Discover 20 minutes later the agent misunderstood half your requirements
- Throw away the work and start over
This is expensive, demoralizing, and creates an adversarial relationship with the tool.
How grill-me works
The grill-me skill flips this dynamic. Instead of letting the agent immediately implement, you force it to interrogate you first:
```text
Before writing any code, ask me at least 40 questions to ensure you
fully understand the requirements, edge cases, architecture constraints,
performance requirements, testing strategy, and integration points.
Do not stop asking questions until you can articulate the complete
solution back to me and I confirm we have shared understanding.
```
What happens in practice
Pocock's examples show agents asking 40 to 80 targeted questions covering:
- Requirements clarity: "You mentioned user roles. Which roles? What can each role do?"
- Edge cases: "What happens if two users update the same record simultaneously?"
- Architecture constraints: "Should this be a new service or extend the existing API?"
- Performance requirements: "What's the expected request volume? What latency is acceptable?"
- Testing strategy: "Do you want integration tests, unit tests, or both?"
- Integration points: "Which external services does this touch? Are there rate limits?"
- Data model: "Does the schema already exist or are we creating it?"
- Error handling: "What should happen on failure? Retry? Alert? Rollback?"
This process produces what Pocock calls a shared design concept: you and the agent are genuinely on the same wavelength about what's being built.
Why this saves time
Investing 10-15 minutes in alignment questions prevents hours of wasted autonomous work. More importantly, it surfaces misunderstandings, missing requirements, and architectural conflicts before they become code problems.
Pocock documents this pattern in his grill-me skill along with common mistakes teams make when using it.
PRD creation: turning conversation into structure
Once alignment is complete, the workshop demonstrates converting that shared understanding into a Product Requirements Document (PRD) that becomes the source of truth.
The to-prd skill
Pocock's to-prd skill synthesizes conversations and grill-me sessions into structured PRDs that typically include:
| PRD section | Purpose |
|---|---|
| Problem statement | What user/business need are we solving? |
| Proposed solution | High-level approach and architecture |
| Success criteria | Measurable outcomes that indicate done |
| User stories | Specific scenarios and use cases |
| Technical constraints | Performance, security, compatibility requirements |
| Out of scope | Explicitly document what we're NOT building |
| Dependencies | Other systems, teams, or features this requires |
| Testing strategy | How will we validate this works? |
| Rollout plan | Feature flags, gradual rollout, rollback strategy |
Why PRDs matter for AI agents
A PRD serves as the destination document that agents can reference throughout implementation. When an agent gets confused or needs to make a design decision, it can return to the PRD for authoritative guidance.
This prevents scope creep, ensures consistency across multiple agents working in parallel, and provides a clear definition of "done" that isn't subject to interpretation.
Filing as GitHub issues
Pocock's workflow has the PRD automatically filed as a GitHub issue. This creates:
- Single source of truth for the entire feature
- Discussion thread where team members can comment, question, or suggest changes
- Audit trail showing how requirements evolved
- Link target that implementation PRs can reference
This is not bureaucracy—it's infrastructure that makes parallel autonomous work possible.
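One way to picture the filing step is a minimal sketch that models the PRD sections from the table above as a TypeScript type, renders them as Markdown, and builds a request for GitHub's REST "create an issue" endpoint (`POST /repos/{owner}/{repo}/issues`). The `Prd` type, the helper names, and the `prd` label are illustrative assumptions; Pocock's to-prd skill may file issues differently.

```typescript
// Hypothetical shape mirroring the PRD sections in the table above.
interface Prd {
  title: string;
  problemStatement: string;
  proposedSolution: string;
  successCriteria: string[];
  userStories: string[];
  technicalConstraints: string[];
  outOfScope: string[];
  dependencies: string[];
  testingStrategy: string;
  rolloutPlan: string;
}

// Render a list section; an explicit "None" beats a silently missing section.
const bullets = (items: string[]): string =>
  items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- None";

// Render the PRD as Markdown so it reads well as a GitHub issue body.
function renderPrd(prd: Prd): string {
  return [
    `## Problem statement\n${prd.problemStatement}`,
    `## Proposed solution\n${prd.proposedSolution}`,
    `## Success criteria\n${bullets(prd.successCriteria)}`,
    `## User stories\n${bullets(prd.userStories)}`,
    `## Technical constraints\n${bullets(prd.technicalConstraints)}`,
    `## Out of scope\n${bullets(prd.outOfScope)}`,
    `## Dependencies\n${bullets(prd.dependencies)}`,
    `## Testing strategy\n${prd.testingStrategy}`,
    `## Rollout plan\n${prd.rolloutPlan}`,
  ].join("\n\n");
}

// Build (but do not send) a request for GitHub's "create an issue" endpoint.
function buildCreateIssueRequest(owner: string, repo: string, token: string, prd: Prd) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/issues`,
    init: {
      method: "POST",
      headers: {
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ title: `PRD: ${prd.title}`, body: renderPrd(prd), labels: ["prd"] }),
    },
  };
}
```

Sending it is then `await fetch(url, init)` with a token that is allowed to create issues in the repository. Keeping the request construction pure makes it easy to test without touching GitHub.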
Vertical slices: tracer bullets through the stack
With a PRD in place, the next phase is decomposition. This is where Pocock's approach diverges sharply from typical project planning.
Horizontal layering (what most teams do)
The default instinct is to break work into horizontal layers:
- Design database schema for all entities
- Build all API endpoints
- Create all UI components
- Write tests at the end
This creates long-running branches, late integration, and makes it impossible to validate whether the feature works until everything is done.
Vertical slicing (Pocock's approach)
Instead, Pocock advocates tracer bullets—thin vertical slices that cut through every layer to deliver a complete, working feature:
Issue 1: User can create a basic todo
- Schema: Add `todos` table with id, text, completed, user_id
- API: POST /todos endpoint with validation
- UI: Simple form to create todo
- Tests: End-to-end test covering create flow
Issue 2: User can mark todo as complete
- Schema: Use existing completed field
- API: PATCH /todos/:id endpoint
- UI: Checkbox component
- Tests: End-to-end test covering complete flow
Issue 3: User can delete todo
- Schema: No changes needed
- API: DELETE /todos/:id endpoint
- UI: Delete button with confirmation
- Tests: End-to-end test covering delete flow
Each issue is independently shippable. Each can be picked up by a different agent. Each can be tested and validated immediately.
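To make "a thin slice through every layer" concrete, here is a minimal sketch of Issue 1 in TypeScript. An in-memory store stands in for the `todos` table and a plain function stands in for the POST /todos handler; the names (`Todo`, `TodoStore`, `handleCreateTodo`) and the 200-character limit are illustrative assumptions, and the UI layer is omitted.

```typescript
// Schema layer: one row of the hypothetical `todos` table.
interface Todo {
  id: number;
  text: string;
  completed: boolean;
  userId: number;
}

// Persistence layer: an in-memory stand-in for the database table.
class TodoStore {
  private rows: Todo[] = [];
  private nextId = 1;

  insert(text: string, userId: number): Todo {
    const todo: Todo = { id: this.nextId++, text, completed: false, userId };
    this.rows.push(todo);
    return todo;
  }

  listFor(userId: number): Todo[] {
    return this.rows.filter((row) => row.userId === userId);
  }
}

type CreateTodoResult =
  | { status: 201; todo: Todo }
  | { status: 400; error: string };

const MAX_TEXT_LENGTH = 200; // assumed validation rule, for illustration only

// API layer: the validation and insert a POST /todos handler would perform,
// without any HTTP framework plumbing.
function handleCreateTodo(store: TodoStore, userId: number, body: unknown): CreateTodoResult {
  const text =
    typeof body === "object" && body !== null && "text" in body
      ? (body as { text: unknown }).text
      : undefined;
  if (typeof text !== "string" || text.trim() === "") {
    return { status: 400, error: "text is required" };
  }
  if (text.length > MAX_TEXT_LENGTH) {
    return { status: 400, error: `text must be at most ${MAX_TEXT_LENGTH} characters` };
  }
  return { status: 201, todo: store.insert(text.trim(), userId) };
}
```

An end-to-end test for this slice would drive the handler through the real HTTP layer and UI; the point of the sketch is that schema, persistence, validation, and tests all land in one small, reviewable change.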
The to-issues skill
Pocock's to-issues skill takes a PRD and automatically decomposes it into vertical-slice GitHub issues with:
- Clear scope: Exactly what this slice includes
- Acceptance criteria: What "done" looks like
- Testing requirements: How to validate
- Dependencies: Which other issues must complete first
- HITL vs AFK classification: Does this need human judgment or can an agent handle it autonomously?
This decomposition is not busywork—it's the infrastructure that enables parallelization.
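Dependencies between slices also determine the order in which work can start. A minimal sketch, assuming each issue lists the IDs of the issues it depends on: Kahn's topological sort emits every issue after its dependencies and reports a cycle if one exists. The `SliceIssue` type is hypothetical, loosely modeled on the to-issues fields listed above.

```typescript
type Mode = "HITL" | "AFK";

interface SliceIssue {
  id: number;
  title: string;
  mode: Mode;
  dependsOn: number[]; // IDs of issues that must complete first
}

// Kahn's algorithm: repeatedly emit issues whose dependencies have all been emitted.
function sequenceIssues(issues: SliceIssue[]): SliceIssue[] {
  const byId = new Map<number, SliceIssue>();
  for (const issue of issues) byId.set(issue.id, issue);

  const remainingDeps = new Map<number, number>(); // unmet dependency count per issue
  const dependents = new Map<number, number[]>();  // reverse edges: dep -> issues waiting on it
  for (const issue of issues) {
    remainingDeps.set(issue.id, issue.dependsOn.length);
    for (const dep of issue.dependsOn) {
      if (!byId.has(dep)) throw new Error(`Issue ${issue.id} depends on unknown issue ${dep}`);
      const waiting = dependents.get(dep) ?? [];
      waiting.push(issue.id);
      dependents.set(dep, waiting);
    }
  }

  const ready = issues.filter((issue) => issue.dependsOn.length === 0).map((issue) => issue.id);
  const ordered: SliceIssue[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    ordered.push(byId.get(id)!);
    for (const next of dependents.get(id) ?? []) {
      const left = remainingDeps.get(next)! - 1;
      remainingDeps.set(next, left);
      if (left === 0) ready.push(next);
    }
  }

  // Anything left over is part of a cycle and can never become ready.
  if (ordered.length !== issues.length) {
    throw new Error("Dependency cycle detected; re-scope the affected issues");
  }
  return ordered;
}
```

Issues with no unmet dependencies at the same point in the order (for example, completing and deleting a todo once creation exists) are exactly the ones that can be handed to parallel agents.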
TDD with AI agents: write tests first, let agents go green
One of the workshop's most controversial claims is that Test-Driven Development (TDD) becomes more important with AI, not less.
Why developers skip TDD with AI
The common argument: "AI can write both code and tests together, why slow down with TDD?"
Pocock's response: because AI agents are optimizers. Give them a vague problem and they'll produce code that superficially works but doesn't actually solve your requirements. Give them a failing test and they'll produce code that demonstrably satisfies the behavior you specified.
The TDD workflow with agents
1. Red: Human writes a failing test describing desired behavior
2. Green: Agent writes minimum code to make test pass
3. Refactor: Agent or human cleans up implementation
4. Repeat: Move to next test
The key insight: tests are specifications that agents can verify against. When you write the test first, you're forcing yourself to think clearly about:
- What inputs should produce what outputs?
- What edge cases must be handled?
- What error conditions should be caught?
- What side effects are acceptable?
Example from the workshop
Instead of:
"Build a function that validates email addresses"
Write a failing test:
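```typescript
// Assumes Vitest; Jest's globals behave the same way.
import { describe, it, expect } from 'vitest';
// Hypothetical module path: it does not exist yet, which is the "red" step.
import { validateEmail } from './validateEmail';

describe('validateEmail', () => {
  it('accepts valid emails', () => {
    expect(validateEmail('user@example.com')).toBe(true);
  });

  it('rejects emails without @', () => {
    expect(validateEmail('userexample.com')).toBe(false);
  });

  it('rejects emails without domain', () => {
    expect(validateEmail('user@')).toBe(false);
  });

  it('accepts emails with subdomains', () => {
    expect(validateEmail('user@mail.example.com')).toBe(true);
  });
});
```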
Then tell the agent: "Make these tests pass."
The agent now has an unambiguous success criterion. No guessing about requirements. No debating whether the implementation is "correct." The tests define correct.
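For illustration, here is one minimal implementation an agent might produce to turn those four tests green. The regular expression is deliberately simple and is an assumption for this sketch, not a complete RFC 5322 validator; further tests would drive further refinements.

```typescript
// Deliberately minimal rule set, driven only by the four tests:
// a non-empty local part, exactly one "@", no whitespace,
// and a domain with at least one dot. Not a full RFC 5322 validator.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// In a real project this would be exported from ./validateEmail.
function validateEmail(email: string): boolean {
  return EMAIL_PATTERN.test(email);
}
```

It also rejects `two@@example.com`, a case the tests never mention. If that behavior matters, it deserves its own test, because an agent refactoring later is only bound by what the suite checks.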
Why this works with autonomous agents
When you send an agent off to work AFK (away from keyboard), you can't supervise every decision. But you can give it a test suite that defines acceptable behavior.
Pocock shows examples where:
- Agent tries 3 different implementation approaches before finding one that passes all tests
- Agent discovers edge cases you didn't specify and adds defensive checks
- Agent refactors implementation while keeping tests green
This is specification-driven autonomous work. The tests are your quality enforcement layer when you're not watching.
Deep modules vs shallow modules: hiding complexity
The workshop spends significant time on software architecture principles, particularly the concept of deep modules from John Ousterhout's A Philosophy of Software Design.
The core distinction
| Deep modules | Shallow modules |
|---|---|
| Small, simple interface | Complex, intricate interface |
| Hide significant complexity | Expose implementation details |
| High leverage: lots of functionality behind minimal API | Low leverage: interface nearly as complex as implementation |
| Example: `fs.readFile(path)` hides file descriptors, chunked reads, and buffering | Example: `XMLHttpRequest` needs construction, `open()`, event-handler wiring, and `send()` before a single request goes out |
Why this matters for AI agents
Pocock argues that AI agents write better code when working with deep modules, because:
- Less context required: Agent only needs to understand the interface, not the internals
- Fewer integration points: Less surface area for bugs and misunderstandings
- Better encapsulation: Changes don't cascade across the codebase
- Clearer contracts: Deep modules have obvious input/output semantics
Conversely, shallow modules force agents to hold more implementation details in context, increasing the chance of crossing into the dumb zone.
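As an illustration of a deep module (a hypothetical example, not one from the workshop), the cache below exposes only `get` and `set`, while recency tracking and eviction stay hidden inside. An agent calling it needs just the two-method interface in context. The sketch relies on JavaScript's `Map` iterating in insertion order to track which key was used least recently.

```typescript
// A deep module: two methods on the outside, eviction policy on the inside.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Map iterates in insertion order, so re-inserting marks the key as most recent.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}

// Usage: callers never see recency tracking or eviction.
const cache = new LruCache<string, number>(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");    // "a" becomes most recently used
cache.set("c", 3); // evicts "b"
console.log(cache.get("b")); // → undefined
```

A shallow alternative would expose `touch`, `evictOldest`, and `size` and leave the caller to sequence them correctly, which is exactly the kind of detail that bloats an agent's context.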
Practical application
When reviewing agent-generated code, Pocock recommends asking:
- Does this module expose unnecessary complexity?
- Could the interface be simpler while hiding the same functionality?
- Are we leaking implementation details that will create coupling?
- Is the cognitive load at the call site higher than it needs to be?
The workshop includes his improve-codebase-architecture skill that agents can use to identify shallow modules and propose deeper alternatives.
AFK agents: away from keyboard implementation
One of the workshop's most production-ready patterns is the AFK (away from keyboard) classification for autonomous agent work.
HITL vs AFK
When decomposing PRDs into issues, each issue gets classified:
| Classification | Meaning | Example |
|---|---|---|
| HITL | Human in the loop: requires judgment, architecture decisions, or business tradeoffs | "Should we cache this data or recompute on every request?" |
| AFK | Away from keyboard: well-defined implementation with clear acceptance criteria | "Add validation to email field per PRD specification" |
What makes good AFK work
Issues are AFK-ready when they have:
- Clear specification: Failing test or detailed acceptance criteria
- No architectural decisions: The design is already determined
- Isolated scope: Changes are contained to one module or feature
- Obvious validation: Tests or demo steps that prove it works
- Low risk: Failure doesn't compromise security, data integrity, or production stability
The AFK workflow
- Planner agent examines backlog and identifies non-blocked AFK issues
- Implementation agents pick issues in parallel, each in isolated environment
- Agents commit when tests pass and acceptance criteria are met
- Reviewer agents check output against quality standards
- Merger agent integrates approved changes
Pocock demonstrates a real example where 3 agents work in parallel on separate vertical slices, implementing and testing autonomously while he's in meetings.
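The planner's first step can be approximated as a pure function: given the backlog, return the AFK issues whose dependencies are already done, in backlog (priority) order, capped at the number of agents you want running. In the workshop this decision is made by a planner agent; the `BacklogIssue` shape and `maxParallel` parameter here are assumptions for illustration.

```typescript
type Mode = "HITL" | "AFK";

interface BacklogIssue {
  id: number;
  title: string;
  mode: Mode;
  dependsOn: number[];
  done: boolean;
}

// Return AFK issues that are not done and whose dependencies are all done,
// preserving backlog order and capping the result at maxParallel.
function pickReadyAfkIssues(backlog: BacklogIssue[], maxParallel: number): BacklogIssue[] {
  const doneIds = new Set(backlog.filter((issue) => issue.done).map((issue) => issue.id));
  return backlog
    .filter((issue) => !issue.done && issue.mode === "AFK") // HITL work stays with humans
    .filter((issue) => issue.dependsOn.every((dep) => doneIds.has(dep)))
    .slice(0, Math.max(0, maxParallel));
}
```

Each returned issue can then go to its own implementation agent; HITL issues never appear in the result, no matter how ready they are.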
Why this is not "fully autonomous coding"
Critically, AFK does not mean "let the agent do whatever it wants." It means:
- Human-defined scope: You decided this issue is safe for autonomous work
- Human-written tests: The agent must satisfy specifications you created
- Human review gates: Output is reviewed before merge
- Human architecture: The design decisions are already made
The autonomous part is the mechanical implementation of well-specified work. The judgment, design, and quality standards remain human responsibilities.
Sand Castle: parallelizing autonomous agents safely
To operationalize AFK workflows at production scale, Pocock built Sand Castle, a TypeScript framework for orchestrating sandboxed coding agents.
The core problem
Running multiple agents in parallel on the same codebase creates:
- Git conflicts from agents editing the same files
- Test interference when agents run tests simultaneously
- Resource contention for databases, ports, and file locks
- Environment pollution from package installations and config changes
- Security risk if agents have unbounded system access
How Sand Castle solves this
```typescript
// Illustrative option names; check the mattpocock/sandcastle README for the current API.
import { run } from 'sandcastle';

await run({
  issue: 'Add user profile endpoint',
  branch: 'feature/user-profile',
  worktree: true, // Isolate in git worktree
  docker: true, // Sandboxed execution
  timeout: 30 * 60 * 1000, // 30 minute max
});
```
For each agent, Sand Castle:
- Creates a git worktree: Isolated working directory on separate branch
- Spins up Docker container: Sandboxed environment with resource limits
- Passes issue context: Agent only sees relevant code and requirements
- Monitors execution: Logs, timeout enforcement, error capture
- Collects output: Committed changes, test results, agent logs
- Cleans up: Removes worktree and container when done
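The isolation steps above can be approximated with plain `git worktree` and `docker run` commands. This sketch is not Sand Castle's implementation; it assumes Node.js, a Docker image that already contains your agent CLI, and hypothetical names (`planSandbox`, `runSandbox`, the `agent/` branch prefix, the resource limits). It builds each command as an argument array so the plan can be inspected or tested before anything runs.

```typescript
import { execFileSync } from "node:child_process";

interface SandboxPlan {
  branch: string;
  worktreePath: string;
  create: string[];
  agent: string[];
  cleanup: string[];
}

// Build the commands for one isolated agent run (hypothetical helper):
// 1. a git worktree on a fresh branch, 2. a resource-limited container,
// 3. removal of the worktree afterwards. Commits survive on the branch.
function planSandbox(
  repoRoot: string,
  issueSlug: string,
  image: string,
  agentCommand: string[],
): SandboxPlan {
  const branch = `agent/${issueSlug}`;
  const worktreePath = `${repoRoot}-worktrees/${issueSlug}`;
  const gitDir = `${repoRoot}/.git`;
  return {
    branch,
    worktreePath,
    create: ["git", "-C", repoRoot, "worktree", "add", "-b", branch, worktreePath],
    agent: [
      "docker", "run", "--rm",
      "--memory", "2g", "--cpus", "1", // assumed limits; tune per workload
      // A worktree's .git file points at the main repo's .git directory by
      // absolute path, so mount both at their host paths for commits to work.
      "-v", `${gitDir}:${gitDir}`,
      "-v", `${worktreePath}:${worktreePath}`,
      "-w", worktreePath,
      image,
      ...agentCommand,
    ],
    cleanup: ["git", "-C", repoRoot, "worktree", "remove", "--force", worktreePath],
  };
}

// Run a plan with a hard timeout on the agent step; cleanup runs even on failure.
function runSandbox(plan: SandboxPlan, timeoutMs: number): void {
  const run = ([command, ...args]: string[], timeout?: number): void => {
    execFileSync(command, args, { stdio: "inherit", timeout });
  };
  run(plan.create);
  try {
    run(plan.agent, timeoutMs);
  } finally {
    run(plan.cleanup);
  }
}
```

Calling `runSandbox(planSandbox(...), 30 * 60 * 1000)` mirrors the 30-minute timeout in the snippet above. To run several agents at once you would use the asynchronous `execFile` instead of blocking on each plan in turn.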
The four agent types
Sand Castle coordinates:
| Agent type | Responsibility |
|---|---|
| Planner | Examines full backlog, identifies non-blocked AFK issues, determines parallelization opportunities |
| Implementation | Works on single issue in isolated environment, writes code, runs tests, commits when green |
| Reviewer | Reviews implementation agent output against quality standards, fresh context |
| Merger | Integrates approved changes, resolves conflicts if needed, validates integration tests |
Production considerations
The workshop emphasizes Sand Castle is not a "set it and forget it" solution. Production use requires:
- Clear issue specifications: Garbage in, garbage out
- Comprehensive test suites: Agents validate against your tests
- Review gates: Human or Opus agent reviews before merge
- Monitoring: Track success rates, common failures, quality trends
- Cost awareness: Parallel agents consume API quota quickly
- Rollback procedures: Be prepared to revert failed deployments
Pocock shows real examples where Sand Castle enables 3-5x faster feature delivery for well-specified work, while acknowledging it's inappropriate for exploratory or high-judgment tasks.
Code review strategies: Sonnet for speed, Opus for quality
The workshop dedicates substantial time to code review as a quality gate for agent-generated code.
The anti-pattern
The default approach many teams adopt:
- Agent writes code
- Glance at the diff
- Merge if it looks fine
- Discover problems in production
This treats code review as a formality rather than a quality enforcement mechanism.
Pocock's recommended strategy
| Review aspect | Approach | Tool |
|---|---|---|
| Implementation | Use Sonnet for speed and cost | Claude Sonnet 4.5 |
| Review | Use Opus for quality and judgment | Claude Opus 4.5 |
| Context | Reviewer always works in fresh context, not implementation thread | New conversation |
| Standards | Explicit quality checklist provided to reviewer | Documented criteria |
| Focus | List issues first, not praise | Critical evaluation |
Why use different models for implementation vs review
- Sonnet: Fast, cost-effective, good at following specifications
- Opus: Superior reasoning, catches subtle bugs, better architectural judgment
By using Sonnet for the mechanical implementation work and Opus for critical review, you optimize for both velocity and quality.
The fresh context principle
This is critical: the reviewer should not have access to the implementation conversation history.
Why? Because seeing the implementation context creates anchoring bias. The reviewer starts evaluating whether the agent solved the problem it thought it was solving, rather than whether the solution is objectively correct.
Fresh context forces the reviewer to evaluate:
- Does this code do what the issue specification claims?
- Are there edge cases the implementation missed?
- Are there security, performance, or maintainability problems?
- Does this align with project architecture and standards?
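In code, "fresh context" simply means the reviewer request is built from scratch: a system prompt carrying the checklist and one user message carrying the issue and the diff, with no implementation transcript. A minimal sketch targeting Anthropic's Messages API (`POST https://api.anthropic.com/v1/messages`); the model alias `claude-opus-4-5`, the prompt wording, and `buildReviewRequest` itself are assumptions to check against current documentation.

```typescript
interface ReviewInput {
  issueSpec: string;   // the issue the change claims to satisfy
  diff: string;        // unified diff of the change under review
  checklist: string[]; // project quality standards
}

const REVIEW_MODEL = "claude-opus-4-5"; // assumed model alias; verify against current docs

// Build a reviewer request from scratch: the only inputs are the checklist,
// the issue spec, and the diff. No implementation transcript is included.
function buildReviewRequest(input: ReviewInput, apiKey: string) {
  const system = [
    "You are reviewing a change you did not write.",
    "List problems first, most severe first, each with its location and a suggested fix.",
    "Finish with a merge-readiness verdict (READY or BLOCKED) and list any blockers.",
    "Quality checklist:",
    ...input.checklist.map((item) => `- ${item}`),
  ].join("\n");

  const body = {
    model: REVIEW_MODEL,
    max_tokens: 4096,
    system,
    messages: [
      {
        role: "user",
        content: `<issue>\n${input.issueSpec}\n</issue>\n\n<diff>\n${input.diff}\n</diff>`,
      },
    ],
  };

  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

Sending it with `await fetch(url, init)` returns a response whose `content` blocks hold the review text. Because the function takes no conversation history as a parameter, the anchoring problem described above cannot creep in by accident.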
What good review output looks like
Pocock shows examples where reviewer agents:
- List issues first, not praise
- Provide specific examples of problems found
- Suggest fixes with code snippets
- Assess merge-readiness: Can this ship? What blocks it?
- Highlight architectural concerns that might affect future work
Poor review: "This looks good, I don't see any issues."
Good review: "This implementation has three problems: 1) Missing null check on line 47 could throw in production. 2) No rate limiting on the endpoint. 3) Error messages leak internal implementation details. Recommend fixing before merge."
The ralph loop: bash-driven autonomous agents
A brief but powerful pattern from the workshop is the "ralph loop," a technique coined by Geoffrey Huntley that Pocock demonstrates as a form of bash-driven agent automation.
The basic pattern
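```bash
# Assumes the Claude Code CLI (`claude`); `-p` runs one non-interactive prompt and exits.
# Unattended runs also need tool permissions configured (see the CLI's permission flags).
while true; do
  if claude -p "$(cat prompt.md)"; then
    git add -A
    git commit -m "Agent work completed" || true  # the agent may already have committed
  else
    echo "Agent failed, stopping loop"
    break
  fi
  sleep 5
done
```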
The prompt.md file tells the agent:
- Read your state from disk (issue descriptions, tests, last commit message)
- Pick the next AFK issue from backlog
- Implement it following project standards
- Run tests
- Commit if green
- Exit with success code
Why this works
The loop enables continuous autonomous work without human intervention. The agent:
- Reads its own progress from git history
- Picks work from a prioritized backlog
- Self-validates through tests
- Commits only when successful
- Stops when it encounters ambiguity or failure
This is the operational pattern behind Pocock's demonstrations of "I started 3 agents before lunch and they shipped 8 features by end of day."
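The loop's stopping rules can also be expressed in TypeScript, which makes them easy to test. This is a hypothetical variant, not Pocock's script: `runAgent` stands in for one agent invocation, the `no-work-left` outcome assumes the prompt tells the agent to report an empty backlog, and `maxIterations` is an added safety cap that the bash version lacks.

```typescript
// Outcome of one agent run, as reported by a hypothetical wrapper around the CLI.
type AgentOutcome = "committed" | "no-work-left" | "failed";

interface LoopResult {
  iterations: number;
  stoppedBecause: Exclude<AgentOutcome, "committed"> | "max-iterations";
}

// Keep running while the agent keeps committing; stop on failure,
// an empty backlog, or the iteration cap (a guard against runaway API spend).
function ralphLoop(runAgent: (iteration: number) => AgentOutcome, maxIterations: number): LoopResult {
  for (let i = 1; i <= maxIterations; i++) {
    const outcome = runAgent(i);
    if (outcome !== "committed") {
      return { iterations: i, stoppedBecause: outcome };
    }
  }
  return { iterations: maxIterations, stoppedBecause: "max-iterations" };
}
```

Injecting `runAgent` as a function keeps the control flow testable with a fake agent, so you can confirm the loop halts before pointing it at a real backlog.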
When to use the ralph loop
Good use cases:
- Backlog of well-specified, test-defined issues
- Stable test suite that reliably catches regressions
- Low-risk changes (feature flags protect production)
- Cost-insensitive context (burning API quota is acceptable)
Bad use cases:
- Exploratory work requiring judgment
- Codebase without comprehensive tests
- Changes touching security, payments, or data integrity
- Situations where mistakes are expensive to fix
The ralph loop is a production pattern for mechanical work, not a replacement for thoughtful engineering.
Practical workflow: end-to-end example
The workshop culminates in a complete walkthrough demonstrating all concepts together:
Phase 1: Alignment (10-15 minutes, human + AI)
- Developer describes high-level feature request
- Run the /grill-me skill
- Agent asks 40-60 clarifying questions
- Developer answers, agent captures edge cases, constraints, dependencies
- Reach shared understanding
Phase 2: Planning (15-20 minutes, human + AI)
- Run /to-prd to generate a structured PRD
- File the PRD as a GitHub issue
- Review and refine PRD with team
- Run /prd-to-issues to decompose it into vertical slices
- Classify each issue as HITL or AFK
- Sequence issues based on dependencies
Phase 3: Implementation (autonomous, parallel agents)
For each AFK issue:
- Sand Castle creates worktree and Docker environment
- Implementation agent loads issue + relevant context (stays in smart zone)
- Agent writes failing test (if not already present)
- Agent implements code to pass tests
- Agent commits on success
- Reviewer agent evaluates in fresh context
- Merger agent integrates if approved
Developer focuses on HITL issues requiring architectural decisions.
Phase 4: Review (human + AI)
- Developer reviews merged PRs
- Manual testing on staging environment
- Final quality check before production deployment
- Feature flag rollout if appropriate
Phase 5: Retrospective (human)
- What went well? What didn't?
- Were issue specifications clear enough?
- Did agents produce production-quality code?
- What patterns should we encode into skills?
- What quality problems should we add to review checklist?
This retrospective feedback becomes the input for improving skills, specifications, and workflows for the next feature.
Tools and resources from the workshop
Core repositories
- mattpocock/skills: 29+ agent skills for professional engineering workflows
- mattpocock/sandcastle: TypeScript framework for parallelizing sandboxed agents
- mattpocock/dictionary-of-ai-coding: AI coding jargon explained in plain English
Key skills from Matt Pocock
| Skill | Purpose |
|---|---|
| /grill-me | Force alignment through deep questioning |
| /to-prd | Convert conversations into structured PRD |
| /prd-to-issues | Decompose PRD into vertical-slice issues |
| /tdd | Guide test-driven development workflow |
| /improve-codebase-architecture | Identify shallow modules and suggest deep alternatives |
Educational resources
- AI Hero: Pocock's platform with workshops, courses, and skill documentation
- Claude Code for Real Engineers: 2-week cohort teaching AI coding from first principles
- Things People Get Wrong with /grill-me: Common mistakes and how to avoid them
Foundational reading
- A Philosophy of Software Design by John Ousterhout: Deep vs shallow modules, complexity management
- The Pragmatic Programmer: Tracer bullets, vertical slicing, quality-driven development
Critical skills for AI-augmented engineering
The workshop ultimately argues these traditional engineering skills become more important, not less, when working with AI:
1. Requirements clarity
AI agents are terrible at operating with ambiguous requirements. The better you can articulate constraints, edge cases, and success criteria, the more useful the agent becomes.
Skill to develop: Practice writing precise acceptance criteria and failing tests that capture requirements unambiguously.
2. Architectural judgment
Agents implement solutions within the architecture you provide. They won't spontaneously refactor to a better design pattern or question a flawed system boundary.
Skill to develop: Study design patterns, system design, and module boundaries. Make architectural decisions deliberately before delegating implementation.
3. Test design
Tests define "correct" for autonomous agents. Poor tests mean agents optimize for passing bad tests rather than solving real problems.
Skill to develop: Master TDD, property-based testing, integration testing, and test design. Learn to write tests that actually validate desired behavior.
4. Code review
Agent-generated code requires scrutiny for security, performance, edge cases, and alignment with project standards. Rubber-stamping merges creates technical debt at machine speed.
Skill to develop: Systematic code review, security awareness, performance analysis, and architectural consistency checking.
5. Task decomposition
Keeping work in the smart zone requires breaking complex problems into smaller, well-scoped phases. This is the core skill that enables effective agent work.
Skill to develop: Practice vertical slicing, tracer bullet development, and incremental delivery. Learn to see the smallest complete slice that delivers value.
Common mistakes teams make
The workshop identifies several anti-patterns:
1. Treating AI as a senior developer
Mistake: "The AI will figure out the architecture."
Reality: Agents implement within constraints you provide. Architecture is your job.
2. Skipping the alignment phase
Mistake: Immediately start coding from vague requirements.
Reality: 15 minutes of grill-me saves hours of rework.
3. Working in the dumb zone
Mistake: Pasting entire codebase, all conversation history, every GitHub issue into context.
Reality: Bloated context degrades reasoning quality. Stay lean.
4. Skipping tests
Mistake: "The agent can write tests later."
Reality: Tests are the specification that makes autonomous work possible. Write them first.
5. Blind trust in agent output
Mistake: Merging agent PRs without review.
Reality: Agents produce plausible-looking code that may be wrong, insecure, or inefficient.
6. Not classifying work appropriately
Mistake: Treating all work as AFK-able.
Reality: Judgment-heavy decisions require human involvement. Only mechanical implementation should be delegated to autonomous agents.
How to apply this in your workflow
Week 1: Learn the patterns
- Watch the full workshop video
- Read through Matt Pocock's skills repository
- Try the grill-me skill on a real feature request
- Practice writing PRDs with to-prd
- Decompose one PRD into vertical slices
Week 2: Adopt TDD
- Start writing failing tests before asking agents for implementation
- Give agents one failing test at a time
- Review each passing implementation for code quality, not just green tests
- Build comfort with the Red-Green-Refactor cycle
Week 3: Experiment with AFK classification
- Look at your backlog and classify issues as HITL or AFK
- Try letting an agent work autonomously on one AFK issue
- Review the output critically
- Iterate on how you write specifications to improve autonomous results
Week 4: Introduce review rigor
- Stop rubber-stamping agent PRs
- Use Opus for review of Sonnet implementations
- Maintain explicit quality checklists
- Track common problems and add them to review criteria
Month 2-3: Scale with Sand Castle
- Identify opportunities for parallel agent work
- Set up Sand Castle for worktree isolation
- Run 2-3 agents in parallel on well-specified issues
- Monitor results and iterate on specifications
Who this workshop is for
The workshop explicitly targets professional software engineers who:
- Already understand software fundamentals (architecture, testing, design patterns)
- Are skeptical of "AI will replace all developers" hype
- Want practical patterns for production use
- Value code quality, maintainability, and engineering discipline
- Are willing to invest in learning how to work effectively with AI rather than fighting it
This is not a workshop for:
- Non-technical people hoping to become developers through AI
- Developers looking for shortcuts to avoid learning fundamentals
- Teams that don't already have testing, code review, and quality standards
- Anyone expecting "fully autonomous" development with zero human involvement
The target audience is experienced engineers who want to multiply their leverage without lowering production quality standards.
The meta-lesson: AI amplifies your system
The deepest insight from the workshop is this: AI agents amplify whatever system you give them.
If your system has:
- Clear requirements → Agents produce aligned implementations
- Comprehensive tests → Agents validate their work reliably
- Deep modules → Agents integrate cleanly
- Quality standards → Agents meet your bar
- Review rigor → Problems get caught before production
But if your system has:
- Vague requirements → Agents guess wrong
- Missing tests → Agents ship bugs
- Shallow modules → Agents create brittle coupling
- No standards → Agents produce inconsistent code
- No review → Technical debt compounds at machine speed
AI does not fix broken engineering processes. It accelerates whatever process you have—for better or worse.
Pocock's workshop is fundamentally about building systems worthy of amplification.
What's next: Claude Code for Real Engineers
For engineers who want to go deeper, Matt Pocock offers Claude Code for Real Engineers, a 2-week cohort covering:
- AI coding from first principles
- Complete workflow implementation
- Production patterns and anti-patterns
- Real codebase examples
- Live skill development
- Community of serious practitioners
The cohort is designed for working engineers who want production-ready skills, not toy demos.
Related reading
The concepts in this workshop connect to broader patterns in the AI agent ecosystem:
- Hiten Shah's AI Skill Library Strategy: Turning top performers' workflows into reusable agent skills
- Matt Pocock's Agent Skills for Real Engineers: Deep dive into the skills repository
- Agent Skills Security: Threat modeling for agent extensions
- What are Agent Skills?: Complete guide to the agent skills ecosystem
Bottom line
Matt Pocock's "AI Coding for Real Engineers" workshop makes a definitive argument: software engineering fundamentals are not obsolete in the age of AI—they're the leverage layer that determines whether AI agents amplify your capabilities or multiply your problems.
The workshop provides practical, production-ready patterns for:
- Keeping work in the smart zone through task decomposition
- Forcing alignment with grill-me before coding begins
- Structuring requirements as PRDs and vertical-slice issues
- Writing tests first to enable autonomous implementation
- Classifying work as HITL vs AFK to parallelize safely
- Using Sand Castle to orchestrate parallel sandboxed agents
- Reviewing code rigorously with fresh context and explicit standards
- Building deep modules that agents can work with effectively
These are not theoretical concepts. They're workflows that professional engineering teams are using in production to ship features 3-5x faster while maintaining quality standards.
The key insight: your job as an engineer is not to write every line of code yourself. It's to design systems, define specifications, encode quality standards, and maintain architectural integrity—then delegate mechanical implementation to agents that work within those constraints.
Master those skills, and AI becomes genuine leverage. Skip them, and AI becomes a chaos multiplier.
Workshop note: This comprehensive breakdown is based on Matt Pocock's "AI Coding for Real Engineers" full workshop video released April 24, 2026, plus analysis of his open-source skills repository, Sand Castle framework, community discussions, and related educational content from AI Hero. For latest updates, skill releases, and cohort information, visit aihero.dev and the mattpocock/skills GitHub repository.
Sources
- Full Walkthrough: Workflow for AI Coding - Matt Pocock (YouTube)
- Matt Pocock's Skills Repository (GitHub)
- Sand Castle Framework (GitHub)
- AI Hero - Matt Pocock's Educational Platform
- Matt Pocock on LLM Planning: "Don't Bite Off More Than You Can Chew"
- Matt Pocock: Why AI Coding's 'Smart Zone' Is Only 100K Tokens
- Things People Get Wrong with /grill-me and /grill-with-docs
- Matt Pocock Skills Deep Dive: 17 AI Agent Skills Fully Dissected
- Matt Pocock's 5 Claude Code Skills Made Me Rewrite How I Work With AI Agents
- Full Walkthrough: Workflow for AI Coding - Matt Pocock (Sean Weldon)
- Reject Vibe Coding: Matt Pocock's Skills Repo Adds Engineering Constraints
- Matt Pocock on X: Sand Castle Framework Announcement