babysit-pr
openai/codex · updated Apr 8, 2026
PR Babysitter
Objective
Babysit a PR persistently until one of these terminal outcomes occurs:
- The PR is merged or closed.
- A situation requires user help (for example CI infrastructure issues, repeated flaky failures after retry budget is exhausted, permission problems, or ambiguity that cannot be resolved safely).
- Optional handoff milestone: the PR is currently green + mergeable + review-clean. Treat this as a progress state, not a watcher stop, so late-arriving review comments are still surfaced promptly while the PR remains open.
Do not stop merely because a single snapshot returns idle while checks are still pending.
Inputs
Accept any of the following:
- No PR argument: infer the PR from the current branch (`--pr auto`).
- PR number
- PR URL
Core Workflow
1. When the user asks to "monitor"/"watch"/"babysit" a PR, start with the watcher's continuous mode (`--watch`) unless you are intentionally doing a one-shot diagnostic snapshot.
2. Run the watcher script to snapshot PR/review/CI state (or consume each streamed snapshot from `--watch`).
3. Inspect the `actions` list in the JSON response.
4. If `diagnose_ci_failure` is present, inspect failed run logs and classify the failure.
5. If the failure is likely caused by the current branch, patch code locally, commit, and push.
6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
7. If a review item is actionable and correct, patch code locally, commit, push, and then mark the associated review thread/comment as resolved once the fix is on GitHub.
8. If a review item from another author is non-actionable, already addressed, or not valid, post one reply on the comment/thread explaining that decision (for example answering the question or explaining why no change is needed). If the watcher later surfaces your own reply, treat that self-authored item as already handled and do not reply again.
9. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
10. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
11. On every loop, look for newly surfaced review feedback before acting on CI failures or mergeability state, then verify mergeability / merge-conflict status (for example via `gh pr view`) alongside CI.
12. After any push or rerun action, immediately return to step 1 and continue polling on the updated SHA/state.
13. If you had been using `--watch` before pausing to patch/commit/push, relaunch `--watch` yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
14. Repeat polling until `stop_pr_closed` appears or a user-help-required blocker is reached. A green + review-clean + mergeable PR is a progress milestone, not a reason to stop the watcher while the PR is still open.
15. Maintain terminal/session ownership: while babysitting is active, keep consuming watcher output in the same turn; do not leave a detached `--watch` process running and then end the turn as if monitoring were complete.
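The action-handling order in the workflow above can be sketched as a small Python function. This is a hypothetical helper, not part of the watcher script: it assumes each snapshot is a dict whose `actions` key holds a list of action-name strings, and it simplifies by assuming any surfaced review item leads to a fix commit, so the flaky rerun is deferred. The returned step names are illustrative.

```python
from __future__ import annotations


def plan_steps(snapshot: dict) -> list[str]:
    """Order the watcher's actions per the Core Workflow (hypothetical helper)."""
    actions = set(snapshot.get("actions", []))

    # A merged or closed PR ends the loop; nothing else matters.
    if "stop_pr_closed" in actions:
        return ["report_terminal_state"]

    steps = []
    # Review feedback comes first: a fix commit retriggers CI anyway.
    if "process_review_comment" in actions:
        steps.append("process_review_comment")
    if "diagnose_ci_failure" in actions:
        steps.append("diagnose_ci_failure")
    # Assumption: a review item means a new SHA is coming, so skip the rerun.
    if "retry_failed_checks" in actions and "process_review_comment" not in actions:
        steps.append("retry_failed_checks")
    # Only "idle" (or nothing): keep polling instead of stopping.
    if not steps:
        steps.append("keep_polling")
    return steps
```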
Commands
One-shot snapshot
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --once
Continuous watch (JSONL)
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
Trigger flaky retry cycle (only when watcher indicates)
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now
Explicit PR target
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr <number-or-url> --once
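In `--watch` mode the watcher streams JSONL, one snapshot per line. A minimal consumer might look like the sketch below. It assumes each line is a JSON object with an `actions` list and only waits for `stop_pr_closed`, whereas a real session would act on every snapshot as described in the Core Workflow; `iter_snapshots` and `watch_until_closed` are hypothetical names.

```python
from __future__ import annotations

import json
import subprocess
from typing import Iterable, Iterator

# Command from the "Continuous watch (JSONL)" entry above.
WATCH_CMD = [
    "python3",
    ".codex/skills/babysit-pr/scripts/gh_pr_watch.py",
    "--pr",
    "auto",
    "--watch",
]


def iter_snapshots(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed snapshot per non-blank JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def watch_until_closed(cmd: list[str] = WATCH_CMD) -> dict | None:
    """Stream snapshots; return the one that reports stop_pr_closed, if any."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for snapshot in iter_snapshots(proc.stdout):
            # A real session would also handle review/CI actions here.
            if "stop_pr_closed" in snapshot.get("actions", []):
                proc.terminate()
                return snapshot
    return None
```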
CI Failure Classification
Use gh commands to inspect failed runs before deciding to rerun.
gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha
gh run view <run-id> --log-failed
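The two `gh` commands above can be wrapped so the failed jobs are listed before you read the logs. A hedged sketch: it assumes `gh` is installed and authenticated, and that job entries in the `--json jobs` output carry `name` and `conclusion` fields with lowercase values such as `failure` or `timed_out`. `failed_jobs` and `inspect_run` are hypothetical helpers.

```python
from __future__ import annotations

import json
import subprocess

FIELDS = "jobs,name,workflowName,conclusion,status,url,headSha"
# Assumed set of job conclusions worth diagnosing.
FAILED_CONCLUSIONS = {"failure", "timed_out"}


def failed_jobs(run_json: str) -> list[str]:
    """Names of jobs in `gh run view --json` output that did not pass."""
    data = json.loads(run_json)
    return [
        job["name"]
        for job in data.get("jobs", [])
        if job.get("conclusion") in FAILED_CONCLUSIONS
    ]


def inspect_run(run_id: str) -> tuple[list[str], str]:
    """Return (failed job names, failed-step logs) for one workflow run."""
    def gh(*args: str) -> str:
        return subprocess.run(
            ["gh", "run", "view", run_id, *args],
            check=True, capture_output=True, text=True,
        ).stdout

    return failed_jobs(gh("--json", FIELDS)), gh("--log-failed")
```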
Prefer treating failures as branch-related when logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
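The heuristics above can be roughly approximated with keyword matching over `--log-failed` output. The marker lists below are illustrative assumptions, not the contents of `heuristics.md`; a log that matches both lists or neither falls through to the one manual diagnosis attempt.

```python
# Illustrative marker lists (assumptions); tune them to your CI's real logs.
FLAKY_MARKERS = (
    "timed out",
    "connection reset",
    "could not resolve host",
    "503 service unavailable",
    "runner has received a shutdown signal",
)
BRANCH_MARKERS = (
    "assertionerror",
    "syntaxerror",
    "typeerror",
    "compilation failed",
    "snapshot mismatch",
)


def classify_failure(log_text: str) -> str:
    """Return 'branch', 'flaky', or 'ambiguous' for a failed run's log text."""
    text = log_text.lower()
    branch = any(marker in text for marker in BRANCH_MARKERS)
    flaky = any(marker in text for marker in FLAKY_MARKERS)
    if branch and not flaky:
        return "branch"
    if flaky and not branch:
        return "flaky"
    # Both or neither matched: do one manual diagnosis pass before rerunning.
    return "ambiguous"
```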
Read .codex/skills/babysit-pr/references/heuristics.md for a concise checklist.
Review Comment Handling
The watcher surfaces review items from:
- PR issue comments
- Inline review comments
- Review submissions (COMMENT / APPROVED / CHANGES_REQUESTED)
It intentionally surfaces Codex reviewer bot feedback (for example comments/reviews from chatgpt-codex-connector[bot]) in addition to human reviewer feedback. Most unrelated bot noise should still be ignored.
For safety, the watcher only auto-surfaces trusted human review authors (for example repo OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots such as Codex.
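A sketch of the trust filter described above, assuming each review item is a dict with hypothetical `author` and `author_association` keys (GitHub reports an author association such as `OWNER`, `MEMBER`, or `COLLABORATOR`). The approved-bot set contains only the Codex connector named above; the watcher's actual allow-list may differ.

```python
from __future__ import annotations

# Associations GitHub reports for repo owners, org members, and collaborators.
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}
# Assumed allow-list; only the Codex connector is named in the text above.
APPROVED_BOTS = {"chatgpt-codex-connector[bot]"}


def is_trusted(item: dict, operator: str) -> bool:
    """True for trusted humans, the authenticated operator, or an approved bot."""
    login = item.get("author", "")
    return (
        item.get("author_association") in TRUSTED_ASSOCIATIONS
        or login == operator
        or login in APPROVED_BOTS
    )


def surfaceable(items: list[dict], operator: str) -> list[dict]:
    """Drop untrusted items; most unrelated bot noise falls out here."""
    return [item for item in items if is_trusted(item, operator)]
```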
On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
When you agree with a comment and it is actionable:
- Patch code locally.
- Commit with `codex: address PR review feedback (#<n>)`.
- Push to the PR head branch.
- After the push succeeds, mark the associated GitHub review thread/comment as resolved.
- Resume watching on the new SHA immediately (do not stop after reporting the push).
- If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.
If you disagree or the comment is non-actionable/already addressed, reply once directly on the GitHub comment/thread so the reviewer gets an explicit answer, then continue the watcher loop. If the watcher later surfaces your own reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again. If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
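The reply-once and resolved-thread rules can be expressed as a small decision function. The `id`, `author`, and `is_resolved` keys and the `replied_ids` bookkeeping set are assumptions for illustration; whether an item is actionable remains your judgment and is passed in as a flag.

```python
def review_action(item: dict, operator: str, actionable: bool, replied_ids: set) -> str:
    """Decide what to do with one surfaced review item (hypothetical keys)."""
    # Resolved threads and our own replies are already handled.
    if item.get("is_resolved") or item.get("author") == operator:
        return "ignore"
    if actionable:
        # Patch, commit, push, then resolve the thread once the fix is on GitHub.
        return "fix_push_resolve"
    if item["id"] in replied_ids:
        return "ignore"  # already replied once; never reply twice
    replied_ids.add(item["id"])  # record the reply so later snapshots skip it
    return "reply_once"
```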
Git Safety Rules
- Work only on the PR head branch.
- Avoid destructive git commands.
- Do not switch branches unless necessary to recover context.
- Before editing, check for unrelated uncommitted changes. If present, stop and ask the user.
- After each successful fix, commit and `git push`, then re-run the watcher.
- If you interrupted a live `--watch` session to make the fix, restart `--watch` immediately after the push in the same turn.
- Do not run multiple concurrent `--watch` processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
- A push is not a terminal outcome; continue the monitoring loop unless a strict stop condition is met.
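A pre-edit safety check could combine `git rev-parse --abbrev-ref HEAD` (current branch) with `git status --porcelain` (uncommitted changes). This is a hypothetical helper: a non-empty result means you should stop and ask the user, per the rules above. It parses porcelain v1 output, where each line is a two-character status code, a space, then the path.

```python
from __future__ import annotations

import subprocess


def dirty_paths(porcelain: str) -> list[str]:
    """Paths from `git status --porcelain` lines ("XY path")."""
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def preflight(expected_branch: str) -> list[str]:
    """Raise if not on the PR head branch; return any uncommitted paths."""
    branch = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    if branch != expected_branch:
        raise RuntimeError(f"on {branch!r}, expected PR head {expected_branch!r}")
    # Do not strip here: the first status column may be a leading space.
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        check=True, capture_output=True, text=True,
    ).stdout
    return dirty_paths(status)
```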
Commit message defaults:
- `codex: fix CI failure on PR #<n>`
- `codex: address PR review feedback (#<n>)`
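If you script commits, the default messages above can come from a tiny lookup; `commit_message` and its `kind` keys are hypothetical names.

```python
COMMIT_TEMPLATES = {
    "ci": "codex: fix CI failure on PR #{n}",
    "review": "codex: address PR review feedback (#{n})",
}


def commit_message(kind: str, pr_number: int) -> str:
    """Fill in the default commit message for a CI fix or a review fix."""
    return COMMIT_TEMPLATES[kind].format(n=pr_number)
```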
Monitoring Loop Pattern
Use this loop in a live Codex session:
- Run `--once`.
- Read `actions`.
- First check whether the PR is now merged or otherwise closed; if so, report that terminal state and stop polling immediately.
- Check CI summary, new review items, and mergeability/conflict status.
- Diagnose CI failures and classify branch-related vs flaky/unrelated.
- For each surfaced review item from another author, either reply once with an explanation if it is non-actionable or patch/commit/push and then resolve it if it is actionable. If a later snapshot surfaces your own reply, treat it as informational and continue without responding again.
- Process actionable review comments before flaky reruns when both are present; if a review fix requires a commit, push it and skip rerunning failed checks on the old SHA.
- Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit.
- If you pushed a commit, resolved a review thread, replied to a review comment, or triggered a rerun, report the action briefly and continue polling (do not stop).
- After a review-fix push, proactively restart continuous monitoring (`--watch`) in the same turn unless a strict stop condition has already been reached.
- If everything is passing, mergeable, not blocked on required review approval, and there are no unaddressed review items, report that the PR is currently ready to merge but keep the watcher running so new review comments are surfaced quickly while the PR remains open.
- If blocked on a user-help-required issue (infra outage, exhausted flaky retries, unclear reviewer request, permissions), report the blocker and stop.
- Otherwise sleep according to the polling cadence below and repeat.
When the user explicitly asks to monitor/watch/babysit a PR, prefer --watch so polling continues autonomously in one command. Use repeated --once snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check.
Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts.
Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task.
If a --watch process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
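The loop above can be sketched with injected callables so it is testable without GitHub. This is an assumption-laden skeleton: `take_snapshot` would wrap the `--once` command, `handle` would apply the review/CI steps, and `needs_user` is a hypothetical field marking a user-help blocker. Note that there is no success exit: a green, mergeable PR keeps polling.

```python
from __future__ import annotations

import time
from typing import Callable

BASE_INTERVAL_S = 60  # "poll every 1 minute"


def monitor(
    take_snapshot: Callable[[], dict],
    handle: Callable[[dict], None],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the PR closes or a blocker needs the user; return the last snapshot."""
    while True:
        snap = take_snapshot()
        if "stop_pr_closed" in snap.get("actions", []):
            return snap  # merged/closed: stop immediately
        if snap.get("needs_user"):  # hypothetical blocker flag
            return snap
        handle(snap)  # review items first, then CI, then mergeability
        # Green + mergeable is only a milestone, so there is no "done" branch.
        sleep(BASE_INTERVAL_S)
```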
Polling Cadence
Keep review polling aggressive and continue monitoring even after CI turns green:
- While CI is not green (pending/running/queued or failing): poll every 1 minute.
- After CI turns green: keep polling at the base cadence while the PR remains open so newly posted review comments are surfaced promptly instead of waiting on a long green-state backoff.
- Reset the cadence immediately whenever anything changes (new commit/SHA, check status changes, new review comments, mergeability changes, review decision changes).
- If CI stops being green again (new commit, rerun, or regression): stay on the base polling cadence.
- If any poll shows the PR is merged or otherwise closed: stop polling immediately and report the terminal state.
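To decide whether anything changed (and so whether to reset the cadence or emit a status update rather than a heartbeat), one option is to fingerprint the relevant fields. The snapshot keys below (`head_sha`, `checks`, `review_item_ids`, `mergeable`, `review_decision`) are hypothetical and would need mapping to the watcher's real JSON.

```python
from __future__ import annotations

import json


def fingerprint(snap: dict) -> str:
    """Stable digest of the fields whose change should reset the cadence."""
    relevant = {
        "sha": snap.get("head_sha"),
        "checks": snap.get("checks"),
        "review_ids": sorted(snap.get("review_item_ids", [])),
        "mergeable": snap.get("mergeable"),
        "review_decision": snap.get("review_decision"),
    }
    return json.dumps(relevant, sort_keys=True)


def changed(prev: dict | None, curr: dict) -> bool:
    """True on the first poll or whenever any tracked field differs."""
    return prev is None or fingerprint(prev) != fingerprint(curr)
```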
Stop Conditions (Strict)
Stop only when one of the following is true:
- PR merged or closed (stop as soon as a poll/snapshot confirms this).
- User intervention is required and Codex cannot safely proceed alone.
Keep polling when:
- `actions` contains only `idle` but checks are still pending.
- CI is still running/queued.
- Review state is quiet but CI is not terminal.
- CI is green but mergeability is unknown/pending.
- CI is green and mergeable, but the PR is still open and you are waiting for possible new review comments or merge-conflict changes.
- The PR is green but blocked on review approval (`REVIEW_REQUIRED` or similar); continue polling at the base cadence and surface any new review comments without asking for confirmation to keep watching.
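The keep-polling cases above can be checked against `gh pr view --json state,mergeable,reviewDecision,headRefOid`. A hedged sketch: it relies on GitHub's `state` (`OPEN`/`CLOSED`/`MERGED`), `mergeable` (`MERGEABLE`/`CONFLICTING`/`UNKNOWN`), and `reviewDecision` values, and takes CI greenness as a separate input since that comes from the watcher. The verdict strings are illustrative, not watcher output.

```python
from __future__ import annotations

import json
import subprocess

PR_FIELDS = "state,mergeable,reviewDecision,headRefOid"


def pr_status(pr: str) -> dict:
    """Fetch PR state via `gh pr view` (pr may be a number, URL, or branch)."""
    out = subprocess.run(
        ["gh", "pr", "view", pr, "--json", PR_FIELDS],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def poll_verdict(status: dict, checks_green: bool) -> str:
    """Map PR status to 'stop' or a keep-polling reason (illustrative strings)."""
    if status["state"] in ("MERGED", "CLOSED"):
        return "stop"
    if not checks_green:
        return "keep_polling: CI not green"
    if status.get("mergeable") == "UNKNOWN":
        return "keep_polling: mergeability pending"
    if status.get("mergeable") == "CONFLICTING":
        return "keep_polling: merge conflict (report it)"
    if status.get("reviewDecision") == "REVIEW_REQUIRED":
        return "keep_polling: awaiting review approval"
    # Green + mergeable is a milestone, not a stop condition.
    return "keep_polling: ready to merge"
```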
Output Expectations
Provide concise progress updates while monitoring:
- During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
- Treat push confirmations, intermediate CI snapshots, ready-to-merge snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
- A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition is met or the user explicitly interrupts.
- A review-fix commit + push is not a completion event; immediately resume live monitoring (`--watch`) in the same turn and continue reporting progress updates.
- When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style:
  🚀 CI is all green! 33/33 passed. Still on watch for review approval.
- Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.

When a strict stop condition is met, send a final summary that includes:
- Final PR SHA
- CI status summary
- Mergeability / conflict status
- Fixes pushed
- Flaky retry cycles used
- Remaining unresolved failures or review comments
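The one-time celebration rule can be tracked per SHA so repeated green polls stay quiet. `ProgressReporter` is a hypothetical helper; its message follows the preferred style given above.

```python
from __future__ import annotations


class ProgressReporter:
    """Emit the celebratory all-green message at most once per head SHA."""

    def __init__(self) -> None:
        self._celebrated: set[str] = set()

    def on_green(self, sha: str, passed: int, total: int) -> str | None:
        if sha in self._celebrated:
            return None  # already celebrated this SHA; stay quiet
        self._celebrated.add(sha)
        return f"🚀 CI is all green! {passed}/{total} passed. Still on watch for review approval."
```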
References
- Heuristics and decision tree: `.codex/skills/babysit-pr/references/heuristics.md`
- GitHub CLI/API details used by the watcher: `.codex/skills/babysit-pr/references/github-api-notes.md`
How to use babysit-pr on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with `node --version`)
- Active project directory or workspace where you want to add babysit-pr
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches babysit-pr from the openai/codex GitHub repository and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate babysit-pr. Access the skill through slash commands (e.g., /babysit-pr) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.
Use Cases
User Story & Requirements Generation
Create detailed user stories, acceptance criteria, and feature specs
Example
Generate user stories for 'password reset feature' with acceptance criteria, edge cases, and test scenarios
Reduce spec writing time by 50%, ensure comprehensive coverage
Competitive Analysis
Research competitors, compare features, identify gaps
Example
Analyze 5 competitor products, create feature comparison matrix, suggest differentiation opportunities
Complete competitive research in 2 hours instead of 2 days
Roadmap Prioritization
Evaluate features using frameworks (RICE, ICE, Kano) and create prioritized backlogs
Example
Score 20 feature ideas using RICE framework, generate prioritized roadmap with rationale
Make data-driven prioritization decisions faster
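RICE scoring, mentioned above, is conventionally computed as (Reach × Impact × Confidence) / Effort. A minimal Python sketch; the feature names and numbers are made up for illustration.

```python
def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE = (Reach * Impact * Confidence) / Effort; effort must be positive."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


# Hypothetical feature ideas: (reach, impact, confidence, effort).
features = {
    "SSO login": (500, 3, 0.8, 5),
    "Dark mode": (2000, 1, 1.0, 2),
}
# Highest RICE score first.
ranked = sorted(features, key=lambda name: rice_score(*features[name]), reverse=True)
```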
Stakeholder Communication
Draft PRDs, status updates, and stakeholder presentations
Example
Create executive summary of Q3 roadmap, monthly progress report, feature launch announcement
Save 3-5 hours/week on communication overhead
Implementation Guide
Prerequisites
- Claude Desktop or compatible AI client
- Access to product documentation and roadmap tools (Jira, Notion, etc.)
- Understanding of product management frameworks (RICE, Jobs-to-be-Done, etc.)
- Stakeholder contact information and communication channels
Time Estimate
30-60 minutes to see productivity improvements
Installation Steps
1. Install the product management skill
2. Start with user story generation for a known feature
3. Progress to competitive analysis: research 2-3 competitors
4. Use it for roadmap prioritization: apply RICE/ICE scoring
5. Draft stakeholder communications and refine based on feedback
6. Build a template library for recurring PM tasks
7. Share effective prompts with the product team
Common Pitfalls
- Not validating competitive research; verify facts before sharing
- Accepting user stories without involving the engineering team
- Over-relying on frameworks without qualitative judgment
- Not customizing outputs to company culture and communication style
- Skipping stakeholder validation of generated requirements
Best Practices
✓ Do
- Validate research and competitive analysis with real data
- Collaborate with engineering when generating technical requirements
- Customize frameworks and templates to your company context
- Use the skill for first drafts, then refine with stakeholder input
- Document successful prompt patterns for PM tasks
- Combine AI efficiency with human judgment and intuition
✗ Don't
- Don't publish competitive analysis without fact-checking
- Don't finalize user stories without engineering review
- Don't make prioritization decisions solely on AI scoring
- Don't skip customer validation of generated requirements
- Don't ignore company-specific context and culture
💡 Pro Tips
- Provide context: company goals, constraints, customer feedback
- Ask for alternatives: 'Show 3 ways to prioritize this roadmap'
- Request stakeholder-specific formatting: 'Executive summary vs. engineering spec'
- Use the skill for 70% generation + 30% customization to company needs
When to Use This
✓ Use When
Use for user story writing, competitive research, roadmap prioritization, stakeholder communication, and PRD drafting. Best for reducing repetitive documentation and research work.
✗ Avoid When
Avoid for strategic product vision (requires deep customer empathy), pricing decisions (needs market and financial expertise), or when face-to-face customer discovery is more valuable than speed.
Learning Path
1. Basic: user stories, feature specs, status updates
2. Intermediate: competitive analysis, prioritization frameworks, PRDs
3. Advanced: product strategy, go-to-market planning, OKR setting
4. Expert: product vision, market positioning, business model innovation
Ratings
4.4 ★★★★★ · 54 reviews
- ★★★★★ Chaitanya Patil · Dec 20, 2024
Registry listing for babysit-pr matched our evaluation — installs cleanly and behaves as described in the markdown.
- ★★★★★ Ishan Huang · Dec 20, 2024
Solid pick for teams standardizing on skills: babysit-pr is focused, and the summary matches what you get after install.
- ★★★★★ Isabella Shah · Dec 12, 2024
babysit-pr has been reliable in day-to-day use. Documentation quality is above average for community skills.
- ★★★★★ Piyush G · Nov 11, 2024
babysit-pr reduced setup friction for our internal harness; good balance of opinion and flexibility.
- ★★★★★ Fatima Lopez · Nov 11, 2024
babysit-pr has been reliable in day-to-day use. Documentation quality is above average for community skills.
- ★★★★★ Layla Rahman · Nov 3, 2024
Solid pick for teams standardizing on skills: babysit-pr is focused, and the summary matches what you get after install.
- ★★★★★ Fatima Anderson · Oct 22, 2024
We added babysit-pr from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★ Shikha Mishra · Oct 2, 2024
I recommend babysit-pr for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
- ★★★★★ Zaid Gill · Oct 2, 2024
Keeps context tight: babysit-pr is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★ Zaid Gupta · Sep 25, 2024
babysit-pr reduced setup friction for our internal harness; good balance of opinion and flexibility.