axiom-vision
charleswiltgen/axiom · updated Apr 8, 2026
Apple Vision Framework for computer vision tasks: subject segmentation, pose detection, text recognition, barcode scanning, and document processing.
- Covers 13+ Vision APIs across subject lifting, hand/body pose, person segmentation, text OCR, barcode detection, and document scanning, with decision trees for choosing the right tool
- Includes 15 production patterns: combining APIs to exclude hands from objects, real-time gesture recognition, multi-person segmentation, fitness action classification
Vision Framework Computer Vision
Guides you through implementing computer vision: subject segmentation, hand/body pose detection, person detection, text recognition, barcode detection, document scanning, and combining Vision APIs to solve complex problems.
When to Use This Skill
Use when you need to:
- ☑ Isolate subjects from backgrounds (subject lifting)
- ☑ Detect and track hand poses for gestures
- ☑ Detect and track body poses for fitness/action classification
- ☑ Segment multiple people separately
- ☑ Exclude hands from object bounding boxes (combining APIs)
- ☑ Choose between VisionKit and Vision framework
- ☑ Combine Vision with CoreImage for compositing
- ☑ Decide which Vision API solves your problem
- ☑ Recognize text in images (OCR)
- ☑ Detect barcodes and QR codes
- ☑ Scan documents with perspective correction
- ☑ Extract structured data from documents (iOS 26+)
- ☑ Build live scanning experiences (DataScannerViewController)
Example Prompts
- "How do I isolate a subject from the background?"
- "I need to detect hand gestures like pinch"
- "How can I get a bounding box around an object without including the hand holding it?"
- "Should I use VisionKit or Vision framework for subject lifting?"
- "How do I segment multiple people separately?"
- "I need to detect body poses for a fitness app"
- "How do I preserve HDR when compositing subjects on new backgrounds?"
- "How do I recognize text in an image?"
- "I need to scan QR codes from camera"
- "How do I extract data from a receipt?"
- "Should I use DataScannerViewController or Vision directly?"
- "How do I scan documents and correct perspective?"
- "I need to extract table data from a document"
Red Flags
Signs you're making this harder than it needs to be:
- ❌ Manually implementing subject segmentation with CoreML models
- ❌ Using ARKit just for body pose (Vision works offline)
- ❌ Writing gesture recognition from scratch (use hand pose + simple distance checks)
- ❌ Processing on main thread (blocks UI - Vision is resource intensive)
- ❌ Training custom models when Vision APIs already exist
- ❌ Not checking confidence scores (low confidence = unreliable landmarks)
- ❌ Forgetting to convert coordinates (lower-left origin vs UIKit top-left)
- ❌ Building custom text recognizer when VNRecognizeTextRequest exists
- ❌ Using AVFoundation + Vision when DataScannerViewController suffices
- ❌ Processing every camera frame for scanning (skip frames, use region of interest)
- ❌ Enabling all barcode symbologies when you only need one (performance hit)
- ❌ Ignoring RecognizeDocumentsRequest when you need table/list structure (iOS 26+)
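Three of the red flags above (hand-rolled gesture recognition, unchecked confidence, and unconverted coordinates) can be illustrated with a small sketch. It is not part of the skill itself: the `Landmark` type, the function names, and the 0.05 pinch threshold are hypothetical, and the 0.5 confidence cutoff simply mirrors the value used later in this document. In real code the inputs would come from `VNHumanHandPoseObservation.recognizedPoint(.thumbTip)` and `.indexTip`, which expose `location` and `confidence`.

```swift
import Foundation

/// A landmark reduced to what the checks need (hypothetical type, not a Vision API).
struct Landmark {
    var location: CGPoint   // normalized 0...1, origin at lower-left (Vision convention)
    var confidence: Float
}

/// Converts a normalized, lower-left-origin point into top-left-origin view coordinates.
func viewPoint(fromNormalized point: CGPoint, in viewSize: CGSize) -> CGPoint {
    CGPoint(x: point.x * viewSize.width,
            y: (1 - point.y) * viewSize.height)
}

/// Simple distance check for a pinch.
/// Returns nil when either landmark is too uncertain to trust.
func isPinching(thumbTip: Landmark,
                indexTip: Landmark,
                minConfidence: Float = 0.5,      // assumed cutoff
                maxDistance: CGFloat = 0.05) -> Bool? {  // assumed threshold, tune per app
    guard thumbTip.confidence >= minConfidence,
          indexTip.confidence >= minConfidence else { return nil }
    let dx = thumbTip.location.x - indexTip.location.x
    let dy = thumbTip.location.y - indexTip.location.y
    return (dx * dx + dy * dy).squareRoot() <= maxDistance
}
```

Because the distance is measured in normalized units, the same threshold covers a different number of pixels horizontally and vertically on non-square images; converting both points with `viewPoint(fromNormalized:in:)` first is one way to compare in pixels instead.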
Mandatory First Steps
Before implementing any Vision feature:
1. Choose the Right API (Decision Tree)
What do you need to do?
┌─ Isolate subject(s) from background?
│ ├─ Need system UI + out-of-process → VisionKit
│ │ └─ ImageAnalysisInteraction (iOS/iPadOS)
│ │ └─ ImageAnalysisOverlayView (macOS)
│ ├─ Need custom pipeline / HDR / large images → Vision
│ │ └─ VNGenerateForegroundInstanceMaskRequest
│ └─ Need to EXCLUDE hands from object → Combine APIs
│ └─ Subject mask + Hand pose + custom masking (see Pattern 1)
│
├─ Segment people?
│ ├─ All people in one mask → VNGeneratePersonSegmentationRequest
│ └─ Separate mask per person (up to 4) → VNGeneratePersonInstanceMaskRequest
│
├─ Detect hand pose/gestures?
│ ├─ Just hand location → VNDetectHumanRectanglesRequest
│ └─ 21 hand landmarks → VNDetectHumanHandPoseRequest
│ └─ Gesture recognition → Hand pose + distance checks
│
├─ Detect body pose?
│ ├─ 2D normalized landmarks → VNDetectHumanBodyPoseRequest
│ ├─ 3D real-world coordinates → VNDetectHumanBodyPose3DRequest
│ └─ Action classification → Body pose + CreateML model
│
├─ Face detection?
│ ├─ Just bounding boxes → VNDetectFaceRectanglesRequest
│ └─ Detailed landmarks → VNDetectFaceLandmarksRequest
│
├─ Person detection (location only)?
│ └─ VNDetectHumanRectanglesRequest
│
├─ Recognize text in images?
│ ├─ Real-time from camera + need UI → DataScannerViewController (iOS 16+)
│ ├─ Processing captured image → VNRecognizeTextRequest
│ │ ├─ Need speed (real-time camera) → recognitionLevel = .fast
│ │ └─ Need accuracy (documents) → recognitionLevel = .accurate
│ └─ Need structured documents (iOS 26+) → RecognizeDocumentsRequest
│
├─ Detect barcodes/QR codes?
│ ├─ Real-time camera + need UI → DataScannerViewController (iOS 16+)
│ └─ Processing image → VNDetectBarcodesRequest
│
└─ Scan documents?
  ├─ Need built-in UI + perspective correction → VNDocumentCameraViewController
  ├─ Need structured data (tables, lists) → RecognizeDocumentsRequest (iOS 26+)
  └─ Custom pipeline → VNDetectDocumentSegmentationRequest + perspective correction
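As one illustration of a branch in the tree above, the "processing captured image" text path might look like the following sketch. The function name `recognizeText(in:fast:)` and the choice to disable language correction in fast mode are assumptions made for this example, not something the skill prescribes.

```swift
import Vision
import CoreGraphics

/// Recognizes text in a single captured image (hypothetical helper).
/// Call this off the main thread; Vision work is resource intensive.
func recognizeText(in image: CGImage, fast: Bool = false) throws -> [String] {
    let request = VNRecognizeTextRequest()
    // .fast suits real-time camera use, .accurate suits documents.
    request.recognitionLevel = fast ? .fast : .accurate
    // Assumption: skip language correction when speed matters most.
    request.usesLanguageCorrection = !fast

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Each observation is one detected line; keep its best candidate string.
    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Each observation also carries a normalized `boundingBox` and each candidate a `confidence`, so callers that need positions or filtering can return those instead of plain strings.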
2. Set Up Background Processing
NEVER run Vision on the main thread:
import Vision

// Dedicated serial queue for Vision work
let processingQueue = DispatchQueue(label: "com.yourapp.vision", qos: .userInitiated)
processingQueue.async {
    do {
        let request = VNGenerateForegroundInstanceMaskRequest()
        let handler = VNImageRequestHandler(cgImage: image) // `image` is a CGImage
        try handler.perform([request])
        // Process observations...
        DispatchQueue.main.async {
            // Update UI
        }
    } catch {
        // Handle error
    }
}
3. Choose the Right Request Handler
Processing video frames? Use VNSequenceRequestHandler (maintains inter-frame state for temporal smoothing). For single images, use VNImageRequestHandler. Creating a new VNImageRequestHandler per frame discards temporal context and causes jittery results. See axiom-vision-ref for full comparison and code examples.
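A minimal sketch of the per-frame setup described above, assuming hand pose detection on camera frames. The wrapper class `FrameHandPoseDetector` and its members are hypothetical names for illustration; only `VNSequenceRequestHandler`, `VNDetectHumanHandPoseRequest`, and `perform(_:on:)` come from Vision.

```swift
import Vision
import CoreVideo

/// Hypothetical wrapper that keeps one handler and one request alive across frames.
final class FrameHandPoseDetector {
    // Created once and reused for every frame instead of per-frame allocation.
    private let sequenceHandler = VNSequenceRequestHandler()
    private let request: VNDetectHumanHandPoseRequest

    var maximumHandCount: Int { request.maximumHandCount }

    init(maximumHandCount: Int = 1) {
        let request = VNDetectHumanHandPoseRequest()
        request.maximumHandCount = maximumHandCount
        self.request = request
    }

    /// Call from a background queue, for example the video data output's delegate queue.
    func process(_ pixelBuffer: CVPixelBuffer) throws -> [VNHumanHandPoseObservation] {
        try sequenceHandler.perform([request], on: pixelBuffer)
        return request.results ?? []
    }
}
```

A capture delegate would typically own a single instance of this class and pass each frame's pixel buffer to `process(_:)`.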
4. Verify Platform Availability
| API | Minimum Version |
|---|---|
| Subject segmentation (instance masks) | iOS 17+ |
| VisionKit subject lifting | iOS 16+ |
| Hand pose | iOS 14+ |
| Body pose (2D) | iOS 14+ |
| Body pose (3D) | iOS 17+ |
| Person instance segmentation | iOS 17+ |
| VNRecognizeTextRequest (basic) | iOS 13+ |
| VNRecognizeTextRequest (accurate, multi-lang) | iOS 14+ |
| VNDetectBarcodesRequest | iOS 11+ |
| VNDetectBarcodesRequest (revision 2: Codabar, MicroQR) | iOS 15+ |
| VNDetectBarcodesRequest (revision 3: ML-based) | iOS 16+ |
| DataScannerViewController | iOS 16+ |
| VNDocumentCameraViewController | iOS 13+ |
| VNDetectDocumentSegmentationRequest | iOS 15+ |
| RecognizeDocumentsRequest | iOS 26+ |
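The table above translates into runtime availability checks when an app's deployment target is lower than an API's minimum version. The sketch below shows the idea for subject instance masks; the function name is hypothetical, and the macOS 14 and tvOS 17 minimums are stated from memory of Apple's documentation rather than from this table, so verify them before relying on them.

```swift
import Vision

/// Returns an instance-mask request where the OS supports it, otherwise nil
/// so the caller can hide the feature or fall back (hypothetical helper).
func makeSubjectMaskRequest() -> VNImageBasedRequest? {
    if #available(iOS 17.0, macOS 14.0, tvOS 17.0, *) {
        return VNGenerateForegroundInstanceMaskRequest()
    }
    return nil
}
```

Returning `nil` rather than substituting another request keeps the fallback decision with the caller, since no older Vision request is a drop-in replacement for class-agnostic subject masks.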
Common Patterns
Pattern 1: Isolate Object While Excluding Hand
User's original problem: Getting a bounding box around an object held in hand, without including the hand.
Root cause: VNGenerateForegroundInstanceMaskRequest is class-agnostic and treats hand+object as one subject.
Solution: Combine subject mask with hand pose to create exclusion mask.
// 1. Get subject instance mask
let subjectRequest = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: sourceImage)
try handler.perform([subjectRequest])
guard let subjectObservation = subjectRequest.results?.first else {
    fatalError("No subject detected") // Replace with real error handling in production
}
// 2. Get hand pose landmarks (only the first detected hand is used below)
let handRequest = VNDetectHumanHandPoseRequest()
handRequest.maximumHandCount = 2
try handler.perform([handRequest])
guard let handObservation = handRequest.results?.first else {
    // No hand detected - use full subject mask
    let mask = try subjectObservation.generateScaledMaskForImage(
        forInstances: subjectObservation.allInstances,
        from: handler
    )
    return mask
}
// 3. Create hand exclusion region from landmarks (normalized, lower-left origin)
let handPoints = try handObservation.recognizedPoints(.all)
let handBounds = calculateConvexHull(from: handPoints) // Your implementation
// 4. Subtract hand region from subject mask using CoreImage
let subjectMask = try subjectObservation.generateScaledMaskForImage(
    forInstances: subjectObservation.allInstances,
    from: handler
)
let subjectCIMask = CIImage(cvPixelBuffer: subjectMask)
// CGImage has no `size` property, so build the size from width and height
let imageSize = CGSize(width: sourceImage.width, height: sourceImage.height)
let handMask = createMaskFromRegion(handBounds, size: imageSize) // Your implementation
let finalMask = subtractMasks(handMask: handMask, from: subjectCIMask) // Your implementation
// 5. Calculate bounding box from final mask
let objectBounds = calculateBoundingBox(from: finalMask) // Your implementation
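The mask subtraction in step 4 is left to the reader in the code above. One possible sketch, using built-in Core Image filters, follows. It assumes both masks are opaque, share the same extent, and use bright values for "included" pixels; the function body is an illustration, not the skill author's implementation.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Removes the hand region from the subject mask (illustrative sketch).
func subtractMasks(handMask: CIImage, from subjectMask: CIImage) -> CIImage {
    // Invert the hand mask so the hand region becomes 0 and everything else 1.
    let invert = CIFilter.colorInvert()
    invert.inputImage = handMask
    guard let keepRegion = invert.outputImage else { return subjectMask }

    // Multiply: subject pixels under the hand drop to 0, all others are unchanged.
    let multiply = CIFilter.multiplyCompositing()
    multiply.inputImage = keepRegion
    multiply.backgroundImage = subjectMask
    return multiply.outputImage?.cropped(to: subjectMask.extent) ?? subjectMask
}
```

If the hand mask is produced at a different resolution than the subject mask, scale it to the subject mask's extent before calling this function.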
Helper: Convex Hull
func calculateConvexHull(from points: [VNHumanHandPoseObservation.JointName: VNRecognizedPoint]) -> CGRect {
    // Keep only high-confidence points
    let validPoints = points.values.filter { $0.confidence > 0.5 }
    guard !validPoints.isEmpty else { return .zero }
    // Simple bounding rect (for more accuracy, use an actual convex hull algorithm)
    let xs = validPoints.map { $0.location.x }
    let ys = validPoints.map { $0.location.y }
    // Force unwraps are safe: validPoints is non-empty here
    let minX = xs.min()!
    let maxX = xs.max()!
    let minY = ys.min()!
    let maxY = ys.max()!
    return CGRect(
        x: minX,
        y: minY,
        width: maxX - minX,
        height: maxY - minY
    )
}
How to use axiom-vision on Cursor
AI-first code editor with Composer
1. Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add axiom-vision
2. Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
$ npx skills add https://github.com/charleswiltgen/axiom --skill axiom-vision
The skills CLI fetches axiom-vision from GitHub repository charleswiltgen/axiom and configures it for Cursor.
3. Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
◆ Which agents do you want to install to?
│
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Cursor
│ • Windsurf
4. Verify installation
Confirm successful installation by checking the skill directory location:
.cursor/skills/axiom-vision
Reload or restart Cursor to activate axiom-vision. Access the skill through slash commands (e.g., /axiom-vision) or your agent's skill management interface.
⚠ Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.