On May 30, 2026, Farza Majeed—founder of buildspace and HeyClicky—posted a 104-second demo video showing complete hands-free voice control of his Mac using OpenAI's GPT-Realtime 2.0 model. The video went viral, reaching nearly 3 million views and drawing praise from OpenAI's Greg Brockman, who called it "real magic."

Farza's HeyClicky demo shows opening VS Code, editing a snake game, and playing Spotify—all with voice commands. No hands required.
The demo showcased HeyClicky, a Mac app that sits in your menu bar and enables always-on voice control—opening applications, editing files, controlling media playback, and navigating your system entirely through natural language voice commands.
This article breaks down what makes the HeyClicky demo so compelling, how GPT-Realtime 2.0 enables hands-free computer control, the viral reaction from the tech community, use cases for productivity and accessibility, and what this signals about the future of human-computer interaction.
TL;DR
| Topic | Takeaway |
|---|---|
| HeyClicky | Mac app for hands-free voice control powered by GPT-Realtime 2.0; sits in menu bar with always-on listening |
| Demo Highlights | 104-second video shows opening VS Code, editing snake game code (changing snake color), playing Spotify—all with voice |
| Viral Impact | Nearly 3M views; praised by Greg Brockman (OpenAI) and Ankit Gupta (Y Combinator); compared to a Jobs-level demo |
| Technology | Powered by GPT-Realtime 2.0 (GPT-5-class reasoning, 128K context, 95% adversarial call success) |
| Availability | Mac-only currently; free trial at heyclicky.com/try; Windows version coming soon |
| Use Cases | Productivity (hands-free coding, multitasking), accessibility (motor impairment support), content creation (on-the-fly editing) |
| Key Innovation | Continuous listening + real-time action execution without manual activation; fluid, natural conversation flow |
The Demo That Went Viral
What Farza Showed
In the 104-second video, Farza demonstrates HeyClicky handling a variety of real-world tasks:
1. Opening Applications
- "Open VS Code" → Visual Studio Code launches instantly
- "Play some music on Spotify" → Spotify opens and starts playing
2. Code Editing
- Farza asks HeyClicky to open a snake game he's been working on
- He instructs it to change the snake's color to blue
- HeyClicky edits the code in real-time while Farza watches
3. Natural Conversation Flow
- HeyClicky handles interruptions and follow-up commands without needing wake words
- The interaction feels like a conversation rather than a robotic command-and-response exchange
4. Always-On Mode
- No need to press hotkeys or say "Hey Siri"
- HeyClicky listens continuously and distinguishes between commands and ambient conversation
The Viral Response
Nearly 3 million views and counting, with high-profile reactions:
Greg Brockman (OpenAI President & Co-Founder):
"GPT Realtime 2 unlocks some real magic"
Ankit Gupta (Y Combinator):
"This is how you give a demo"
Audrey (@audrlo):
"dare i say this is a Jobs-level demo. if Jobs was delivering this product, he would have performed this on stage."
The demo's success wasn't just about the technology—it was about showing, not telling. Farza didn't explain features or walk through a slide deck. He just used the product for 104 seconds, and the value proposition became instantly clear.
Why GPT-Realtime 2.0 Makes This Possible
GPT-5-Class Reasoning in Voice
HeyClicky is powered by OpenAI's GPT-Realtime 2.0, released May 7, 2026. This model brings GPT-5-class reasoning to real-time voice interactions, enabling:
1. Understanding Complex Commands: Not just "open this app" but "change the snake color to blue in my snake game," which requires:
- Finding the relevant file
- Identifying the color variable
- Making the edit
- Confirming the change
2. Handling Interruptions: Users can interrupt mid-response or change topics without breaking the conversation flow, which is critical for natural computer control.
3. Context Retention: A 128K-token context window (4× larger than GPT-Realtime-1.5's) means HeyClicky can remember earlier commands, open files, and conversation history throughout your work session.
4. Adversarial Robustness: A 95% success rate on adversarial calls means it handles:
- Unclear commands ("that blue thing we talked about earlier")
- Background noise and interruptions
- Frustrated or rapid-fire instructions
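The four sub-steps under item 1 (find the file, identify the variable, make the edit, confirm) can be sketched as a single function. This is a minimal illustration under stated assumptions, not HeyClicky's code: the project is a hypothetical in-memory dict of paths to source text, and the game is assumed to store its color in an assignment like `SNAKE_COLOR = "green"`. In the real product the model would decide which file and variable to touch; here a regular expression plays that role.

```python
import re

# Hypothetical project contents: path -> source text (stands in for the file system).
PROJECT = {
    "snake/game.py": 'SNAKE_COLOR = "green"\nFOOD_COLOR = "red"\n',
    "snake/README.md": "A tiny snake game.\n",
}

# Assumption: the color lives in an assignment like SNAKE_COLOR = "green".
COLOR_PATTERN = re.compile(r'^(SNAKE_COLOR\s*=\s*)"([^"]*)"', re.MULTILINE)


def change_snake_color(project: dict, new_color: str) -> tuple:
    """Find the file, identify the variable, edit it, and report what changed."""
    # Step 1: find the relevant file (the first one that defines SNAKE_COLOR).
    path = next((p for p, src in project.items() if COLOR_PATTERN.search(src)), None)
    if path is None:
        raise LookupError("no file defines SNAKE_COLOR")

    # Step 2: identify the current value of the color variable.
    found = COLOR_PATTERN.search(project[path])
    old_color = found.group(2)

    # Step 3: make the edit, touching only the first SNAKE_COLOR assignment.
    project[path] = COLOR_PATTERN.sub(
        lambda m: f'{m.group(1)}"{new_color}"', project[path], count=1
    )

    # Step 4: return enough detail to confirm the change back to the user.
    return path, old_color, new_color


path, old, new = change_snake_color(PROJECT, "blue")
print(f"Changed SNAKE_COLOR in {path} from {old} to {new}")
```

Returning the old and new values is what makes the final "confirm the change" step possible: the assistant can say exactly what it did, and an undo is just the same edit in reverse.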
Technical Capabilities Enabling Voice Control
| Capability | Why It Matters for HeyClicky |
|---|---|
| Speech-to-Speech | No transcription lag—voice in, voice out in real-time |
| Tool Calling | The model can request actions that HeyClicky then executes locally via macOS APIs, Automator scripts, or AppleScript |
| Low Latency | Commands are acted on almost immediately, without a noticeable processing delay |
| Configurable Reasoning | Can use "high" reasoning for complex edits, "low" for simple app launches |
| Interruption Handling | Supports natural "wait, no, do this instead" workflows |
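To make the tool-calling row concrete, here is a minimal sketch of how a menu-bar app could expose one action to the model and execute it locally. It assumes event shapes like those in OpenAI's Realtime API documentation (`session.update`, `response.function_call_arguments.done`, and `conversation.item.create` with a `function_call_output` item) carry over to GPT-Realtime 2.0; exact field names may differ by API version. The `open_app` tool, its parameters, and the call ID are hypothetical. The code only builds the JSON messages and the macOS `open -a` command line, so it runs without a network connection or a Mac.

```python
import json

# Hypothetical tool advertised to the model in a session.update event.
SESSION_UPDATE = {
    "type": "session.update",
    "session": {
        "tools": [
            {
                "type": "function",
                "name": "open_app",
                "description": "Launch a macOS application by name.",
                "parameters": {
                    "type": "object",
                    "properties": {"app_name": {"type": "string"}},
                    "required": ["app_name"],
                },
            }
        ],
    },
}


def open_app_argv(app_name: str) -> list:
    """Build the macOS `open -a` command; the caller decides whether to run it."""
    return ["open", "-a", app_name]


# Map tool names to local handlers that return a command line.
HANDLERS = {"open_app": lambda args: open_app_argv(args["app_name"])}


def handle_function_call(event: dict) -> tuple:
    """Turn a completed function-call event into a command plus reply events."""
    args = json.loads(event["arguments"])  # arguments arrive as a JSON string
    argv = HANDLERS[event["name"]](args)
    # A real app would run argv here, e.g. subprocess.run(argv, check=True).
    replies = [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps({"status": "launched", "app": args["app_name"]}),
            },
        },
        {"type": "response.create"},  # ask the model to speak a confirmation
    ]
    return argv, replies


# Simulated event, shaped like the model asking to open VS Code.
event = {
    "type": "response.function_call_arguments.done",
    "call_id": "call_123",
    "name": "open_app",
    "arguments": json.dumps({"app_name": "Visual Studio Code"}),
}
argv, replies = handle_function_call(event)
print(argv)  # ['open', '-a', 'Visual Studio Code']
```

Keeping command construction separate from execution makes it easy to log, confirm, or dry-run every action the model requests before anything touches the system.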
For full technical details on GPT-Realtime 2.0, see: OpenAI GPT-Realtime-2: Voice Models Guide
How HeyClicky Works
Architecture
While Farza hasn't open-sourced HeyClicky (as of May 31, 2026), based on the demo and similar projects, here's the likely architecture:
1. Always-On Listening
- Menu bar app with continuous microphone access
- Local voice activity detection (VAD) to distinguish speech from silence
- Privacy controls to pause/resume listening
2. GPT-Realtime 2.0 Integration
- Streams audio directly to OpenAI's Realtime API
- Receives real-time responses and action instructions
- No transcription step—speech-to-speech end-to-end
3. macOS Integration Layer
- AppleScript execution for app launching, window management
- Accessibility API for UI element control
- File system access for code editing
- Media controls for Spotify, iTunes, etc.
4. Context Management
- Tracks open applications and file paths
- Maintains conversation history for follow-up commands
- Caches system state to avoid repeated queries
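The context-management layer can be pictured as a small session object. This sketch is hypothetical, since HeyClicky's internals are not public: it records the files a user opens, keeps a bounded command history, and resolves a vague reference such as "that file" to the most recently opened one. A time-to-live (TTL) cache stands in for "caches system state to avoid repeated queries"; the class name, method names, and the 2-second TTL are all assumptions.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Any, Callable


class SessionContext:
    """Hypothetical per-session state used to resolve follow-up commands."""

    def __init__(self, history_size: int = 50, cache_ttl: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.history: deque[str] = deque(maxlen=history_size)  # recent commands
        self.open_files: list[str] = []                         # most recent last
        self._cache: dict[str, tuple[float, Any]] = {}
        self._ttl = cache_ttl
        self._clock = clock

    def record_command(self, text: str) -> None:
        self.history.append(text)

    def record_open(self, path: str) -> None:
        # Re-opening a file moves it to the end, making it the most recent.
        if path in self.open_files:
            self.open_files.remove(path)
        self.open_files.append(path)

    def resolve_file(self, reference: str) -> str | None:
        """Map "that file" to the latest file, or match a name fragment."""
        if reference.lower() in ("that file", "this file", "it"):
            return self.open_files[-1] if self.open_files else None
        for path in reversed(self.open_files):
            if reference.lower() in path.lower():
                return path
        return None

    def cached(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return cached system state, refetching once the TTL has expired."""
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = fetch()
        self._cache[key] = (now, value)
        return value


ctx = SessionContext()
ctx.record_command("Open my snake game")
ctx.record_open("snake/game.py")
ctx.record_open("snake/README.md")
print(ctx.resolve_file("that file"))  # snake/README.md
print(ctx.resolve_file("game"))       # snake/game.py
```

Injecting the clock keeps the cache testable; a short TTL means a question like "which apps are open?" is answered from memory during a burst of commands but refreshed soon after.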
Privacy and Security
Key Considerations:
Audio Processing: Audio is streamed to OpenAI's cloud—not suitable for environments requiring local-only processing (sensitive work, classified environments).
System Access: HeyClicky needs Accessibility permissions to control your Mac—similar to automation tools like Keyboard Maestro or Alfred.
Always-On Listening: Continuous microphone access raises privacy concerns—users should understand what's being recorded and when.
Mitigation:
- Clear visual indicators when listening is active
- Manual pause/resume controls
- Transparency about what audio is sent to OpenAI
- Local VAD to minimize unnecessary cloud transmission
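The local VAD mitigation is what keeps silence from ever leaving the Mac. Production voice activity detectors (WebRTC VAD or Silero, for example) rely on statistical or neural models; the sketch below swaps in the simplest possible technique, an RMS-energy threshold with a short "hangover," purely to show where the gate sits before audio is streamed. The threshold and hangover values are assumptions, and frames are plain lists of 16-bit sample values.

```python
import math


def rms(frame: list) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


class EnergyGate:
    """Forward only frames that look like speech (a stand-in for a real VAD)."""

    def __init__(self, threshold: float = 500.0, hangover_frames: int = 3):
        self.threshold = threshold        # assumed RMS level separating speech from silence
        self.hangover = hangover_frames   # keep sending briefly after speech stops
        self._remaining = 0

    def should_send(self, frame: list) -> bool:
        if rms(frame) >= self.threshold:
            self._remaining = self.hangover
            return True
        if self._remaining > 0:
            # Trailing frames avoid clipping the end of a word.
            self._remaining -= 1
            return True
        return False


silence = [0] * 160
speech = [3000, -3000] * 80
gate = EnergyGate(hangover_frames=1)
decisions = [gate.should_send(f) for f in [silence, speech, silence, silence]]
print(decisions)  # [False, True, True, False]
```

Only frames for which `should_send` returns `True` would be streamed to the cloud; everything else is dropped on the device, which helps both privacy and cost.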
Use Cases: Productivity, Accessibility, and Beyond
1. Hands-Free Coding
Scenario: You're debugging and need to switch between files, run tests, check documentation—all while keeping your hands on the keyboard for actual coding.
With HeyClicky:
- "Open the user authentication file"
- "Run the test suite"
- "Show me the API documentation for this function"
- "Change this variable name to 'isAuthenticated'"
Benefit: Multitasking without context switching—your hands stay on the keyboard for typing, voice handles navigation and tooling.
2. Accessibility for Motor Impairments
Scenario: Users with limited hand mobility, RSI (repetitive strain injury), or motor impairments struggle with traditional keyboard/mouse navigation.
With HeyClicky:
- Complete computer control via voice
- No need for complex accessibility hardware
- Natural language commands instead of memorizing keyboard shortcuts
Benefit: Democratizes computer access, offering professional-grade voice control at consumer price points compared with specialized accessibility tools that can cost thousands of dollars (HeyClicky's own pricing is still TBD).
3. Content Creation and Editing
Scenario: Video editors, writers, and designers need to make quick edits while reviewing content.
With HeyClicky:
- "Change this heading to blue"
- "Insert a new clip after the intro"
- "Duplicate this layer and move it left"
Benefit: Faster iteration cycles. Speak edits as you think of them, with no need to pause, reach for the mouse, and click through menus.
4. Multitasking and Workflow Automation
Scenario: You're on a video call and need to share a file, launch a demo, or check your calendar without fumbling with windows.
With HeyClicky:
- "Share my screen"
- "Open the Q2 sales deck"
- "What's my next meeting?"
Benefit: Seamless multitasking—handle auxiliary tasks without interrupting your primary focus (the call, the presentation, the conversation).
The "Jobs-Level Demo" Phenomenon
Why the Demo Resonated
Audrey's comparison to a "Jobs-level demo" wasn't accidental. The HeyClicky demo follows classic Steve Jobs presentation principles:
1. Show, Don't Tell: No feature list, no bullet points, just 104 seconds of using the product.
2. Real-World Tasks: Not "here's how to launch an app" but "here's me editing my snake game," an authentic, relatable use case.
3. No Rehearsed Script: Farza spoke naturally, as if talking to a colleague: conversational, unpolished, genuine.
4. Clear Value Proposition: Within 10 seconds, viewers understood: "I can control my computer with my voice."
5. Aspirational Yet Accessible: The demo made viewers think "I could use this today," not "this is 5 years away."
Viral Marketing Lessons
What HeyClicky Got Right:
1. Platform Choice: Posted on X (Twitter), with native video, easy sharing, and a tech-savvy audience.
2. Timing: Posted just over 3 weeks after the GPT-Realtime 2.0 launch, while the model was still fresh, before competing demos appeared, and while community curiosity was high.
3. Length: 104 seconds is short enough to watch without commitment and long enough to show substance.
4. No Sales Pitch: No "sign up now" or "limited time offer," just a demo and a link (heyclicky.com/try). The product sold itself.
5. Founder-Led: Farza's credibility as buildspace founder lent authenticity. This wasn't a corporate ad; it was a builder showing what he built.
Comparison: HeyClicky vs Other Voice Control Solutions
HeyClicky vs Siri
| Feature | HeyClicky | Siri |
|---|---|---|
| Reasoning | GPT-5-class (GPT-Realtime 2.0) | Basic NLP (Apple Neural Engine) |
| Context Window | 128K tokens | Limited (single-turn or short multi-turn) |
| Always-On | Yes (continuous listening) | Wake word required ("Hey Siri") |
| Code Editing | Yes (can edit files directly) | No |
| Complex Commands | Yes (multi-step reasoning) | Limited to pre-defined actions |
| Privacy | Cloud (OpenAI) | On-device + cloud (Apple) |
Winner: HeyClicky for productivity and complex tasks; Siri for privacy-conscious users and basic commands.
HeyClicky vs Voice Control (macOS Accessibility)
| Feature | HeyClicky | macOS Voice Control |
|---|---|---|
| Natural Language | Yes (conversational) | No (command-based: "click save") |
| Reasoning | GPT-5-class | Rule-based |
| Learning Curve | Low (speak naturally) | High (memorize commands) |
| File Editing | Yes (AI-driven) | Limited (dictation only) |
| Accessibility | Good | Excellent (designed for accessibility) |
Winner: HeyClicky for general productivity; macOS Voice Control for users needing on-device, privacy-first accessibility.
HeyClicky vs Talon Voice
Talon Voice is a popular voice control tool for coders with motor impairments.
| Feature | HeyClicky | Talon Voice |
|---|---|---|
| Target Audience | General productivity | Coders, accessibility users |
| Commands | Natural language | Custom grammars, phonetic alphabet |
| Customization | Limited (AI-driven) | Extreme (Python scripting) |
| Learning Curve | Low | Very high |
| Cost | TBD (free trial available) | Free (closed-source app with open-source community command sets; paid beta via Patreon) |
Winner: HeyClicky for casual users and quick adoption; Talon for power users needing deep customization.
Challenges and Limitations
1. Cloud Dependency
Challenge: HeyClicky relies on OpenAI's cloud—requires internet connection and sends audio to external servers.
Implications:
- Not suitable for offline work or air-gapped environments
- Privacy concerns for sensitive work (legal, medical, classified)
- Latency during poor connectivity
Mitigation:
- Future on-device models (e.g., Apple Silicon-optimized voice models)
- Hybrid mode: local for simple commands, cloud for complex reasoning
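The hybrid mode in the last bullet amounts to a router in front of the model. A minimal sketch, assuming the command has already been transcribed to text on-device (speech-to-speech models do not need this, so it would be an extra local step): a pattern plus a hypothetical allow-list of app names handles simple launches locally, and anything else goes to the cloud model. The route labels and app list are illustrative only.

```python
import re

# Hypothetical allow-list: spoken app names the local path may launch.
KNOWN_APPS = {
    "safari": "Safari",
    "spotify": "Spotify",
    "vs code": "Visual Studio Code",
}
OPEN_PATTERN = re.compile(r"^open (?P<app>.+)$", re.IGNORECASE)


def route(command: str) -> tuple:
    """Return ("local", app_name) for simple launches, else ("cloud", command)."""
    text = command.strip().rstrip(".!?")
    match = OPEN_PATTERN.match(text)
    if match:
        app = KNOWN_APPS.get(match.group("app").lower())
        if app is not None:
            return "local", app  # handled on-device without a cloud round trip
    return "cloud", text         # needs the model's reasoning


print(route("Open VS Code"))                       # ('local', 'Visual Studio Code')
print(route("Open the user authentication file"))  # ('cloud', 'Open the user authentication file')
```

The allow-list is the important design choice: without it, "Open the user authentication file" would be mistaken for an app launch and never reach the model that can actually find the file.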
2. Cost Unpredictability
Challenge: GPT-Realtime 2.0 costs $32 per 1M audio input tokens and $64 per 1M output tokens, so monthly costs can vary widely with usage.
Implications:
- Heavy users (8+ hours/day) could incur significant monthly costs
- Idle listening (if transmitted to cloud) adds unnecessary expense
Mitigation:
- Local VAD to minimize cloud transmission
- Tiered pricing (free tier for light use, paid for power users)
- Prompt caching (98.75% discount on repeated prompts)
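The quoted prices make heavy use easy to estimate. The sketch below is a back-of-the-envelope calculator, not a quote: the per-token prices come from this article, but the audio token rates are assumptions (roughly 10 tokens per second of input audio and 20 per second of output audio, in line with the per-minute estimates OpenAI gave for its earlier Realtime preview), since the sources here do not state them for GPT-Realtime 2.0. The 22 workdays per month is also an assumption.

```python
INPUT_PRICE_PER_M = 32.0    # USD per 1M audio input tokens (from the article)
OUTPUT_PRICE_PER_M = 64.0   # USD per 1M output tokens (from the article)

# Assumptions, not published GPT-Realtime 2.0 figures:
INPUT_TOKENS_PER_SEC = 10   # tokens per second of audio sent to the model
OUTPUT_TOKENS_PER_SEC = 20  # tokens per second of audio the model speaks


def monthly_cost(hours_sent_per_day: float, hours_spoken_per_day: float,
                 workdays: int = 22) -> float:
    """Estimate monthly USD cost from hours of audio streamed in and out per day."""
    in_tokens = hours_sent_per_day * 3600 * INPUT_TOKENS_PER_SEC * workdays
    out_tokens = hours_spoken_per_day * 3600 * OUTPUT_TOKENS_PER_SEC * workdays
    return (in_tokens * INPUT_PRICE_PER_M + out_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Idle listening streamed to the cloud: 8 h in, 0.5 h of spoken replies per day.
print(f"${monthly_cost(8, 0.5):.2f}")  # $253.44
# Local VAD forwarding only 1 h of actual speech per day.
print(f"${monthly_cost(1, 0.5):.2f}")  # $76.03
```

Under these assumptions, gating with local VAD cuts the estimate from about $253 to about $76 per month. The sketch ignores text tokens, cached-input discounts, and conversation context that may be re-processed on each turn, any of which can move a real bill substantially.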
3. Accuracy and Error Handling
Challenge: Voice recognition errors, ambiguous commands, or misinterpreted intent can lead to incorrect actions.
Example:
- User: "Delete the test file"
- HeyClicky: Deletes the production file named "test_production.py"
Mitigation:
- Confirmation prompts for destructive actions
- Undo functionality for recent commands
- Verbose mode: HeyClicky narrates what it's about to do before doing it
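The confirmation and verbose-mode mitigations can be combined into one guard that runs before any file operation. A minimal sketch with hypothetical action names: a spoken name that matches more than one file is treated as ambiguous rather than resolved silently, destructive actions always require confirmation, and every plan carries a sentence a verbose mode could speak before acting. With files named `test_utils.py` and `test_production.py`, "delete the test file" produces a question instead of a deletion.

```python
from __future__ import annotations

from dataclasses import dataclass

DESTRUCTIVE = {"delete_file", "overwrite_file", "rename_file"}  # hypothetical action names


@dataclass
class Plan:
    action: str
    target: str | None        # None means nothing will be executed yet
    needs_confirmation: bool
    narration: str            # what a verbose mode would say before acting


def plan_file_action(action: str, spoken_name: str, files: list[str]) -> Plan:
    """Resolve a spoken file name and decide whether the user must confirm."""
    matches = [f for f in files if spoken_name.lower() in f.lower()]
    if not matches:
        return Plan(action, None, False, f"I couldn't find a file matching '{spoken_name}'.")
    if len(matches) > 1:
        # Never guess between candidates; ask the user to pick one.
        return Plan(action, None, True, f"Which file did you mean: {', '.join(matches)}?")
    target = matches[0]
    verb = action.replace("_file", "")
    return Plan(action, target, action in DESTRUCTIVE, f"About to {verb} {target}.")


files = ["app.py", "test_utils.py", "test_production.py"]
print(plan_file_action("delete_file", "test", files).narration)
# Which file did you mean: test_utils.py, test_production.py?
print(plan_file_action("delete_file", "production", files).narration)
# About to delete test_production.py.
```

Substring matching is deliberately naive here; the point is the policy (ambiguity means ask, destruction means confirm), which holds regardless of how names are matched.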
4. Background Noise and Ambient Speech
Challenge: Always-on listening can trigger on ambient conversations, TV/podcast audio, or background noise.
Mitigation:
- Wake word option for users who prefer explicit activation
- Smart filtering: Distinguish between user commands and background chatter
- Manual pause button for meetings, calls, or sensitive conversations
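These three mitigations map naturally onto a listening-mode setting. This sketch assumes each utterance has already been transcribed to text; the mode names, the wake phrase "hey clicky," and the verb list are hypothetical, not HeyClicky settings. "Smart filtering" is reduced to a crude placeholder that checks whether the utterance starts with a known command verb; a real product would need something far more robust.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    ALWAYS_ON = "always_on"
    WAKE_WORD = "wake_word"
    PAUSED = "paused"


WAKE_PHRASE = "hey clicky"  # hypothetical wake phrase
COMMAND_VERBS = ("open", "play", "pause", "change", "find", "run", "show", "share")


def looks_like_command(text: str) -> bool:
    """Crude stand-in for smart filtering: does the utterance start with a known verb?"""
    words = text.lower().split()
    return bool(words) and words[0] in COMMAND_VERBS


def accept(utterance: str, mode: Mode) -> str | None:
    """Return the command text to act on, or None to ignore the utterance."""
    text = utterance.strip()
    if mode is Mode.PAUSED:
        return None  # manual pause: ignore everything
    if mode is Mode.WAKE_WORD:
        if not text.lower().startswith(WAKE_PHRASE):
            return None  # no wake phrase, so treat it as background speech
        text = text[len(WAKE_PHRASE):].lstrip(" ,")
    return text if looks_like_command(text) else None


print(accept("Hey Clicky, open Safari", Mode.WAKE_WORD))      # open Safari
print(accept("I think the game went well", Mode.ALWAYS_ON))   # None
```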
The Road Ahead: What's Next for Voice-Controlled Computing
Short-Term (2026-2027)
1. Windows and Linux Support: Farza confirmed a Windows version is coming soon; Linux support has not been announced.
2. Multimodal Integration: Combine voice and vision, e.g., "change the color of this button" while looking at the screen (this would depend on multimodal capabilities GPT-Realtime 2.0 may gain in the future).
3. Fine-Tuning for Domain-Specific Tasks: Custom HeyClicky versions for:
- Code-specific commands (refactoring, debugging, running tests)
- Design tools (Figma, Sketch, Photoshop voice control)
- Video editing (Premiere, Final Cut, DaVinci Resolve)
4. Community Plugins and Extensions: An open API for third-party integrations, enabling voice control of Slack, Notion, Linear, and GitHub.
Medium-Term (2027-2028)
1. On-Device Voice Models: Apple Silicon, Snapdragon, and NVIDIA chips become powerful enough to run GPT-class voice models locally, with no cloud dependency, instant responses, and complete privacy.
2. Ambient Computing: Voice control extends beyond computers to smart homes, cars, and AR glasses, all controlled through a unified voice interface.
3. Collaborative Voice Agents: Multiple users in a room can issue commands to a shared HeyClicky instance, which is useful for team meetings, classrooms, and collaborative design sessions.
Long-Term (2028+)
1. Full Voice-Native Workflows: Entire professions (e.g., writing, design, data analysis) operate primarily via voice, with keyboard and mouse as the fallback rather than the other way around.
2. Neurodiversity and Accessibility Revolution: Voice control becomes a standard feature rather than a niche accessibility option, benefiting users with ADHD, dyslexia, or motor impairments, and anyone who prefers verbal interaction.
3. Post-Keyboard Era: Voice becomes the primary input method for most computing tasks, much as touch replaced physical keyboards on mobile, marking a paradigm shift in HCI (human-computer interaction).
How to Try HeyClicky
Get Started
1. Free Trial: Visit heyclicky.com/try to sign up for the free trial (Mac only as of May 31, 2026).
2. System Requirements
- macOS (version TBD—likely macOS 12+)
- Microphone (built-in or external)
- Internet connection (for GPT-Realtime 2.0 API calls)
- Accessibility permissions (to control system UI)
3. Setup
- Install the HeyClicky menu bar app
- Grant microphone and accessibility permissions
- Start the always-on listening mode or configure a wake word
4. First Commands: Try simple tasks first:
- "Open Safari"
- "Play music on Spotify"
- "What's the weather today?"
Then progress to complex commands:
- "Find the file where I defined the user authentication function"
- "Change all instances of 'userId' to 'accountId' in this file"
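That last command shows why a literal find-and-replace is not enough: replacing the substring 'userId' would also rewrite names like 'userIdCache'. A minimal sketch of a safer rename using regular-expression word boundaries follows; it is an assumption about how such an edit could be done, not HeyClicky's method, and a language-aware refactoring tool would additionally skip strings and comments.

```python
import re


def rename_identifier(source: str, old: str, new: str) -> tuple:
    """Replace whole-word occurrences of `old` and report how many were changed."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    # A function replacement avoids treating backslashes in `new` as escapes.
    return pattern.subn(lambda _m: new, source)


code = "def load(userId):\n    userIdCache[userId] = fetch(userId)\n"
renamed, count = rename_identifier(code, "userId", "accountId")
print(count)  # 3
print(renamed)
```

Reporting the count gives the assistant something concrete to say back ("renamed 3 occurrences"), which doubles as a cheap sanity check before the user moves on.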
Best Practices
1. Speak Naturally: Don't overthink commands; GPT-Realtime 2.0's reasoning handles conversational speech well.
2. Be Specific: Instead of "open that file," say "open the Python file I was editing yesterday about user authentication."
3. Confirm Destructive Actions: For file deletions, large edits, or irreversible operations, use confirmation prompts.
4. Pause During Meetings: Use the manual pause to avoid accidental triggers during calls or sensitive discussions.
Bottom Line: The Future of Computing Is Voice-First
Farza's HeyClicky demo is more than a viral video—it's a proof of concept for the future of human-computer interaction. Powered by GPT-Realtime 2.0, it shows that hands-free, voice-first computing is no longer aspirational—it's available today.
Key Takeaways:
- GPT-Realtime 2.0 unlocks real-world voice control—not just smart home commands but complex tasks like code editing, app navigation, and workflow automation
- The demo went viral (nearly 3M views) because it showed rather than told: 104 seconds of authentic product use resonated more than any marketing campaign
- Accessibility meets productivity—voice control isn't just for users with disabilities; it's a faster, more natural interface for everyone
- This is just the beginning—expect multimodal voice (voice + vision), on-device models, and voice-native workflows to become mainstream by 2027-2028
Who Should Care:
- Developers and coders: Hands-free navigation and editing while their hands stay on the keyboard
- Accessibility advocates: Affordable, powerful voice control for motor impairments
- Content creators: Faster editing and multitasking workflows
- Productivity enthusiasts: Voice as a second input method alongside keyboard/mouse
- Futurists: This is what post-keyboard computing looks like
OpenAI's Greg Brockman called it "real magic"—and he's right. HeyClicky demonstrates that the future of operating systems isn't just touch, gesture, or AR—it's voice-first, powered by GPT-5-class reasoning.
The age of talking to computers like colleagues has arrived.
Related Reading
For more on voice AI, GPT-Realtime 2.0, and the future of computing:
- OpenAI GPT-Realtime-2: Voice Models Guide
- What Are Agent Skills: Complete Guide
- Claude Opus 4.7 Models Guide
- AI Benchmarks in 2026: The Complete Guide
- Agentic Era: AI Future 2026-2030
Disclosure: This post is editorial analysis based on Farza Majeed's May 30, 2026 demo video on X, community reactions, and third-party coverage of HeyClicky and GPT-Realtime 2.0. Pricing and availability details are accurate as of May 31, 2026 but may change. For the latest information, visit heyclicky.com and OpenAI's platform.
Sources
- OpenAI — Advancing voice intelligence with new models in the API
- 9to5Mac — OpenAI has new voice models that reason, translate, and transcribe as you speak
- BuildFastWithAI — GPT-Realtime-2: OpenAI Voice AI Models 2026
- XDA Developers — Someone built a tiny AI that lives next to your cursor
- Y Combinator — HeyClicky: An AI buddy that lives on your Mac
- Latent Space — AINews GPT-Realtime-2, -Translate, and -Whisper
- Geeky Gadgets — GPT Realtime 2: OpenAI's Advanced Voice Model Launches