dialogue-audio
inferen-sh/skills · updated Apr 8, 2026
Realistic multi-speaker dialogue audio generation with Dia TTS via inference.sh CLI.
- Supports two-speaker conversations with automatic voice assignment using [S1] and [S2] speaker tags
- Emotion and pacing controlled through punctuation (. , ! ? ... —) and parenthetical sound cues like (laughs), (sighs), and (whispers)
- Includes structured patterns for interviews, tutorials, debates, and conversational content with practical script-writing guidelines
- Post-production support: volume normalization, background music, and merging segments
Dialogue Audio
Create realistic multi-speaker dialogue with Dia TTS via inference.sh CLI.
Quick Start
Requires the inference.sh CLI (infsh). See the install instructions.
infsh login
# Two-speaker conversation
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'
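Scripts that contain apostrophes (contractions, for example) end a single-quoted shell string early, so the command above cannot carry them as written. A minimal sketch of one workaround, assuming a POSIX shell: keep the script in a quoted heredoc and build the JSON payload from it. `build_input` is a hypothetical helper name, not part of the infsh CLI.

```shell
# build_input: wrap a dialogue script in the {"prompt": "..."} shape used by
# the commands in this document. Hypothetical helper, not part of infsh.
# It escapes backslashes and double quotes and joins lines with spaces;
# other control characters (tabs, for example) are not handled.
build_input() {
  escaped=$(printf '%s' "$1" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"prompt": "%s"}\n' "$escaped"
}

# A quoted heredoc keeps apostrophes literal, so contractions are safe.
script=$(cat <<'EOF'
[S1] I can't believe it's done.
[S2] Told you it wouldn't take long!
EOF
)

build_input "$script"
# To generate audio (needs the infsh CLI and a prior `infsh login`):
# infsh app run falai/dia-tts --input "$(build_input "$script")"
```

The helper only prints the payload, so it can be inspected before anything is sent to the app.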
Speaker Tags
Dia TTS uses [S1] and [S2] to distinguish two speakers.
| Tag | Role | Voice |
|---|---|---|
| [S1] | Speaker 1 | Automatically assigned voice A |
| [S2] | Speaker 2 | Automatically assigned voice B |
Rules:
- Always start each speaker turn with the tag
- Tags must be uppercase: [S1], not [s1]
- Maximum 2 speakers per generation
- Each speaker maintains consistent voice within a session
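The rules above can be checked locally before spending a generation on a malformed script. The following is a sketch under the assumption that a script is passed as one string; `check_tags` is a hypothetical helper, and it tests only the rules listed here, not anything the model itself enforces.

```shell
# check_tags: test a script against the tag rules listed above.
# Hypothetical helper, not part of the infsh CLI.
check_tags() {
  # The script must open with a speaker tag.
  case "$1" in
    "[S1]"*|"[S2]"*) ;;
    *) echo "error: script must start with [S1] or [S2]"; return 1 ;;
  esac
  # Tags must be uppercase.
  if printf '%s\n' "$1" | grep -q '\[s[0-9]*]'; then
    echo "error: lowercase speaker tag found"; return 1
  fi
  # Only two speakers are supported, so any other [S<n>] tag is rejected.
  if printf '%s\n' "$1" | grep -o '\[S[0-9]*]' | grep -q -v -e '^\[S1]$' -e '^\[S2]$'; then
    echo "error: only [S1] and [S2] are supported"; return 1
  fi
  echo "ok"
}

check_tags "[S1] Have you tried it? [S2] Not yet."
```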
Emotion & Expression Control
Dia TTS interprets punctuation and non-speech cues for emotional delivery.
Punctuation Effects
| Punctuation | Effect | Example |
|---|---|---|
| . | Neutral, declarative, medium pause | "This is important." |
| ! | Emphasis, excitement, energy | "This is amazing!" |
| ? | Rising intonation, questioning | "Are you sure about that?" |
| ... | Hesitation, trailing off, long pause | "I thought it would work... but it didn't." |
| , | Short breath pause | "First, we analyze. Then, we act." |
| — or -- | Interruption or pivot | "I was going to say — never mind." |
Non-Speech Sounds
Dia TTS supports parenthetical sound descriptions:
(laughs) — laughter
(sighs) — exasperation or relief
(clears throat) — attention-getting pause
(whispers) — softer delivery
(gasps) — surprise
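To spot cues that fall outside the five listed above, a script can be scanned for parenthetical text. This is a sketch only: `unknown_cues` is a hypothetical helper, and the list above may not be exhaustive, so a reported cue is a prompt to test it, not proof that it is unsupported.

```shell
# unknown_cues: print parenthetical cues that are not in the list above,
# one per line, without repeats. Hypothetical helper, not part of infsh.
unknown_cues() {
  printf '%s\n' "$1" \
    | grep -o '([^)]*)' \
    | grep -v -x -F -e '(laughs)' -e '(sighs)' -e '(clears throat)' \
        -e '(whispers)' -e '(gasps)' \
    | sort -u
}

unknown_cues "[S1] (laughs) You did what? [S2] (coughs) Long story. (sighs)"
# prints (coughs)
```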
Examples with Emotion
# Excited conversation
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'
# Serious/thoughtful dialogue
infsh app run falai/dia-tts --input '{
"prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'
# Teaching/explaining
infsh app run falai/dia-tts --input '{
"prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something."
}'
Pacing Control
Pause Hierarchy
| Technique | Pause Length | Use For |
|---|---|---|
| Comma , | ~0.3 seconds | Between clauses, list items |
| Period . | ~0.5 seconds | Between sentences |
| Ellipsis ... | ~1.0 seconds | Dramatic pause, thinking, hesitation |
| New speaker tag | ~0.3 seconds | Natural turn-taking gap |
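The approximate figures in the table can be added up to get a rough sense of how much silence a script carries. The sketch below is an estimate only: `estimate_pauses` and `count_matches` are hypothetical helpers, ! and ? are ignored because the table gives no figure for them, and the turn-taking gap is assumed to apply between turns (tags minus one).

```shell
# count_matches TEXT PATTERN: number of non-overlapping matches of PATTERN.
count_matches() { printf '%s\n' "$1" | grep -o -- "$2" | wc -l | tr -d ' '; }

# estimate_pauses: rough total pause time in seconds for one script.
estimate_pauses() {
  ellipses=$(count_matches "$1" '\.\.\.')
  # Remove ellipses so their dots are not counted again as periods.
  rest=$(printf '%s\n' "$1" | sed 's/\.\.\.//g')
  periods=$(count_matches "$rest" '\.')
  commas=$(count_matches "$rest" ',')
  tags=$(count_matches "$rest" '\[S[12]]')
  # Assumption: the turn-taking gap applies between turns, so tags - 1.
  LC_ALL=C awk -v e="$ellipses" -v p="$periods" -v c="$commas" -v t="$tags" \
    'BEGIN { printf "%.1f\n", e * 1.0 + p * 0.5 + c * 0.3 + (t > 1 ? t - 1 : 0) * 0.3 }'
}

estimate_pauses "[S1] First, we analyze. Then, we act. [S2] Okay... go."
# prints 3.4 (1 ellipsis, 3 periods, 2 commas, 1 turn change)
```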
Speed Control
- Shorter sentences = faster perceived pace
- Longer sentences with commas = measured, thoughtful pace
- Questions followed by answers = engaging back-and-forth rhythm
# Fast-paced, energetic
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync."
}'
# Slow, contemplative
infsh app run falai/dia-tts --input '{
"prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now."
}'
Conversation Structure Patterns
Interview Format
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'
Tutorial / Explainer
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'
Debate / Discussion
infsh app run falai/dia-tts --input '{
"prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'
Post-Production Tips
Volume Normalization
Both speakers should be at a consistent volume. If one is louder, adjust levels when merging:
# Merge with balanced audio
infsh app run infsh/video-audio-merger --input '{
"video": "talking-head.mp4",
"audio": "dialogue.mp3",
"audio_volume": 1.0
}'
Adding Background/Music
# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
"media": ["dialogue.mp3", "background-music.mp3"]
}'
Segmenting Long Conversations
For conversations longer than ~30 seconds, generate in segments:
# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Welcome back to another episode..."
}'
# Segment 2: Main content
infsh app run falai/dia-tts --input '{
"prompt": "[S1] So let us dive into the topic for today..."
}'
# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Great conversation today..."
}'
# Merge all segments
infsh app run infsh/media-merger --input '{
"media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'
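Splitting a long script by hand is error-prone, so the segmenting step can be sketched as a small helper. Assumptions: `split_turns` is a hypothetical name, the number of turns per segment stands in for duration (the sketch does not measure the ~30 seconds), and the script contains no double quotes or backslashes that would need JSON escaping.

```shell
# split_turns SCRIPT N: print the script as segments of at most N speaker
# turns each, one segment per line. Hypothetical helper, not part of infsh.
split_turns() {
  printf '%s\n' "$1" \
    | tr '\n' ' ' \
    | sed 's/\[S[12]]/\
&/g' \
    | awk -v per="$2" 'NF {
        sub(/[ \t]+$/, "")                     # trim trailing whitespace
        seg = seg (seg == "" ? "" : " ") $0    # append this turn
        if (++n % per == 0) { print seg; seg = "" }
      }
      END { if (seg != "") print seg }'
}

# Print one --input payload per segment of two turns.
split_turns "[S1] Welcome back. [S2] Glad to be here. [S1] Let us dive in. [S2] Sure. [S1] Great talk today." 2 \
  | while IFS= read -r segment; do
      printf '{"prompt": "%s"}\n' "$segment"
    done
```

Each printed payload can then be passed to `infsh app run falai/dia-tts --input`, and the resulting files merged as shown above.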
Script Writing Tips
| Do | Don't |
|---|---|
| Write how people talk | Write how people write |
| Short sentences (< 15 words) | Long academic sentences |
| Contractions ("can't", "won't") | Formal ("cannot", "will not") |
| Natural fillers ("So,", "Well,") | Every sentence perfectly formed |
| Vary sentence length | All sentences same length |
| Include reactions ("Exactly!", "Hmm.") | One-sided monologues |
| Read it aloud before generating | Assume it sounds right |
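The "short sentences (< 15 words)" guideline in the table is easy to check mechanically. A minimal sketch, assuming sentences end at ., ! or ? and that ellipses are mid-sentence pauses; `long_sentences` is a hypothetical helper and is no substitute for reading the script aloud.

```shell
# long_sentences: print every sentence of 15 words or more, prefixed with its
# word count. Speaker tags, parenthetical cues, and ellipses are stripped
# before counting. Hypothetical helper, not part of infsh.
long_sentences() {
  printf '%s\n' "$1" \
    | sed -e 's/\[S[12]]//g' -e 's/([^)]*)//g' -e 's/\.\.\.//g' \
    | tr '\n' ' ' \
    | sed 's/[.!?][.!?]*/&\
/g' \
    | awk 'NF >= 15 { $1 = $1; print NF ": " $0 }'
}

long_sentences "[S1] This is a very long sentence that keeps going and going without any real pause at all. [S2] Okay."
```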
Common Mistakes
| Mistake | Problem | Fix |
|---|---|---|
| Monologues longer than 3 sentences | Sounds like a lecture, not conversation | Break into exchanges |
| No emotional variation | Flat, robotic delivery | Use punctuation and non-speech cues |
| Missing speaker tags | Voices don't alternate | Start every turn with [S1] or [S2] |
| Formal written language | Sounds unnatural spoken | Use contractions, short sentences |
| No pauses between topics | Feels rushed | Use ... or scene breaks |
| All same energy level | Monotonous | Vary between high/low energy moments |
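The first mistake in the table, monologues longer than 3 sentences, can be flagged before generating. The sketch below counts sentence endings per speaker turn; `monologue_turns` is a hypothetical helper, and it treats each run of ., ! or ? as one sentence end while ignoring ellipses.

```shell
# monologue_turns: report speaker turns that contain more than 3 sentences.
# Hypothetical helper, not part of infsh.
monologue_turns() {
  printf '%s\n' "$1" \
    | tr '\n' ' ' \
    | sed 's/\[S[12]]/\
&/g' \
    | awk 'NF {
        turn++
        tag = substr($0, 1, 4)
        line = $0
        gsub(/\.\.\./, "", line)         # ellipses are pauses, not sentence ends
        n = gsub(/[.!?]+/, "", line)     # each run of ., ! or ? ends one sentence
        if (n > 3) print "turn " turn " " tag ": " n " sentences"
      }'
}

monologue_turns "[S1] So how does it work? [S2] Great question. Think of a pipeline. Data comes in. It gets processed. It comes out. [S1] Nice."
# prints: turn 2 [S2]: 5 sentences
```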
Related Skills
# ElevenLabs dialogue (22+ voices, voice direction)
npx skills add inference-sh/skills@elevenlabs-dialogue
npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@ai-podcast-creation
npx skills add inference-sh/skills@ai-avatar-video
Browse all apps: infsh app list
How to use dialogue-audio on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add dialogue-audio
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches dialogue-audio from GitHub repository inferen-sh/skills and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate dialogue-audio. Access the skill through slash commands (e.g., /dialogue-audio) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.
Ratings
4.4 ★★★★★ · 54 reviews
- ★★★★★ Carlos Iyer · Dec 20, 2024
We added dialogue-audio from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★Pratham Ware· Dec 16, 2024
Registry listing for dialogue-audio matched our evaluation — installs cleanly and behaves as described in the markdown.
- ★★★★★Henry Liu· Dec 16, 2024
Useful defaults in dialogue-audio — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- ★★★★★Hana Shah· Dec 8, 2024
dialogue-audio reduced setup friction for our internal harness; good balance of opinion and flexibility.
- ★★★★★Hana Srinivasan· Dec 8, 2024
dialogue-audio fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Min Huang· Dec 4, 2024
dialogue-audio is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★Soo Desai· Nov 27, 2024
Registry listing for dialogue-audio matched our evaluation — installs cleanly and behaves as described in the markdown.
- ★★★★★Anaya Huang· Nov 15, 2024
dialogue-audio is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★Min Li· Nov 11, 2024
Useful defaults in dialogue-audio — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- ★★★★★Sakshi Patil· Nov 7, 2024
dialogue-audio reduced setup friction for our internal harness; good balance of opinion and flexibility.