speech

Solid

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

AI & Automation · 27,705 stars · 2,858 forks · Updated today · MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Speech Generation Skill

Generate spoken audio for the current project (narration, product demo voiceover, IVR prompts, accessibility reads). Defaults to `gpt-4o-mini-tts-2025-12-15` and built-in voices, and prefers the bundled CLI for deterministic, reproducible runs.

## When to use
- Generate a single spoken clip from text
- Generate a batch of prompts (many lines, many files)

## Decision tree (single vs batch)
- If the user provides multiple lines/prompts or wants many outputs -> **batch**
- Else -> **single**

## Workflow
1. Decide intent: single vs batch (see decision tree above).
2. Collect inputs up front: exact text (verbatim), desired voice, delivery style, format, and any constraints.
3. If batch: write a temporary JSONL under tmp/ (one job per line), run once, then delete the JSONL.
4. Augment instructions into a short labeled spec without rewriting the input text.
5. Run the bundled CLI (`scripts/text_to_speech.py`) with sensible defaults (see references/cli.md).
6. For important clips, validate: intelligibility, pacing, pronunciation, and adherence to constraints.
7. Iterate with a single targeted change (voice, speed, or instructions), then re-check.
8. Save/return final outputs and note the final text + instructions + flags used.

## Temp and output conventions
- Use `tmp/speech/` for intermediate files (for example JSONL batches); delete when done.
- Write final artifacts under `output/speech/` when working in this repo.
- Use `--out` or `--out-dir` to con...
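
The decision tree and the batch step of the workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the bundled CLI's actual code. The helper names (`choose_mode`, `temp_batch_file`), the file name `batch.jsonl`, and the job keys `text` and `out` are all hypothetical; the real job schema and flags are described in references/cli.md, which is not reproduced here.

```python
import json
from contextlib import contextmanager
from pathlib import Path


def choose_mode(prompts):
    """Decision tree: more than one prompt means batch, otherwise single."""
    return "batch" if len(prompts) > 1 else "single"


@contextmanager
def temp_batch_file(jobs, tmp_dir="tmp/speech"):
    """Write one JSON object per line, yield the path, then delete the file.

    The file name and the shape of each job are assumptions made for this
    sketch; they are not taken from scripts/text_to_speech.py.
    """
    directory = Path(tmp_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "batch.jsonl"
    with path.open("w", encoding="utf-8") as handle:
        for job in jobs:
            handle.write(json.dumps(job, ensure_ascii=False) + "\n")
    try:
        yield path
    finally:
        # Workflow step 3: remove the temporary JSONL once the run is done.
        path.unlink(missing_ok=True)


if __name__ == "__main__":
    prompts = ["Welcome to the demo.", "Press one for support."]
    if choose_mode(prompts) == "batch":
        # Hypothetical job keys: "text" (verbatim input) and "out" (file name).
        jobs = [
            {"text": text, "out": f"prompt_{index:02d}.mp3"}
            for index, text in enumerate(prompts, start=1)
        ]
        with temp_batch_file(jobs) as batch_path:
            # The bundled CLI would be invoked once against batch_path here.
            print(batch_path.read_text(encoding="utf-8"), end="")
```

Using a context manager keeps the "run once, then delete" rule in one place, so the temporary file is removed even if the run in between raises an error.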

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

speech

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

2,210 Updated 1 week ago
foryourhealth111-pixel
AI & Automation Listed

speech

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

1 Updated today
HGGodhand33
Data & Documents Solid

google-tts

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

303 Updated 3 weeks ago
sanjay3290
AI & Automation Listed

voice

Voice — text-to-speech and transcription. Triggers on /agent:voice, /agent:voice status, /agent:voice setup, /agent:voice test, "configurar voz", "prueba voz", "voice setup", "speak this", "read this aloud", "transcribe audio".

56 Updated today
crisandrews
AI & Automation Solid

oma-voice

Local-first text-to-speech and speech-to-text via the Voicebox MCP server. Generates speech from cloned or preset voice profiles for agent notifications, content voiceovers, and audio asset creation, and transcribes audio files for meeting notes or memos. Runs entirely on-device with no cloud, no API keys, no per-call cost. Use for voice generation, TTS, STT, transcription, voiceover, narration, dictation, audio asset work.

1,042 Updated today
first-fluke