speech

Solid

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

AI & Automation · 27,705 stars · 2,858 forks · Updated today · MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Speech Generation Skill

Generate spoken audio for the current project (narration, product demo voiceover, IVR prompts, accessibility reads). Defaults to `gpt-4o-mini-tts-2025-12-15` and built-in voices, and prefers the bundled CLI for deterministic, reproducible runs.

## When to use

- Generate a single spoken clip from text
- Generate a batch of prompts (many lines, many files)

## Decision tree (single vs batch)

- If the user provides multiple lines/prompts or wants many outputs -> **batch**
- Else -> **single**

## Workflow

1. Decide intent: single vs batch (see decision tree above).
2. Collect inputs up front: exact text (verbatim), desired voice, delivery style, format, and any constraints.
3. If batch: write a temporary JSONL under tmp/ (one job per line), run once, then delete the JSONL.
4. Augment instructions into a short labeled spec without rewriting the input text.
5. Run the bundled CLI (`scripts/text_to_speech.py`) with sensible defaults (see references/cli.md).
6. For important clips, validate: intelligibility, pacing, pronunciation, and adherence to constraints.
7. Iterate with a single targeted change (voice, speed, or instructions), then re-check.
8. Save/return final outputs and note the final text + instructions + flags used.

## Temp and output conventions

- Use `tmp/speech/` for intermediate files (for example JSONL batches); delete when done.
- Write final artifacts under `output/speech/` when working in this repo.
- Use `--out` or `--out-dir` to con...
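
The decision tree and the batch step of the workflow can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the helper names (`choose_mode`, `write_batch_jsonl`, `cleanup`), the file name `batch.jsonl`, and the JSONL keys `input`, `voice`, and `out` are all hypothetical. The real job schema and CLI arguments are defined by `scripts/text_to_speech.py` and references/cli.md, which are not reproduced here.

```python
import json
import tempfile
from pathlib import Path


def choose_mode(prompts):
    """Decision tree: several prompts -> batch, exactly one -> single."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    return "batch" if len(prompts) > 1 else "single"


def write_batch_jsonl(prompts, tmp_dir="tmp/speech", voice=None):
    """Write one JSON job per line and return the path of the JSONL file.

    The keys "input", "voice", and "out" are hypothetical placeholders;
    check references/cli.md for the schema the bundled CLI expects.
    """
    tmp_path = Path(tmp_dir)
    tmp_path.mkdir(parents=True, exist_ok=True)
    jsonl_path = tmp_path / "batch.jsonl"  # hypothetical file name
    with jsonl_path.open("w", encoding="utf-8") as fh:
        for index, text in enumerate(prompts, start=1):
            # The text is stored verbatim; it is never rewritten.
            job = {"input": text, "out": f"clip_{index:03d}.mp3"}
            if voice is not None:
                job["voice"] = voice
            fh.write(json.dumps(job, ensure_ascii=False) + "\n")
    return jsonl_path


def cleanup(jsonl_path):
    """Delete the temporary JSONL once the batch run has finished."""
    Path(jsonl_path).unlink(missing_ok=True)


if __name__ == "__main__":
    lines = ["Welcome to the demo.", "Press one for support."]
    print(choose_mode(lines))
    # A throwaway directory stands in for tmp/speech/ in this demo.
    with tempfile.TemporaryDirectory() as scratch:
        path = write_batch_jsonl(lines, tmp_dir=scratch)
        print(path.read_text(encoding="utf-8"), end="")
        # ... the bundled CLI would be run once against `path` here ...
        cleanup(path)
```

Keeping the job file under a temporary directory and deleting it after the run mirrors the temp conventions above, so that only the final audio under `output/speech/` remains.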

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

OpenAI · AI
Anthropic · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

speech

2,210 Updated 1 week ago

foryourhealth111-pixel

AI & Automation Listed

speech

1 Updated today

HGGodhand33

Data & Documents Solid

google-tts

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

303 Updated 3 weeks ago

sanjay3290

AI & Automation Listed

voice

Voice — text-to-speech and transcription. Triggers on /agent:voice, /agent:voice status, /agent:voice setup, /agent:voice test, "configurar voz", "prueba voz", "voice setup", "speak this", "read this aloud", "transcribe audio".

56 Updated today

crisandrews

AI & Automation Solid

oma-voice

Local-first text-to-speech and speech-to-text via the Voicebox MCP server. Generates speech from cloned or preset voice profiles for agent notifications, content voiceovers, and audio asset creation, and transcribes audio files for meeting notes or memos. Runs entirely on-device with no cloud, no API keys, no per-call cost. Use for voice generation, TTS, STT, transcription, voiceover, narration, dictation, audio asset work.

1,042 Updated today

first-fluke