audiomind

Solid

Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys.

AI & Automation | 25 stars | 1 fork | Updated 2 months ago


Quality Score: 72/100

Stars (20%): 47
Recency (20%): 75
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 80
License (10%): 0
Description (5%): 100

Skill Content

# ๐ŸŽ™๏ธ AudioMind **Use when:** User asks to generate speech, narrate text, create a voice-over, compose music, or produce a sound effect. AudioMind is a smart audio dispatcher. It analyzes your request and routes it to the best available model โ€” ElevenLabs for speech and music, fal.ai for fast SFX โ€” and returns a ready-to-use audio URL. --- ## Quick Reference | Request Type | Best Model | Latency | |---|---|---| | Narrate text / Voice-over | `elevenlabs-tts-v3` | ~3s | | Low-latency TTS (real-time) | `elevenlabs-tts-turbo` | <1s | | Background music | `cassetteai-music` | ~15s | | Sound effect | `elevenlabs-sfx` | ~5s | | Clone a voice from audio | `elevenlabs-voice-clone` | ~10s | --- ## How to Use ### 1. Start the AudioMind server (once per session) ```bash bash {baseDir}/tools/start_server.sh ``` This starts the ElevenLabs MCP server on port 8124. The skill uses it for all audio generation. ### 2. Route the request Analyze the user's request and call the appropriate tool via the MCP server: **Text-to-Speech (TTS)** When user asks to "narrate", "read aloud", "say", or "create a voice-over": ``` Use MCP tool: text_to_speech text: "<the text to narrate>" voice_id: "JBFqnCBsd6RMkjVDRZzb" # Default: "George" (professional, neutral) model_id: "eleven_multilingual_v2" # Use "eleven_turbo_v2_5" for low latency ``` **Music Generation** When user asks to "compose", "create background music", or "make a soundtrack": ``` Use MCP tool: text_to_sound_effects ...

Details

Author
wells1137
Repository
wells1137/media-skills
Created
2 months ago
Last Updated
2 months ago
Language
Python
License
None

Similar Skills

Semantically similar based on skill content, not just the same category

AI & Automation Featured

narrator-ai-cli

Create AI-narrated film/drama commentary videos via CLI. Two workflow paths (Original & Adapted narration), 93 movies, 146 BGM tracks, 63 dubbing voices in 11 languages, 90+ narration templates. Use when creating narration videos, film commentary, short drama dubbing, or video production.

667 Updated 1 month ago
GridLtd-ProductDev

AI & Automation Featured

video-podcast-maker

Use when user provides a topic and wants an automated video podcast created, OR when user wants to learn/analyze video design patterns from reference videos โ€” handles research, script writing, TTS audio synthesis, Remotion video creation, and final MP4 output with background music. Also supports design learning from reference videos (learn command), style profile management, and design reference library. Supports Bilibili, YouTube, Xiaohongshu, Douyin, and WeChat Channels platforms with independent language configuration (zh-CN, en-US).

1,034 Updated 2 days ago
Agents365-ai

AI & Automation Solid

image-studio

Tired of juggling 8 API keys? This skill gives you one-command access to Midjourney, Flux, Ideogram, and more, with zero setup. Use when you want to generate any image without worrying about API keys.

25 Updated 2 months ago
wells1137

AI & Automation Solid

server

Start/stop Kokoro TTS HTTP server. TRIGGERS - start tts server, kokoro server, tts http, stop tts server.

32 Updated 1 month ago
terrylica

AI & Automation Featured

aws-agentic-ai

AWS Bedrock AgentCore comprehensive expert for deploying and managing all AgentCore services. Use when working with Gateway, Runtime, Memory, Identity, or any AgentCore component. Covers MCP target deployment, credential management, schema optimization, runtime configuration, memory management, and identity services.

290 Updated 1 month ago
zxkane