ffmpeg-audio

The prod audio convention, made declarative: coerce any container (.m4a/.mp3/.webm/.wav) to mono 16 kHz signed 16-bit PCM WAV (`-ac 1 -ar 16000 -c:a pcm_s16le`), the exact input shape that TitaNet/voiceprint, NeMo diarization, whisper, and the training corpus all expect. Four ops (normalize · trim-to-clip · concat · probe) plus a YAML reconcile mode. Idempotent (skips up-to-date outputs) and dry-run by default (nothing runs without `--apply`). Use when Ian says "normalize this audio", "make a 16k wav", "extract an enroll clip", "trim that recording", "convert to mono 16k", "prep audio for whisper/diarization/voiceprint", "batch-convert these recordings", or any time audio needs the prod mono-16k-PCM shape.
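A minimal Python sketch of what a single `normalize` call amounts to under the rules above: build the ffmpeg argv for the mono / 16 kHz / `pcm_s16le` target, skip when the output already exists and is newer than its input, and only execute when apply is requested. The function names (`build_normalize_cmd`, `is_up_to_date`, `normalize`) and the extra `-hide_banner`, `-y`, and `-vn` flags are illustrative assumptions, not the actual internals of `ffmpeg_audio.py`.

```python
"""Hypothetical sketch of the normalize op (not the skill's real code).

Assumes ffmpeg is on PATH when apply=True; all names are illustrative.
"""
import os
import shlex
import subprocess

# Target convention: mono, 16 kHz, signed 16-bit little-endian PCM WAV.
SR = 16000
CH = 1
CODEC = "pcm_s16le"


def build_normalize_cmd(src: str, dst: str) -> list[str]:
    """Return the ffmpeg argv that coerces `src` to the mono-16k-PCM shape."""
    return [
        "ffmpeg", "-hide_banner",
        "-y",               # overwrite dst (needed for --force rebuilds)
        "-i", src,
        "-vn",              # drop any video / cover-art stream
        "-ac", str(CH),     # downmix to mono
        "-ar", str(SR),     # resample to 16 kHz
        "-c:a", CODEC,      # signed 16-bit PCM
        dst,
    ]


def is_up_to_date(dst: str, *srcs: str) -> bool:
    """True if dst exists and is newer than every input (the idempotency rule)."""
    if not os.path.exists(dst):
        return False
    dst_mtime = os.path.getmtime(dst)
    return all(os.path.getmtime(s) < dst_mtime for s in srcs)


def normalize(src: str, dst: str, apply: bool = False, force: bool = False) -> str:
    """Skip, print (dry-run), or run the conversion; return what happened."""
    if not force and is_up_to_date(dst, src):
        return "skipped"
    cmd = build_normalize_cmd(src, dst)
    print(shlex.join(cmd))          # always show the exact command
    if not apply:
        return "dry-run"            # nothing runs without apply
    subprocess.run(cmd, check=True)
    return "converted"
```

Called as `normalize("output.m4a", "ian.wav")`, the sketch only prints the command line; `apply=True` actually runs ffmpeg, and `force=True` bypasses the up-to-date check. Keeping the mtime comparison separate from command construction is what makes a re-run over an already converted batch cost nothing.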
iansteitz1-eng/aria-skills · AI & Automation

Install: claude install-skill iansteitz1-eng/aria-skills

# ffmpeg-audio

One ffmpeg shape recurs across the voiceprint, diarization, whisper, and training lanes: **mono · 16 kHz · signed-16-bit PCM WAV** (`ffmpeg -ac 1 -ar 16000 -c:a pcm_s16le`). This skill is that convention as a single idempotent CLI, so the same bytes come out every time regardless of who runs it or what container went in.

**Defaults are the convention** (`SR=16000`, `CH=1`, `CODEC=pcm_s16le`). Change them in one place at the top of `ffmpeg_audio.py` if a lane ever needs a different target.

## Safety model

- **Dry-run by default.** Every op prints the exact `ffmpeg` command it *would* run and changes nothing until `--apply`.
- **Idempotent.** An output that already exists and is newer than its input(s) is **skipped**, so re-running a batch is free. `--force` rebuilds anyway.
- **Read-only `probe`** never writes.

## The four ops

1. **normalize**: any audio → mono 16k PCM WAV (the canonical convention).

   ```sh
   python3 ~/.claude/skills/ffmpeg-audio/ffmpeg_audio.py normalize output.m4a -o ian.wav --apply
   ```

   Batch a roster with globs + `--suffix` (no `-o`):

   ```sh
   python3 .../ffmpeg_audio.py normalize ian.* brandon.* stephen.* --suffix .16k.wav --apply
   ```

2. **trim**: slice `--start`/`--duration`, then normalize (the enroll-clip pattern: `superwhisper/recordings/<ts>/output.wav` → `~/enroll/ian.wav`, 22 s).

   ```sh
   python3 .../ffmpeg_audio.py trim output.wav --start 0 --duration 22 -o ~/enroll/ian.wav --apply
   ```

3. **concat**