optimize-delivery

Platform/superadmin workflow for hill-climbing voice-lesson DELIVERY quality. Draws on per-lesson delivery config, responsiveness telemetry, transcripts, and an LLM quality judge to find what to improve and to compare experiment arms. Not a teacher tool.
beau-education/beau-plugin · ★ 1 · AI & Automation · score 65

Install: claude install-skill beau-education/beau-plugin
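The "compare experiment arms" step of the workflow above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the plugin's implementation. It assumes each attempt is keyed by its `courseProgress` id, carries its `experimentArm` label, and has a list of timestamped server events. In those events, `speech_stopped` (the server VAD's end-of-turn marker) comes from the skill's description. `bot_first_audio`, the types `TurnEvent`/`Attempt`, and the functions `firstAudioLatencies`/`medianLatencyByArm` are hypothetical names. The sketch computes first-audio latency per student turn and reports the median per arm. Using the median rather than the mean is a choice made here to damp outlier turns.

```typescript
// Hypothetical telemetry shapes; the real Beau schema is not shown in this listing.
type TurnEvent = {
  kind: "speech_stopped" | "bot_first_audio"; // "bot_first_audio" is an assumed name
  atMs: number; // server timestamp in milliseconds
};

type Attempt = {
  courseProgressId: string; // one lesson run; everything joins on this id
  experimentArm: string; // e.g. "none:default"
  isTest: boolean; // test runs are kept, not filtered out
  events: TurnEvent[];
};

// First-audio latency per turn: ms from a student's speech_stopped to the
// bot's next first audio. Bot audio with no pending student turn (for
// example, a nudge) is skipped. A later speech_stopped replaces an earlier
// one that never got a reply, so latency is measured from the latest turn end.
function firstAudioLatencies(events: TurnEvent[]): number[] {
  const sorted = [...events].sort((a, b) => a.atMs - b.atMs);
  const out: number[] = [];
  let pendingStop: number | null = null;
  for (const e of sorted) {
    if (e.kind === "speech_stopped") {
      pendingStop = e.atMs;
    } else if (pendingStop !== null) {
      out.push(e.atMs - pendingStop);
      pendingStop = null;
    }
  }
  return out;
}

// Median of a list of numbers; NaN for an empty list.
function median(xs: number[]): number {
  if (xs.length === 0) return NaN;
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Pool per-turn latencies across all attempts in each arm, then take the median.
function medianLatencyByArm(attempts: Attempt[]): Record<string, number> {
  const pooled: Record<string, number[]> = {};
  for (const a of attempts) {
    if (!(a.experimentArm in pooled)) pooled[a.experimentArm] = [];
    pooled[a.experimentArm].push(...firstAudioLatencies(a.events));
  }
  const result: Record<string, number> = {};
  for (const arm of Object.keys(pooled)) result[arm] = median(pooled[arm]);
  return result;
}
```

For example, an arm whose turns show latencies of 450 ms and 600 ms reports a median of 525 ms. A real comparison would also check the guardrail metrics per arm before preferring one configuration.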

# Optimize Delivery Skill

This skill drives the **lesson-delivery optimization loop** for the Beau platform: measure how responsive and clear voice lessons are, tie each lesson to the exact config (model, voice, turn-detection, prompt version, git commit) that produced it, surface soft signals of trouble (student confusion, mistimed or missing media, pacing), and compare configurations over time.

**Audience: platform operators (superadmin) only.** This is about *us* improving the tutor's delivery, not a teacher evaluating a student (that's `evaluate-student`). Every tool here is **cross-org and superadmin-gated**: a non-superadmin session gets a 403.

## Key concepts

- **Attempt = one lesson run**, identified by its `courseProgress` id. Everything joins on this id.
- **Stamp**: each attempt records its `deliveryConfig` (the resolved model, voice, turn-detection, and `promptTemplateVersion`; the actual `botId` and `botVoice`; and the `gitCommit` of the prompt-assembly code) plus an `experimentArm` (today a single `none:default`; Phase 2 will vary arms).
- **Primary metric = first-audio latency**: the time in ms from the server VAD marking the student's turn over (`speech_stopped`) to the bot's first audio. It **excludes** the fixed VAD silence window (~1500 ms), so it measures server + model + network responsiveness, not the full gap a human perceives.
- **Guardrails**: interruption rate, nudge count, cost, completion, quiz accuracy.
- **Test runs count.** Teacher test runs are captured (`isTest=true`) and are valid