analyze-videos-with-frame-extraction-and-audio-context-in-claude

Solid

Give Claude Code a video perception layer that extracts frames, transcribes audio, and lets Claude answer questions about local videos or YouTube URLs.

AI & Automation · 19 stars · 14 forks · Updated today · MIT

Quality Score: 81/100

Stars 20%

Recency 20%

100

Frontmatter 20%

Documentation 15%

Issue Health 10%

License 10%

100

Description 5%

100

Skill Content

# Analyze videos with frame extraction and audio context in Claude Code

Give Claude Code a video perception layer that extracts frames, transcribes audio, and lets Claude answer questions about local videos or YouTube URLs.

## Prerequisites

- Claude Code
- Node.js 20+
- ffmpeg
- yt-dlp (optional)
- Gemini API or Whisper/OpenAI audio backend

## Installation

Use the upstream install or setup path that matches your environment:

- `git clone https://github.com/jordanrendric/claude-video-vision.git`

Requirements and caveats from upstream:

- **Local (Whisper)** backend: whisper.cpp or Python openai-whisper. Free and fully offline. Setup: `brew install whisper-cpp` plus automatic model download.
- **Node.js 20+** is required for the MCP server, which runs on Node.js.

Basic usage or getting-started notes:

### 1. Install the plugin

Inside Claude Code, run these commands **one at a time**:

- `/plugin marketplace add https://github.com/jordanrendric/claude-video-vision`

Source: https://github.com/jordanrendric/claude-video-vision

Extracted from upstream docs: https://raw.githubusercontent.com/jordanrendric/claude-video-vision/HEAD/README.md

## Documentation

- https://github.com/jordanrendric/claude-video-vision

## Source

- [Agent Skill Exchange](https://agentskillexchange.com/skills/analyze-videos-with-frame-extraction-and-audio-context-in-claude-code/)
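
As a rough illustration of the "extract frames, transcribe audio" idea this skill describes, the sketch below builds two ffmpeg command lines: one that samples still frames at a fixed rate, and one that writes mono 16 kHz WAV audio, a format Whisper-style backends commonly accept. It is not the plugin's implementation, which this listing does not show. The function names, the `frame_%04d.jpg` naming pattern, and the one-frame-per-second default are assumptions made for illustration only.

```python
import subprocess


def frame_command(video: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that samples `fps` frames per second as JPEGs.

    The output naming pattern is a hypothetical choice, not the plugin's.
    """
    pattern = f"{out_dir}/frame_%04d.jpg"
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps:g}", pattern]


def audio_command(video: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that drops video and writes mono 16 kHz PCM WAV."""
    return [
        "ffmpeg", "-i", video,
        "-vn",                # ignore the video stream
        "-ac", "1",           # downmix to one channel
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        wav_path,
    ]


def run(command: list[str]) -> None:
    """Run a prepared command and raise if ffmpeg exits with an error."""
    subprocess.run(command, check=True)
```

With ffmpeg installed and an existing `frames` directory, `run(frame_command("clip.mp4", "frames"))` followed by `run(audio_command("clip.mp4", "audio.wav"))` would produce the frames and the audio file that a transcription backend could then consume. Keeping command construction separate from execution makes the arguments easy to inspect before anything runs.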

Details

Author: agentskillexchange
Repository: agentskillexchange/skills
Created: 4 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

OpenAI (AI) · LangChain (AI) · Model Context Protocol (AI)

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

claude-real-video

Watch a video for the user. Use when the user shares a video URL (YouTube etc.) or local video file and wants it summarized, analyzed, or discussed — Claude can't ingest video directly, so this skill extracts scene-aware keyframes + transcript first, then reads those.

1,867 Updated yesterday

HUANGCHIHHUNGLeo

Data & Documents Listed

video-analysis

Understand what is actually in a video, fully locally and for free. Use this skill whenever the user wants to summarise, describe, transcribe, caption, chapter, or answer questions about a video or its audio — "what happens in this video", "summarise this screen recording", "transcribe this meeting", "what does the speaker say", "find the moment X happens", "read the text on screen", "turn this call into notes", "describe this clip for accessibility", "make chapters", "who says what". It samples timestamped frames with ffmpeg, transcribes speech with a local Whisper backend (whisper.cpp / whisperx / faster-whisper / openai-whisper), then reads the frames and transcript to answer — no cloud APIs, no keys, no per-minute cost, nothing leaves the machine. This is the understanding counterpart to ffmpeg-workbench, which transforms/encodes media (convert, compress, clip, GIF, burn subtitles). If the user wants to CHANGE a file rather than understand it, use ffmpeg-workbench instead.

0 Updated 1 week ago

SalZaki

AI & Automation Featured

claude-real-video-for-agents

Install and use crv (claude-real-video) — a tool that lets any AI agent watch videos by extracting scene-aware keyframes, deduplicating them, and transcribing audio. Use when the user shares a video URL or file and wants it analyzed, summarized, or discussed.

1,867 Updated yesterday

HUANGCHIHHUNGLeo