media-processor

Process multimedia content — audio transcription, video analysis, PDF data extraction, image generation. Use for deeper image analysis when implementing from UI designs, analyzing charts for data, reading dense screenshots, or studying artworks and visual references.
avibebuilder/claude-prime · ★ 64 · Data & Documents · score 82
Install: claude install-skill avibebuilder/claude-prime
# Media Processor

Process audio, images, videos, documents, and generate images using Google Gemini's multimodal API. Unified interface for all multimedia content understanding and generation.

## Core Capabilities

### Image Understanding

- Image captioning and description
- Object detection with bounding boxes (2.0+)
- Pixel-level segmentation (2.5+)
- Visual question answering
- Multi-image comparison (up to 3,600 images)
- OCR and text extraction

### Video Analysis

- Scene detection and summarization
- Video Q&A with temporal understanding
- Transcription with visual descriptions
- YouTube URL support
- Long video processing (up to 6 hours)
- Frame-level analysis

### Document Extraction

- Native PDF vision processing (up to 1,000 pages)
- Table and form extraction
- Chart and diagram analysis
- Multi-page document understanding
- Structured data output (JSON schema)
- Format conversion (PDF to HTML/JSON)

### Image Generation

- Text-to-image generation
- High-quality generation variant (`generate-hq`) for detailed outputs
- Image editing and modification
- Multi-image composition (up to 3 images)
- Iterative refinement
- Multiple aspect ratios (1:1, 16:9, 9:16, 4:3, 3:4)
- Controllable style and quality

### Audio Processing

- Transcription with timestamps (up to 9.5 hours)
- Audio summarization and analysis
- Speech understanding and speaker identification
- Music and environmental sound analysis
- Text-to-speech generation with controllable voice

## Supported Tasks