computer-use-agents

Solid

Build AI agents that interact with computers the way humans do: viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives, with a critical focus on sandboxing, security, and the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation.

AI & Automation · 27,984 stars · 2,901 forks · Updated today · MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Computer Use Agents

## Patterns

### Perception-Reasoning-Action Loop

The fundamental architecture of computer use agents: observe the screen, reason about the next action, execute the action, repeat. This loop integrates vision models with action execution through an iterative pipeline.

Key components:

1. PERCEPTION: Screenshot captures current screen state
2. REASONING: Vision-language model analyzes and plans
3. ACTION: Execute mouse/keyboard operations
4. FEEDBACK: Observe result, continue or correct

Critical insight: Vision agents are completely still during the "thinking" phase (1-5 seconds), creating a detectable pause pattern.

**When to use**:

- Building any computer use agent from scratch
- Integrating vision models with desktop control
- Understanding agent behavior patterns

```python
from anthropic import Anthropic
from PIL import Image
import base64
import pyautogui
import time


class ComputerUseAgent:
    """
    Perception-Reasoning-Action loop implementation.
    Based on Anthropic Computer Use patterns.
    """

    def __init__(self, client: Anthropic, model: str = "claude-sonnet-4-20250514"):
        self.client = client
        self.model = model
        self.max_steps = 50  # Prevent runaway loops
        self.action_delay = 0.5  # Seconds between actions

    def capture_screenshot(self) -> str:
        """Capture screen and return base64 encoded image."""
        screenshot = pyautogui.screenshot()
        # Resize for token efficiency (1280x800 is good bal...
```
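
The four-step loop described above can also be sketched independently of any particular model or automation library. The version below is only an illustration under stated assumptions: `Action`, `run_agent_loop`, and the three injected callables (`capture`, `decide`, `execute`) are hypothetical names introduced here, not part of the `ComputerUseAgent` class or of any vendor API. In a real agent, `capture` would presumably wrap a screenshot call, `decide` a vision-language model request, and `execute` the mouse/keyboard driver.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One step chosen by the model (hypothetical schema for this sketch)."""
    kind: str  # e.g. "click", "type", or "done"
    payload: Dict[str, object] = field(default_factory=dict)


def run_agent_loop(
    capture: Callable[[], str],
    decide: Callable[[str, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 50,        # hard cap to prevent runaway loops
    action_delay: float = 0.0,  # seconds to wait after each action
) -> List[Action]:
    """Run perceive -> reason -> act until 'done' or max_steps is reached."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                # 1. PERCEPTION
        action = decide(screenshot, history)  # 2. REASONING
        if action.kind == "done":
            break
        execute(action)                       # 3. ACTION
        history.append(action)
        time.sleep(action_delay)              # let the UI settle
        # 4. FEEDBACK: the next iteration's capture() observes the result
    return history


# Usage with stand-in callables (no real screen, model, or input devices).
def fake_capture() -> str:
    return "fake-base64-screenshot"


def fake_decide(screenshot: str, history: List[Action]) -> Action:
    script = [
        Action("click", {"x": 10, "y": 20}),
        Action("type", {"text": "hello"}),
    ]
    return script[len(history)] if len(history) < len(script) else Action("done")


executed: List[Action] = []
result = run_agent_loop(fake_capture, fake_decide, executed.append, max_steps=10)
print([a.kind for a in result])  # prints ['click', 'type']
```

Passing the three stages in as callables keeps the loop testable without a display or API key, and the `max_steps` cap plays the same role as the `self.max_steps = 50` guard in the class above.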

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT
