add-image-vision
SolidAdd image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.
AI & Automation 29,820 stars
12908 forks Updated today MIT
Install
Quality Score: 93/100
Stars 20%
Recency 20%
Frontmatter 20%
Documentation 15%
Issue Health 10%
License 10%
Description 5%
Skill Content
# Image Vision Skill
Adds the ability for NanoClaw agents to see and understand images sent via WhatsApp. Images are downloaded, resized with sharp, saved to the group workspace, and passed to the agent as base64-encoded multimodal content blocks.
## Phase 1: Pre-flight
1. Check if `src/image.ts` exists — skip to Phase 3 if already applied
2. Confirm `sharp` is installable (native bindings require build tools)
**Prerequisite:** WhatsApp must be installed first (`skill/whatsapp` merged). This skill modifies WhatsApp channel files.
## Phase 2: Apply Code Changes
### Ensure WhatsApp fork remote
```bash
git remote -v
```
If `whatsapp` is missing, add it:
```bash
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
```
### Merge the skill branch
```bash
git fetch whatsapp skill/image-vision
git merge whatsapp/skill/image-vision || {
git checkout --theirs package-lock.json
git add package-lock.json
git merge --continue
}
```
This merges in:
- `src/image.ts` (image download, resize via sharp, base64 encoding)
- `src/image.test.ts` (8 unit tests)
- Image attachment handling in `src/channels/whatsapp.ts`
- Image passing to agent in `src/index.ts` and `src/container-runner.ts`
- Image content block support in `container/agent-runner/src/index.ts`
- `sharp` npm dependency in `package.json`
If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides.
### Validate code changes
```bash
npm...
Details
- Author
- qwibitai
- Repository
- qwibitai/nanoclaw
- Created
- 4 months ago
- Last Updated
- today
- Language
- TypeScript
- License
- MIT
Integrates with
Similar Skills
Semantically similar based on skill content — not just same category
AI & Automation Listed
add-image-vision
Add image vision to Deus agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.
43 Updated today
sliamh11 AI & Automation Solid
add-voice-transcription
Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.
29,820 Updated today
qwibitai AI & Automation Solid
add-whatsapp
Add WhatsApp as a channel. Can replace other channels entirely or run alongside them. Uses QR code or pairing code for authentication.
29,820 Updated today
qwibitai