clip

Solid

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

AI & Automation · 9,609 stars · 724 forks · Updated 1 month ago · MIT

Quality Score: 94/100

Stars (20%): 100
Recency (20%)
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# CLIP - Contrastive Language-Image Pre-Training

OpenAI's model that understands images from natural language.

## When to use CLIP

**Use when:**
- Zero-shot image classification (no training data needed)
- Image-text similarity/matching
- Semantic image search
- Content moderation (detect NSFW, violence)
- Visual question answering
- Cross-modal retrieval (image→text, text→image)

**Metrics**:
- **25,300+ GitHub stars**
- Trained on 400M image-text pairs
- Matches ResNet-50 on ImageNet (zero-shot)
- MIT License

**Use alternatives instead**:
- **BLIP-2**: Better captioning
- **LLaVA**: Vision-language chat
- **Segment Anything**: Image segmentation

## Quick start

### Installation

```bash
pip install git+https://github.com/openai/CLIP.git
pip install torch torchvision ftfy regex tqdm
```

### Zero-shot classification

```python
import torch
import clip
from PIL import Image

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Load image
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)

# Define possible labels
text = clip.tokenize(["a dog", "a cat", "a bird", "a car"]).to(device)

# Compute similarity
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Cosine similarity
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

# Print results
labels ...
```
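To make the `logits_per_image.softmax(dim=-1)` step less opaque, here is a minimal pure-Python sketch of the scoring idea: scaled cosine similarity between one image embedding and several text embeddings, followed by a softmax. It does not call the CLIP library. The 3-dimensional vectors are made-up stand-ins for real embeddings, the helper names (`zero_shot_probs`, `rank_by_similarity`) are hypothetical, and the fixed `logit_scale=100.0` is an assumption standing in for the model's learned temperature.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Dot product of the two unit-normalized vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_embedding, text_embeddings, logit_scale=100.0):
    """Hypothetical helper: label probabilities for a single image.

    logit_scale is an assumed constant, not a value read from a model.
    """
    logits = [
        logit_scale * cosine_similarity(image_embedding, t)
        for t in text_embeddings
    ]
    return softmax(logits)


def rank_by_similarity(query_embedding, candidate_embeddings):
    """Hypothetical helper: candidate indices, most similar first."""
    scores = [cosine_similarity(query_embedding, c) for c in candidate_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy embeddings, invented for illustration only.
labels = ["a dog", "a cat", "a bird", "a car"]
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = [
    [1.0, 0.0, 0.1],   # "a dog"
    [0.6, 0.7, 0.1],   # "a cat"
    [0.1, 0.2, 0.9],   # "a bird"
    [-0.5, 0.8, 0.0],  # "a car"
]

probs = zero_shot_probs(image_embedding, text_embeddings)
best_label = labels[probs.index(max(probs))]
ranking = rank_by_similarity(image_embedding, text_embeddings)

for label, p in zip(labels, probs):
    print(f"{label}: {p:.4f}")
```

The same similarity score serves both use cases listed above: a softmax over a fixed label set gives zero-shot classification, while sorting candidates by score gives image→text or text→image retrieval. With real embeddings, a large scale factor tends to make the softmax sharply peaked, so the probabilities are better read as a ranking than as calibrated confidences.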

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT

Integrates with

OpenAI · AI Hugging Face · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

clip

191,515 Updated today

NousResearch

AI & Automation Featured

clip

27,984 Updated today

davila7

AI & Automation Solid

blip-2-vision-language

Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.

9,609 Updated 1 month ago

Orchestra-Research