embedding-strategies

Solid

Select and optimize embedding models for semantic search and RAG applications. Use when choosing embedding models, implementing chunking strategies, or optimizing embedding quality for specific domains.

AI & Automation · 2,279 stars · 168 forks · Updated 3 weeks ago · Apache-2.0

Quality Score: 91/100

Stars (20%): 100
Recency (20%): 90
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Embedding Strategies

Guide to selecting and optimizing embedding models for vector search applications.

## When to Use This Skill

- Choosing embedding models for RAG
- Optimizing chunking strategies
- Fine-tuning embeddings for domains
- Comparing embedding model performance
- Reducing embedding dimensions
- Handling multilingual content

## Core Concepts

### 1. Embedding Model Comparison

| Model | Dimensions | Max Tokens | Best For |
|-------|------------|------------|----------|
| **text-embedding-3-large** | 3072 | 8191 | High accuracy |
| **text-embedding-3-small** | 1536 | 8191 | Cost-effective |
| **voyage-2** | 1024 | 4000 | Code, legal |
| **bge-large-en-v1.5** | 1024 | 512 | Open source |
| **all-MiniLM-L6-v2** | 384 | 256 | Fast, lightweight |
| **multilingual-e5-large** | 1024 | 512 | Multi-language |

### 2. Embedding Pipeline

```
Document → Chunking → Preprocessing → Embedding Model → Vector
              ↓
       [Overlap, Size]  [Clean, Normalize]  [API/Local]
```

## Templates

### Template 1: OpenAI Embeddings

```python
from openai import OpenAI
from typing import List
import numpy as np

client = OpenAI()

def get_embeddings(
    texts: List[str],
    model: str = "text-embedding-3-small",
    dimensions: int = None
) -> List[List[float]]:
    """Get embeddings from OpenAI."""
    # Handle batching for large lists
    batch_size = 100
    all_embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        ...
```
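
The chunking and preprocessing stages of the pipeline above (the `[Overlap, Size]` and `[Clean, Normalize]` annotations) could be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: `chunk_text` and `clean_text` are hypothetical helper names, and sizes are measured in characters for simplicity, whereas the limits in the comparison table are in tokens.

```python
from typing import List


def clean_text(text: str) -> str:
    """Preprocessing (assumed): collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: a token-based splitter would be needed to
    respect a model's actual Max Tokens limit.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def prepare_chunks(document: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Chunk first, then clean each chunk, following the pipeline order."""
    return [clean_text(c) for c in chunk_text(document, chunk_size, overlap)]
```

The list returned by `prepare_chunks` is what would be passed as `texts` to an embedding function such as the `get_embeddings` template. The overlap means that a sentence cut at one chunk boundary still appears whole in the neighboring chunk, at the cost of embedding some text twice.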

Details

Author: foryourhealth111-pixel
Repository: foryourhealth111-pixel/Vibe-Skills
Created: 3 months ago
Last Updated: 3 weeks ago
Language: Python
License: Apache-2.0
