distributed-search

This skill should be used when the user designs a "search system", needs "full-text search", asks about an "inverted index", "Elasticsearch / OpenSearch", "relevance ranking" (TF-IDF/BM25), "search autocomplete / typeahead", an "indexing pipeline", or "faceted search". It gives the crawl/index/search architecture, index sharding and replication, ranking, and near-real-time indexing. Use it whenever users must query text by relevance rather than fetch rows by key, even if they don't say "search engine".
proyecto26/system-design-skills · ★ 6 · Data & Documents · score 76

Install: claude install-skill proyecto26/system-design-skills

# Distributed search

Find the documents that best match a free-text query, ranked by relevance, fast, across more data than one machine holds. Getting it wrong means either slow `LIKE '%term%'` scans that melt the primary database, or a search box that returns the wrong results and erodes user trust. Both are silent until traffic or corpus size exposes them.

## When to reach for this

Users type words and expect ranked, relevant matches, not exact-key lookups. The corpus is text-heavy (documents, products, logs, messages), queries are ad-hoc (any term, any combination), and results need ranking, highlighting, facets, or typeahead. Reach for it when a `WHERE col LIKE` or full-table scan is already the read bottleneck, or when you need fuzzy/partial matching a B-tree index cannot serve.

## When NOT to

The access pattern is fetch-by-known-key or a fixed filter: a primary database index serves that far more cheaply and consistently; keep it in `data-storage`. The corpus is tiny (thousands of rows): an in-process filter or the database's built-in full-text index is enough, and a separate search cluster is pure operational overhead (YAGNI). Search is a *derived, eventually-consistent* copy of your data; never make it the system of record.

## Clarify first

- **Corpus size and growth**: document count, average doc size, total index bytes? (→ `back-of-the-envelope`) This decides shard count.
- **Query QPS and shape**: read-heavy? term queries, phrase, fuzzy, facets, autocompl