Install: claude install-skill proyecto26/system-design-skills
# Distributed search
Find the documents that best match a free-text query, ranked by relevance, fast,
across more data than one machine holds. Getting it wrong means either slow
`LIKE '%term%'` scans that melt the primary database, or a search box that
returns the wrong results and erodes user trust — both are silent until traffic
or corpus size exposes them.
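The alternative to scanning every row with `LIKE` is an inverted index: each term maps to the documents that contain it, so a query touches only the postings for its own terms. To go past one machine, the index is split into shards, and a query fans out to every shard and merges their local top results (scatter-gather). A minimal sketch follows, assuming hash routing by document ID, a naive regex tokenizer, and plain TF-IDF scoring with shard-local statistics. `Shard` and `SearchCluster` are hypothetical names, not any search engine's API.

```python
import hashlib
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analyzer (assumption): lowercase, split on non-alphanumerics.
    # Real engines add stemming, stop words, synonyms, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


class Shard:
    """One partition: an inverted index over only the docs routed to it."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)  # term -> {doc_id: tf}
        self.doc_count = 0

    def add(self, doc_id: str, text: str) -> None:
        self.doc_count += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def top_k(self, query: str, k: int) -> list[tuple[float, str]]:
        # TF-IDF using this shard's own document frequencies. This is a
        # simplification: skewed shards can rank differently than global IDF.
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


class SearchCluster:
    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def _route(self, doc_id: str) -> Shard:
        # Stable hash, so a document always lands on the same shard.
        h = int(hashlib.sha1(doc_id.encode()).hexdigest(), 16)
        return self.shards[h % len(self.shards)]

    def index(self, doc_id: str, text: str) -> None:
        self._route(doc_id).add(doc_id, text)

    def search(self, query: str, k: int = 10) -> list[str]:
        # Scatter: each shard returns its local top-k. Gather: merge globally.
        candidates = [hit for shard in self.shards for hit in shard.top_k(query, k)]
        return [doc_id for _, doc_id in heapq.nlargest(k, candidates)]


cluster = SearchCluster(num_shards=3)
cluster.index("d1", "distributed search with sharded inverted indexes")
cluster.index("d2", "relational database with b-tree indexes")
cluster.index("d3", "search relevance ranking")
results = cluster.search("search ranking", k=2)
print(results)  # ['d3', 'd1']
```

Each shard only needs to return its own top `k`, because any document in the global top `k` must also be in the top `k` of the shard that holds it. That is what keeps the merge step cheap as shard count grows.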
## When to reach for this
Users type words and expect ranked, relevant matches — not exact-key lookups.
The corpus is text-heavy (documents, products, logs, messages), queries are
ad-hoc (any term, any combination), and results need ranking, highlighting,
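facets, or typeahead. Reach for it when a `WHERE col LIKE '%term%'` or full-table
scan is already the read bottleneck, or when you need fuzzy or infix matching
(typos, mid-word fragments) that a B-tree index cannot serve: a B-tree handles
exact and prefix lookups, but not "contains" or "close to".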
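One common way to serve infix and typo-tolerant matching is a trigram index, the same idea behind PostgreSQL's `pg_trgm` extension: every term is broken into overlapping 3-character pieces, and lookups intersect or overlap those pieces instead of walking a sorted key. A minimal sketch over a term vocabulary follows. `TrigramIndex`, its methods, and the 0.4 similarity threshold are hypothetical choices for illustration. It scores fuzziness by trigram Jaccard similarity, which is one technique among several; some engines use edit-distance automata instead.

```python
from collections import defaultdict


def trigrams(term: str) -> set[str]:
    # Pad with two leading spaces and one trailing space so word
    # boundaries produce their own trigrams.
    padded = f"  {term.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # trigram -> terms
        self.grams: dict[str, set[str]] = {}                    # term -> its trigrams

    def add(self, term: str) -> None:
        term = term.lower()
        self.grams[term] = trigrams(term)
        for g in self.grams[term]:
            self.postings[g].add(term)

    def contains(self, fragment: str) -> set[str]:
        # Infix match: candidates must hold every inner (unpadded) trigram of
        # the fragment; a real substring check then drops false positives.
        fragment = fragment.lower()
        inner = {fragment[i:i + 3] for i in range(len(fragment) - 2)}
        if not inner:  # fragments shorter than 3 chars: fall back to a scan
            return {t for t in self.grams if fragment in t}
        candidates = set.intersection(*(self.postings.get(g, set()) for g in inner))
        return {t for t in candidates if fragment in t}

    def fuzzy(self, query: str, threshold: float = 0.4) -> list[tuple[float, str]]:
        # Fuzzy match: Jaccard similarity of padded trigram sets. A typo only
        # breaks the few trigrams that overlap it, so close terms still score high.
        q = trigrams(query)
        candidates = set().union(*(self.postings.get(g, set()) for g in q))
        scored = []
        for t in candidates:
            sim = len(q & self.grams[t]) / len(q | self.grams[t])
            if sim >= threshold:
                scored.append((sim, t))
        return sorted(scored, reverse=True)


idx = TrigramIndex()
for term in ["search", "research", "sharding", "shard"]:
    idx.add(term)
print(sorted(idx.contains("arch")))  # ['research', 'search']
print(idx.fuzzy("serch"))
```

The misspelled `serch` still shares its boundary and ending trigrams with `search`, which is why it clears the threshold while `research` and `shard` do not.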
## When NOT to
The access pattern is fetch-by-known-key or a fixed filter — a primary database
index serves that far more cheaply and consistently; keep it in `data-storage`.
The corpus is tiny (thousands of rows): an in-process filter or the database's
built-in full-text index is enough — a separate search cluster is pure
operational overhead (YAGNI). Search is a *derived, eventually-consistent* copy
of your data; never make it the system of record.
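Keeping search as a derived copy usually means feeding it change events from the system of record (for example via change data capture or an outbox) and making the apply step safe against duplicated and reordered events, with a full rebuild from the database as the recovery path. A minimal sketch follows, assuming the source assigns each document a version number (starting at 1) that increases on every write. `ChangeEvent`, `DerivedIndex`, and `rebuild` are hypothetical names, not a specific engine's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    doc_id: str
    version: int          # assumed: assigned by the system of record, >= 1, increasing
    text: Optional[str]   # None means the row was deleted


class DerivedIndex:
    """Search-side copy: rebuildable, never authoritative."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}
        self.versions: dict[str, int] = {}  # kept after deletes, acting as tombstones

    def apply(self, event: ChangeEvent) -> bool:
        # Idempotent and order-tolerant: ignore anything not newer than what we
        # already hold, so redelivered or late events cannot resurrect stale data.
        if event.version <= self.versions.get(event.doc_id, 0):
            return False
        self.versions[event.doc_id] = event.version
        if event.text is None:
            self.docs.pop(event.doc_id, None)
        else:
            self.docs[event.doc_id] = event.text
        return True


def rebuild(source_rows: dict[str, tuple[int, str]]) -> DerivedIndex:
    # Recovery path: because the database is the system of record, a drifted
    # or corrupted index is thrown away and replayed from it.
    index = DerivedIndex()
    for doc_id, (version, text) in source_rows.items():
        index.apply(ChangeEvent(doc_id, version, text))
    return index


idx = DerivedIndex()
idx.apply(ChangeEvent("p1", 2, "blue kettle"))
idx.apply(ChangeEvent("p1", 1, "kettle"))        # late event: ignored
idx.apply(ChangeEvent("p1", 3, None))            # delete
idx.apply(ChangeEvent("p1", 2, "blue kettle"))   # redelivery after delete: ignored
print(idx.docs)  # {}
```

Between an event's write to the database and its `apply`, search returns the old state; that window is the "eventually consistent" part, and callers that need read-your-writes should go to the database.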
## Clarify first
- **Corpus size and growth** — document count, average doc size, total index
bytes? (→ `back-of-the-envelope`) This decides shard count.
- **Query QPS and shape** — read-heavy? term queries, phrase, fuzzy, facets,
autocompl