ddia-systems

Design data systems by understanding storage engines, replication, partitioning, transactions, and consistency models. Use when the user mentions "database choice", "which database should I use", "SQL or NoSQL", "replication lag", "partitioning strategy", "consistency vs availability", "stream processing", "ACID transactions", "eventual consistency", "my queries are slow at scale", or "data is inconsistent across replicas". Also trigger when choosing a datastore, designing data pipelines, or debugging distributed-system consistency issues. Covers data models, batch/stream processing, and distributed consensus. For system design, see system-design. For resilience, see release-it.

AI & Automation 1,754 stars 179 forks Updated 5 days ago MIT

Quality Score: 96/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%)
Documentation (weight 15%): 100
Issue Health (weight 10%)
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Designing Data-Intensive Applications Framework

A principled approach to building reliable, scalable, and maintainable data systems. Apply these principles when choosing databases, designing schemas, architecting distributed systems, or reasoning about consistency and fault tolerance.

## Core Principle

**Data outlives code.** Applications are rewritten and frameworks come and go, but data persists for decades -- prioritize the long-term correctness, durability, and evolvability of the data layer. Most applications are data-intensive, not compute-intensive: the hard problems are data volume, complexity, and rate of change, and explicit consistency/availability/latency trade-offs separate robust systems from fragile ones.

## Scoring

**Goal: 10/10.** Score a data architecture by the seven Quick Diagnostic rows below: award ~1.4 points per row answered "yes" with evidence (deliberate, documented trade-off), 0 where the answer is "no" or unknown.

- **9-10:** every domain choice -- data model, storage engine, replication, partitioning, isolation, derived-data, fault handling -- is deliberate, documented, and matched to actual read/write/consistency requirements; failover tested.
- **5-6:** core choices made but two or three diagnostic rows fail -- e.g. default isolation level unknown, hot-key risk unhandled, or failover untested.
- **<=3:** choices driven by familiarity, not requirements; ignored failure modes (replication lag, write skew, hot partitions) and accidental com...
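The scoring rule above is simple arithmetic, and a minimal Python sketch may make it concrete. The names `DiagnosticRow` and `score_architecture`, the row labels, and the `ADR-...` evidence strings are hypothetical and do not come from the skill itself. The sketch also assumes the seven rows split the 10-point goal equally (10 / 7, which rounds to the "~1.4" in the rubric), and that a "yes" without evidence earns nothing.

```python
from dataclasses import dataclass

# Assumption: seven rows share the 10-point goal equally (10 / 7 is about 1.43),
# which is one reading of "~1.4 points per row" in the rubric.
POINTS_PER_ROW = 10 / 7


@dataclass
class DiagnosticRow:
    domain: str         # hypothetical row label, e.g. "replication"
    answer: str         # "yes", "no", or "unknown"
    evidence: str = ""  # pointer to the documented trade-off, if any


def score_architecture(rows):
    """Return a 0-10 score; a row counts only if it is 'yes' with evidence."""
    passed = sum(1 for r in rows if r.answer == "yes" and r.evidence.strip())
    return round(passed * POINTS_PER_ROW, 1)


# Row labels follow the domains listed in the 9-10 band; the evidence
# identifiers are made up for illustration.
rows = [
    DiagnosticRow("data model", "yes", "ADR-001"),
    DiagnosticRow("storage engine", "yes", "ADR-002"),
    DiagnosticRow("replication", "yes", "ADR-003"),
    DiagnosticRow("partitioning", "yes", "ADR-004"),
    DiagnosticRow("isolation", "unknown"),
    DiagnosticRow("derived data", "yes"),    # "yes" but undocumented: 0 points
    DiagnosticRow("fault handling", "no"),
]

print(score_architecture(rows))  # 5.7
```

With three failing rows the example lands in the 5-6 band, which matches the band description of "two or three diagnostic rows fail".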

Details

Author: wondelai
Repository: wondelai/skills
Created: 5 months ago
Last Updated: 5 days ago
Language: Shell
License: MIT

Bundled in these plugins

skills
