nw-platform-engineering-foundations

Solid

Foundational platform engineering knowledge from key references -- Continuous Delivery, SRE, Accelerate, Team Topologies, Chaos Engineering, and Secure Delivery. Load when contextual grounding in platform engineering theory is needed.

AI & Automation 523 stars 54 forks Updated 1 week ago MIT


Quality Score: 92/100

Stars (20%): 91
Recency (20%): 90
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Platform Engineering Foundations

## Continuous Delivery (Humble and Farley)

Key principles: Build quality in | Work in small batches | Automate almost everything | Pursue continuous improvement | Everyone is responsible (shared ownership).

Pipeline progression: Commit -> Acceptance -> Capacity -> Production stages.

For detailed stage definitions and quality gates, see `cicd-and-deployment` skill.

## Site Reliability Engineering (Google -- Beyer et al.)

Key principles: SLOs over SLAs (internal targets stricter than external) | Error budgets (balance reliability and velocity) | Toil elimination (automate repetitive manual work) | Embrace risk (calculate risk, do not eliminate it).

Observability: Four Golden Signals (latency, traffic, errors, saturation) | SLI -> SLO -> Error Budget -> Alerting chain | Dashboards for investigation, not monitoring.

## Accelerate (Forsgren, Humble, Kim)

### DORA Metrics

- **Deployment frequency**: how often code deploys to production
- **Lead time for changes**: time from commit to production
- **Change failure rate**: % of deployments causing failure
- **Time to restore**: time to recover from production failure

### Performance Levels

| Metric | Elite | High |
|--------|-------|------|
| Deployment frequency | Multiple times/day | Daily to weekly |
| Lead time | < 1 hour | 1 day to 1 week |
| Change failure rate | 0-15% | 16-30% |
| Time to restore | < 1 hour | < 1 day |

Use DORA metrics as baselines when assessing current state and ...
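As an illustration of the four DORA metric definitions above, here is a minimal sketch of how they might be computed from deployment records. The `Deployment` record, its field names, and the `dora_metrics` function are hypothetical and not part of the skill. Measuring time to restore from the moment of deployment and summarizing durations with a median are also assumptions; teams often measure from incident detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # whether it caused a production failure
    restored_at: Optional[datetime] = None  # when service recovered, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a reporting period of period_days."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    failures = [d for d in deployments if d.failed]
    # Lead time for changes: commit to production.
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    # Time to restore (assumption: measured from the failing deployment).
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate_pct": 100 * len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up sample data, for illustration only.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(minutes=30)),
    Deployment(t0, t0 + timedelta(minutes=50), failed=True,
               restored_at=t0 + timedelta(minutes=90)),
    Deployment(t0, t0 + timedelta(minutes=40)),
    Deployment(t0, t0 + timedelta(minutes=20)),
]
result = dora_metrics(sample, period_days=2)
print(result)
```

With this sample, the result is 2 deployments per day, a 35-minute median lead time, a 25% change failure rate, and a 40-minute median time to restore. Comparing such values against the performance levels table gives a rough baseline rather than a formal classification, since the table's ranges leave gaps (for example, lead times between 1 hour and 1 day).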

Details

Author: nWave-ai
Repository: nWave-ai/nWave
Created: 3 months ago
Last Updated: 1 week ago
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

platform-engineering

Provides platform engineering best practices for Internal Developer Platforms (IDPs), golden paths, service catalogs, and developer experience. Use when building developer platforms, configuring Backstage, designing self-service workflows, or when user mentions 'platform engineering', 'backstage', 'golden path', 'IDP', 'developer portal', 'service catalog', 'DevEx', 'platform team', 'self-service'.

62 Updated today

Web & Frontend Listed

platform-engineering

Design and implement Internal Developer Platforms (IDPs) with self-service capabilities, golden paths, and developer experience optimization. Covers platform strategy, IDP architecture (Backstage, Port), infrastructure orchestration (Crossplane), GitOps (Argo CD), and adoption patterns. Use when building developer platforms, improving DevEx, or establishing platform teams.

368 Updated 5 months ago

DevOps & Infrastructure Solid

devops-sre-master

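DevOps and Site Reliability Engineering (SRE): the cognitive operating system for platform, infrastructure, and reliability engineers, covering the full software delivery and operations lifecycle (CI/CD and release engineering: trunk-based development + progressive delivery with canary/blue-green/feature flags + GitOps with Argo CD/Flux / infrastructure as code: Terraform/OpenTofu/Pulumi/Ansible + policy-as-code with OPA / containers and orchestration: Docker/Kubernetes + Helm/Kustomize + service mesh Istio/Linkerd / observability: Prometheus + Loki + OpenTelemetry + Honeycomb + eBPF + RED/USE / SLO-SLI-error budget and reliability engineering: the Google SRE discipline + capacity planning + graceful degradation / incident management and on-call: incident command + PagerDuty + runbooks + blameless postmortems + MTTR / cloud platforms and FinOps: AWS/GCP/Azure + cost optimization + elastic scaling / platform engineering and developer experience: IDP + Backstage + golden path + Team Topologies / DevSecOps and supply chain security: shift-left + SBOM + SLSA + sigstore + Vault / resilience and chaos engineering: fault injection + game days + safety science / DORA metrics and engineering effectiveness: deployment frequency + lead time for changes + change failure rate + Accelerate research / database and stateful operations: schema migration + backup and disaster recovery). Excludes general application development / crash courses for cloud sales certifications / the narrow misconception that 'DevOps = the job of running Jenkins' / traditional ITIL ticket-culture operations (the old paradigm serves only as a boundary) / treating manual ClickOps as the steady state (it is toil, this skill's core anti-pattern)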

34 Updated 3 days ago

AI & Automation Listed

chaos-engineering

Provides chaos engineering best practices for resilience testing, fault injection, and game day planning. Use when designing resilience experiments, configuring chaos tools, planning game days, or when user mentions 'chaos engineering', 'resilience', 'litmus', 'game day', 'fault injection', 'chaos monkey', 'blast radius', 'steady state', 'failure mode'.

62 Updated today

DevOps & Infrastructure Listed

devops

DevOps practices, CI/CD, and infrastructure management

0 Updated today