nw-platform-engineering-foundations

Solid

Foundational platform engineering knowledge from key references -- Continuous Delivery, SRE, Accelerate, Team Topologies, Chaos Engineering, and Secure Delivery. Load when contextual grounding in platform engineering theory is needed.

AI & Automation 523 stars 54 forks Updated 1 weeks ago MIT

Install

View on GitHub

Quality Score: 92/100

Stars 20%
91
Recency 20%
90
Frontmatter 20%
70
Documentation 15%
100
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

# Platform Engineering Foundations ## Continuous Delivery (Humble and Farley) Key principles: Build quality in | Work in small batches | Automate almost everything | Pursue continuous improvement | Everyone is responsible (shared ownership). Pipeline progression: Commit -> Acceptance -> Capacity -> Production stages. For detailed stage definitions and quality gates, see `cicd-and-deployment` skill. ## Site Reliability Engineering (Google -- Beyer et al.) Key principles: SLOs over SLAs (internal targets stricter than external) | Error budgets (balance reliability and velocity) | Toil elimination (automate repetitive manual work) | Embrace risk (calculate risk, do not eliminate it). Observability: Four Golden Signals (latency, traffic, errors, saturation) | SLI -> SLO -> Error Budget -> Alerting chain | Dashboards for investigation, not monitoring. ## Accelerate (Forsgren, Humble, Kim) ### DORA Metrics - **Deployment frequency**: how often code deploys to production - **Lead time for changes**: time from commit to production - **Change failure rate**: % of deployments causing failure - **Time to restore**: time to recover from production failure ### Performance Levels | Metric | Elite | High | |--------|-------|------| | Deployment frequency | Multiple times/day | Daily to weekly | | Lead time | < 1 hour | 1 day to 1 week | | Change failure rate | 0-15% | 16-30% | | Time to restore | < 1 hour | < 1 day | Use DORA metrics as baselines when assessing current state and ...

Details

Author
nWave-ai
Repository
nWave-ai/nWave
Created
3 months ago
Last Updated
1 weeks ago
Language
Python
License
MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

platform-engineering

Provides platform engineering best practices for Internal Developer Platforms (IDPs), golden paths, service catalogs, and developer experience. Use when building developer platforms, configuring Backstage, designing self-service workflows, or when user mentions 'platform engineering', 'backstage', 'golden path', 'IDP', 'developer portal', 'service catalog', 'DevEx', 'platform team', 'self-service'.

62 Updated today
Tibsfox
Web & Frontend Listed

platform-engineering

Design and implement Internal Developer Platforms (IDPs) with self-service capabilities, golden paths, and developer experience optimization. Covers platform strategy, IDP architecture (Backstage, Port), infrastructure orchestration (Crossplane), GitOps (Argo CD), and adoption patterns. Use when building developer platforms, improving DevEx, or establishing platform teams.

368 Updated 5 months ago
ancoleman
DevOps & Infrastructure Solid

devops-sre-master

DevOps 与站点可靠性工程 (SRE) — 平台 / 基础设施 / 可靠性工程师的认知操作系统, 覆盖软件交付 + 运维全生命周期 (CI/CD 与发布工程 trunk-based + 渐进式发布 canary/blue-green/feature flag + GitOps Argo CD/Flux / 基础设施即代码 Terraform/OpenTofu/Pulumi/Ansible + policy-as-code OPA / 容器与编排 Docker/Kubernetes + Helm/Kustomize + service mesh Istio/Linkerd / 可观测性 Prometheus + Loki + OpenTelemetry + Honeycomb + eBPF + RED/USE / SLO-SLI-error budget 与可靠性工程 Google SRE 学科 + 容量规划 + 优雅降级 / 事件管理与 on-call 事件指挥 + PagerDuty + runbook + 无指责复盘 + MTTR / 云平台与 FinOps AWS/GCP/Azure + 成本优化 + 弹性伸缩 / 平台工程与开发者体验 IDP + Backstage + golden path + Team Topologies / DevSecOps 与供应链安全 shift-left + SBOM + SLSA + sigstore + Vault / 韧性与混沌工程 fault injection + game day + 安全科学 / DORA 指标与工程效能 部署频率 + 变更前置时间 + 变更失败率 + Accelerate 研究 / 数据库与有状态运维 schema 迁移 + 备份容灾) — 不含 通用应用开发 / 纯云销售认证速成 / 'DevOps = 跑 Jenkins 的岗位' 窄化误解 / ITIL 工单文化传统运维 (旧范式仅做边界) / 把手工运维 ClickOps 当稳态 (是 toil, 本 skill 核心反模式) (DevOps & Site Reliability Engineering — the cognitive operating system of platform / infrastructure / reliability practitioners

34 Updated 3 days ago
swaylq
AI & Automation Listed

chaos-engineering

Provides chaos engineering best practices for resilience testing, fault injection, and game day planning. Use when designing resilience experiments, configuring chaos tools, planning game days, or when user mentions 'chaos engineering', 'resilience', 'litmus', 'game day', 'fault injection', 'chaos monkey', 'blast radius', 'steady state', 'failure mode'.

62 Updated today
Tibsfox
DevOps & Infrastructure Listed

devops

DevOps practices, CI/CD, and infrastructure management

0 Updated today
murtazatouqeer