

Monitoring and Alerting
ryukyagamilight/terminal-skills · ★ 1 · DevOps & Infrastructure · score 75
Install: claude install-skill ryukyagamilight/terminal-skills
# Monitoring and Alerting

## Overview

Skills for Prometheus, Grafana, alert rule configuration, and related tasks.

## Prometheus

### Basic Queries (PromQL)

```promql
# Instant vector
http_requests_total
http_requests_total{job="api", status="200"}

# Range vector
http_requests_total[5m]

# Offset
http_requests_total offset 1h

# Aggregation
sum(http_requests_total)
sum by (job) (http_requests_total)
sum without (instance) (http_requests_total)

# Rate
rate(http_requests_total[5m])
irate(http_requests_total[5m])

# Increase
increase(http_requests_total[1h])

# Histogram quantile
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```

### Common Queries

```promql
# CPU usage (%)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage (%)
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Disk usage (%)
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100

# Network traffic (bytes per second)
rate(node_network_receive_bytes_total[5m])
rate(node_network_transmit_bytes_total[5m])

# HTTP request rate
sum(rate(http_requests_total[5m])) by (status)

# Error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# P99 latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```

### Configuration File

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "rules/*.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'