Grafana
Monitoring
Skills using Grafana (158)
grafana-dashboards
Create and manage production-ready Grafana dashboards for comprehensive system observability.
observability-engineer
Build production-ready monitoring, logging, and tracing systems. Implements comprehensive observability strategies, SLI/SLO management, and incident response workflows.
prometheus-configuration
Complete guide to Prometheus setup, metric collection, scrape configuration, and recording rules.
adobe-incident-runbook
Execute Adobe incident response procedures with triage, mitigation, and postmortem for Firefly Services, PDF Services, and I/O Events outages. Use when responding to Adobe-related incidents, investigating API failures, or running post-incident reviews. Trigger with phrases like "adobe incident", "adobe outage", "adobe down", "adobe on-call", "adobe emergency".
adobe-observability
Set up comprehensive observability for Adobe API integrations with Prometheus metrics, OpenTelemetry traces, structured logging, and alert rules covering Firefly, PDF Services, and Photoshop APIs. Trigger with phrases like "adobe monitoring", "adobe metrics", "adobe observability", "monitor adobe", "adobe alerts", "adobe tracing".
algolia-observability
Set up observability for Algolia: Prometheus metrics for search latency/errors, OpenTelemetry tracing, structured logging, and Grafana dashboards. Trigger: "algolia monitoring", "algolia metrics", "algolia observability", "monitor algolia", "algolia alerts", "algolia tracing", "algolia dashboard".
castai-reference-architecture
CAST AI reference architecture for multi-cluster Kubernetes cost optimization. Use when designing CAST AI deployment across environments, planning Terraform module structure, or establishing team standards. Trigger with phrases like "cast ai architecture", "cast ai best practices", "cast ai multi-cluster", "cast ai terraform structure".
clay-observability
Monitor Clay enrichment pipeline health, credit consumption, and data quality metrics. Use when setting up dashboards for Clay operations, configuring alerts for credit burn, or tracking enrichment success rates. Trigger with phrases like "clay monitoring", "clay metrics", "clay observability", "monitor clay", "clay alerts", "clay dashboard", "clay credit tracking".
clickhouse-observability
Monitor ClickHouse with Prometheus metrics, Grafana dashboards, system table queries, and alerting for query performance, merge health, and resource usage. Use when setting up ClickHouse monitoring, building Grafana dashboards, or configuring alerts for production ClickHouse deployments. Trigger: "clickhouse monitoring", "clickhouse metrics", "clickhouse Grafana", "clickhouse observability", "monitor clickhouse", "clickhouse Prometheus".
clickup-observability
Monitor ClickUp API integrations with metrics, tracing, structured logging, and alerting using Prometheus, OpenTelemetry, and Grafana. Trigger: "clickup monitoring", "clickup metrics", "clickup observability", "monitor clickup", "clickup alerts", "clickup tracing", "clickup dashboard".
cohere-incident-runbook
Execute Cohere incident response procedures with triage, mitigation, and postmortem. Use when responding to Cohere API outages, investigating errors, or running post-incident reviews for Cohere integration failures. Trigger with phrases like "cohere incident", "cohere outage", "cohere down", "cohere on-call", "cohere emergency", "cohere broken".
coreweave-observability
Set up GPU monitoring and observability for CoreWeave workloads. Use when implementing GPU metrics dashboards, configuring alerts, or tracking inference latency and throughput. Trigger with phrases like "coreweave monitoring", "coreweave observability", "coreweave gpu metrics", "coreweave grafana".
coreweave-reference-architecture
Reference architecture for CoreWeave GPU cloud deployments. Use when designing ML infrastructure, planning multi-model serving, or establishing CoreWeave deployment standards. Trigger with phrases like "coreweave architecture", "coreweave design", "coreweave infrastructure", "coreweave best practices".
customerio-observability
Set up Customer.io monitoring and observability. Use when implementing metrics, structured logging, alerting, or Grafana dashboards for Customer.io integrations. Trigger: "customer.io monitoring", "customer.io metrics", "customer.io dashboard", "customer.io alerts", "customer.io observability".
deepgram-observability
Set up comprehensive observability for Deepgram integrations. Use when implementing monitoring, setting up dashboards, or configuring alerting for Deepgram integration health. Trigger: "deepgram monitoring", "deepgram metrics", "deepgram observability", "monitor deepgram", "deepgram alerts", "deepgram dashboard".
documenso-observability
Implement monitoring, logging, and tracing for Documenso integrations. Use when setting up observability, implementing metrics collection, or debugging production issues. Trigger with phrases like "documenso monitoring", "documenso metrics", "documenso logging", "documenso tracing", "documenso observability".
fireflies-observability
Monitor Fireflies.ai integration health with metrics, alerts, and dashboards. Use when implementing monitoring, setting up alerting, or tracking transcript processing reliability. Trigger with phrases like "fireflies monitoring", "fireflies metrics", "fireflies observability", "monitor fireflies", "fireflies alerts".
flexport-observability
Set up observability for Flexport logistics integrations with metrics, structured logging, distributed tracing, and alerting dashboards. Trigger: "flexport monitoring", "flexport observability", "flexport metrics", "flexport alerts".
gamma-observability
Implement comprehensive observability for Gamma integrations. Use when setting up monitoring, logging, tracing, or building dashboards for Gamma API usage. Trigger with phrases like "gamma monitoring", "gamma logging", "gamma metrics", "gamma observability", "gamma dashboard".
intercom-observability
Set up observability for Intercom integrations with metrics, traces, and alerts. Use when implementing monitoring for Intercom API operations, setting up dashboards, or configuring alerting for integration health. Trigger with phrases like "intercom monitoring", "intercom metrics", "intercom observability", "monitor intercom", "intercom alerts", "intercom tracing".
klaviyo-observability
Set up observability for Klaviyo integrations with metrics, traces, and alerts. Use when implementing monitoring for Klaviyo API operations, setting up dashboards, or configuring alerting for Klaviyo integration health. Trigger with phrases like "klaviyo monitoring", "klaviyo metrics", "klaviyo observability", "monitor klaviyo", "klaviyo alerts", "klaviyo tracing".
langchain-observability
Set up comprehensive observability for LangChain applications with LangSmith tracing, OpenTelemetry, Prometheus metrics, and alerts. Trigger: "langchain monitoring", "langchain metrics", "langchain observability", "langchain tracing", "LangSmith", "langchain alerts".
langfuse-observability
Set up comprehensive observability for Langfuse with metrics, dashboards, and alerts. Use when implementing monitoring for LLM operations, setting up dashboards, or configuring alerting for Langfuse integration health. Trigger with phrases like "langfuse monitoring", "langfuse metrics", "langfuse observability", "monitor langfuse", "langfuse alerts", "langfuse dashboard".
lindy-observability
Monitor Lindy AI agent health, task success rates, and credit consumption. Use when setting up monitoring, building dashboards, configuring alerts, or tracking agent performance over time. Trigger with phrases like "lindy monitoring", "lindy observability", "lindy metrics", "lindy logging", "lindy dashboard".
linear-observability
Implement monitoring, logging, and alerting for Linear integrations. Use when setting up metrics collection, dashboards, or configuring alerts for Linear API usage. Trigger: "linear monitoring", "linear observability", "linear metrics", "linear logging", "monitor linear", "linear Prometheus", "linear Grafana".
linktree-performance-tuning
Optimize Linktree API integration performance with caching, batching, and rate limit strategies. Use when Linktree API calls are slow, hitting rate limits, or profile pages serve stale link data. Trigger with "linktree performance tuning".
linktree-prod-checklist
Production checklist for Linktree. Trigger: "linktree prod checklist".
load-testing-apis
Execute comprehensive load and stress testing to validate API performance and scalability. Use when validating API performance under load. Trigger with phrases like "load test the API", "stress test API", or "benchmark API performance".
logging-api-requests
Monitor and log API requests with correlation IDs, performance metrics, and security audit trails. Use when auditing API requests and responses. Trigger with phrases like "log API requests", "add API logging", or "track API calls".
lucidchart-prod-checklist
Production checklist for Lucidchart. Trigger: "lucidchart prod checklist".
maintainx-observability
Implement comprehensive observability for MaintainX integrations. Use when setting up monitoring, logging, tracing, and alerting for MaintainX API integrations. Trigger with phrases like "maintainx monitoring", "maintainx logging", "maintainx metrics", "maintainx observability", "maintainx alerts".
mindtickle-prod-checklist
Production checklist for MindTickle. Trigger: "mindtickle prod checklist".
miro-observability
Set up observability for Miro REST API v2 integrations with Prometheus metrics, OpenTelemetry traces, structured logging, and Grafana dashboards. Trigger with phrases like "miro monitoring", "miro metrics", "miro observability", "monitor miro", "miro alerts", "miro tracing".
mistral-observability
Set up comprehensive observability for Mistral AI with metrics, traces, and alerts. Use when implementing monitoring for Mistral AI operations, setting up dashboards, or configuring alerting for integration health. Trigger with phrases like "mistral monitoring", "mistral metrics", "mistral observability", "monitor mistral", "mistral alerts".
monitoring-apis
Build real-time API monitoring dashboards with metrics, alerts, and health checks. Use when tracking API health and performance metrics. Trigger with phrases like "monitor the API", "add API metrics", or "setup API monitoring".
navan-observability
Use when setting up monitoring, logging, and alerting for Navan API integrations in production environments. Trigger with "navan observability" or "navan monitoring" or "navan api dashboards".
notion-observability
Set up observability for Notion integrations with metrics, traces, and alerts. Use when implementing monitoring for Notion API calls, setting up dashboards, or configuring alerting for Notion integration health. Trigger with phrases like "notion monitoring", "notion metrics", "notion observability", "monitor notion", "notion alerts", "notion tracing".
palantir-observability
Set up observability for Palantir Foundry integrations with metrics, logging, and alerts. Use when implementing monitoring for Foundry API calls, setting up dashboards, or configuring alerting for Foundry integration health. Trigger with phrases like "palantir monitoring", "foundry metrics", "palantir observability", "monitor foundry", "foundry alerts".
posthog-observability
Monitor PostHog integration health: event ingestion rates, feature flag evaluation latency, billing volume tracking, and Prometheus/Grafana alerting. Trigger: "posthog monitoring", "posthog metrics", "posthog observability", "monitor posthog", "posthog alerts", "posthog dashboard".
running-chaos-tests
Execute chaos engineering experiments to test system resilience. Use when performing specialized testing. Trigger with phrases like "run chaos tests", "test resilience", or "inject failures".
running-performance-tests
Execute load testing, stress testing, and performance benchmarking. Use when performing specialized testing. Trigger with phrases like "run load tests", "test performance", or "benchmark the system".
salesforce-observability
Set up observability for Salesforce integrations with API limit monitoring, error tracking, and alerting. Use when implementing monitoring for Salesforce operations, tracking API consumption, or configuring alerting for Salesforce integration health. Trigger with phrases like "salesforce monitoring", "salesforce metrics", "salesforce observability", "monitor salesforce", "salesforce alerts", "salesforce API usage dashboard".
sentry-observability
Integrate Sentry with your observability stack — logging, metrics, APM, and dashboards. Use when connecting Sentry to winston/pino/structlog, correlating errors with business metrics, deciding between Sentry performance and Datadog/New Relic, building Sentry Discover dashboards, or linking events to external tools via extra context. Trigger: "sentry observability", "sentry logging", "sentry metrics", "sentry grafana", "sentry datadog correlation", "sentry discover dashboard".
snowflake-observability
Set up Snowflake observability using ACCOUNT_USAGE views, alerts, and external monitoring. Use when implementing Snowflake monitoring dashboards, setting up query performance tracking, or configuring alerting for warehouse and pipeline health. Trigger with phrases like "snowflake monitoring", "snowflake metrics", "snowflake observability", "snowflake dashboard", "snowflake alerts".
vastai-observability
Monitor Vast.ai GPU instance health, utilization, and costs. Use when setting up monitoring dashboards, configuring alerts, or tracking GPU utilization and spending. Trigger with phrases like "vastai monitoring", "vastai metrics", "vastai observability", "monitor vastai", "vastai alerts".
webflow-observability
Set up observability for Webflow integrations — Prometheus metrics for API calls, OpenTelemetry tracing, structured logging with pino, Grafana dashboards, and alerting for rate limits, errors, and latency. Trigger with phrases like "webflow monitoring", "webflow metrics", "webflow observability", "monitor webflow", "webflow alerts", "webflow tracing".
data-engineering-data-pipeline
You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.
devops-troubleshooter
Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability.
grafana-dashboards
Create and manage production-ready Grafana dashboards for comprehensive system observability.
observability-engineer
Build production-ready monitoring, logging, and tracing systems. Implements comprehensive observability strategies, SLI/SLO management, and incident response workflows.
prometheus-configuration
Complete guide to Prometheus setup, metric collection, scrape configuration, and recording rules.
golang-observability
Golang everyday observability — the always-on signals in production. Covers structured logging with slog, Prometheus metrics, OpenTelemetry distributed tracing, continuous profiling with pprof/Pyroscope, server-side RUM event tracking, alerting, and Grafana dashboards. Apply when instrumenting Go services for production monitoring, setting up metrics or alerting, adding OpenTelemetry tracing, correlating logs with traces, migrating legacy loggers (zap/logrus/zerolog) to slog, adding observability to new features, or implementing GDPR/CCPA-compliant tracking with Customer Data Platforms (CDP). Not for temporary deep-dive performance investigation (→ See golang-benchmark and golang-performance skills).
grafana-dashboard-creator
Create Grafana dashboards. Auto-activating skill in the DevOps Advanced skill category. Use when working with grafana dashboard creator functionality. Trigger with phrases like "grafana dashboard creator", "grafana creator", "grafana".
building-incident-response-dashboard
Builds real-time incident response dashboards in Splunk, Elastic, or Grafana to provide SOC analysts and leadership with situational awareness during active incidents, tracking affected systems, containment status, IOC spread, and response timeline. Use when IR teams need unified visibility during incident coordination and post-incident reporting.
building-soc-metrics-and-kpi-tracking
Builds SOC performance metrics and KPI tracking dashboards measuring Mean Time to Detect (MTTD), Mean Time to Respond (MTTR), alert quality ratios, analyst productivity, and detection coverage using SIEM data. Use when SOC leadership needs operational visibility, continuous improvement tracking, or executive-level reporting on security operations effectiveness.
building-vulnerability-aging-and-sla-tracking
Implement a vulnerability aging dashboard and SLA tracking system to measure remediation performance against severity-based timelines and drive accountability.
implementing-api-abuse-detection-with-rate-limiting
Implement API abuse detection using token bucket, sliding window, and adaptive rate limiting algorithms to prevent DDoS, brute force, and credential stuffing attacks.
dashboard-generator
Generate monitoring dashboards for Grafana and DataDog with alert integration
metrics-schema-generator
Generate metrics schemas for Prometheus, OpenTelemetry, and Grafana dashboards
prometheus-grafana
Expert skill for Prometheus metrics and Grafana dashboards. Write and validate PromQL queries, generate Grafana dashboard JSON, create alerting and recording rules, analyze metric cardinality, and debug scrape configurations.
service-mesh
Service mesh configuration and operations expertise for Istio, Linkerd, and Consul Connect
dashboard-builder
Build monitoring dashboards that answer real operator questions for Grafana, SigNoz, and similar platforms. Use when turning metrics into a working dashboard instead of a vanity board.
azure-kubernetes
Plan, create, and configure production-ready Azure Kubernetes Service (AKS) clusters. Covers Day-0 checklist, SKU selection (Automatic vs Standard), networking options (private API server, Azure CNI Overlay, egress configuration), security, and operations (autoscaling, upgrade strategy, cost analysis). WHEN: create AKS environment, provision AKS environment, enable AKS observability, design AKS networking, choose AKS SKU, secure AKS.
monitoring-expert
Configures monitoring systems, implements structured logging pipelines, creates Prometheus/Grafana dashboards, defines alerting rules, and instruments distributed tracing. Implements Prometheus/Grafana stacks, conducts load testing, performs application profiling, and plans infrastructure capacity. Use when setting up application monitoring, adding observability to services, debugging production issues with logs/metrics/traces, running load tests with k6 or Artillery, profiling CPU/memory bottlenecks, or forecasting capacity needs.
k6-performance-testing
k6 load testing expertise for performance validation and analysis
opentelemetry-integrator
Integrate OpenTelemetry tracing and metrics into SDKs
oma-observability
Intent-based observability + traceability router across layers, boundaries, and signals. Routes to vendor-specific skills via category taxonomy; owns transport tuning, meta-observability, incident forensics. Use for observability, traceability, telemetry, APM, RUM, metrics, logs, traces, profiles, SLO, incident forensics, tracing architecture work.
creating-apm-dashboards
This skill enables Claude to create Application Performance Monitoring (APM) dashboards. It is triggered when the user requests the creation of a new APM dashboard, monitoring dashboard, or a dashboard for application performance. The skill helps define key metrics and visualizations for monitoring application health, performance, and user experience across multiple platforms like Grafana and Datadog. Use this skill when the user needs assistance setting up a new monitoring solution or expanding an existing one. The plugin supports the creation of dashboards focusing on golden signals, request metrics, resource utilization, database metrics, cache metrics, business metrics, and error tracking.
deploying-monitoring-stacks
This skill deploys monitoring stacks, including Prometheus, Grafana, and Datadog. It is used when the user needs to set up or configure monitoring infrastructure for applications or systems. The skill generates production-ready configurations, implements best practices, and supports multi-platform deployments. Use this when the user explicitly requests to deploy a monitoring stack, or mentions Prometheus, Grafana, or Datadog in the context of infrastructure setup.
dashboard-brief
Convert a business question into a complete dashboard specification. Use when asked to design a dashboard, create a dashboard spec or brief, plan a BI report, or define what charts and metrics a dashboard should include. Produces a structured spec with metrics, dimensions, chart types, filters, and layout guidance.
grafana-dashboards
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
prometheus-configuration
Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
azure-iot-operations
Expert knowledge for Azure IoT Operations development including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when configuring MQTT broker, data flows/graphs, OPC UA/ONVIF connectors, WASM transforms, or Prometheus/Grafana, and other Azure IoT Operations related development tasks. Not for Azure IoT (use azure-iot), Azure IoT Hub (use azure-iot-hub), Azure IoT Edge (use azure-iot-edge), Azure Defender For Iot (use azure-defender-for-iot).
azure-managed-grafana
Expert knowledge for Azure Managed Grafana development including troubleshooting, decision making, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when integrating Azure Monitor/Prometheus, configuring data sources/alerts, Entra auth, private endpoints, or HA workspaces, and other Azure Managed Grafana related development tasks. Not for Azure Monitor (use azure-monitor), Azure Application Gateway (use azure-application-gateway), Azure Virtual Machines (use azure-virtual-machines), Azure Kubernetes Service (AKS) (use azure-kubernetes-service).
azure-monitor
Expert knowledge for Azure Monitor development including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when working with Log Analytics workspaces, DCRs, AMA/agents, Application Insights, or Prometheus/Container Insights, and other Azure Monitor related development tasks. Not for Azure Managed Grafana (use azure-managed-grafana), Azure Network Watcher (use azure-network-watcher), Azure Service Health (use azure-service-health), Azure Defender For Cloud (use azure-defender-for-cloud).
otel-queries
Analyze gh-aw OpenTelemetry traces from JSONL mirrors or OTLP backends.
devops
DevOps - Docker, CI/CD, cloud infra, monitoring.
observability
Structured logging with Pino/Winston, OpenTelemetry tracing, metrics collection, Grafana dashboards, and alerting rules.
automating-devops
DevOps knowledge reference covering Git workflows, testing strategies, DevSecOps, release pipeline orchestration (release.yml, multi-arch images, cosign integration), CI/CD pipelines, database management, observability, and performance optimization. Use when working with Git, CI/CD, release pipelines, ghcr image publishing, testing, monitoring, or infrastructure automation.
observability-sre
Observability and SRE expert. Use when setting up monitoring, logging, tracing, defining SLOs, or managing incidents. Covers Prometheus, Grafana, OpenTelemetry, and incident response best practices.
ops-monitor
Unified APM and monitoring surface. Polls Datadog, New Relic, and OpenTelemetry backends for active alerts, error traces, and entity health. Use --watch for live polling every 60 seconds. Use --setup to configure monitoring credentials.
hunt-csrf
Hunting skill for csrf vulnerabilities. Built from 15 public bug bounty reports including modern variants — SameSite=Lax sibling-subdomain bypass (Argo CD CVE-2024-22424), GraphQL mutations-via-GET (GitLab $3,370), framework-wide CSRF middleware disabled (Stripe Dashboard $5,000), path-traversal CSRF-token bypass (GitHub Enterprise CVE-2022-23732 $10k), Origin-omission bypass (TikTok $2,500), OAuth-state null-byte (Streamlabs), WebSocket CSRF / CSWSH (Coda), default-SameSite email-change → ATO (YoYo Games $400), social-account-link CSRF (HackerOne), JSON-CSRF via text/plain on email-change (TikTok $500). Use when hunting modern CSRF — heavy emphasis on chain-to-ATO patterns.
distributed-tracing
Implement distributed tracing with OpenTelemetry, Tempo/Jaeger — instrumentation, sampling, and trace-to-log correlation. Use when the user asks about distributed tracing, OpenTelemetry setup, span instrumentation, trace propagation, or connecting traces to logs and metrics.
grafana-dashboards
Design and maintain Grafana dashboards — service overview panels, SLO tracking, variable templates, dashboard-as-code with Grafonnet/Jsonnet.
log-aggregation
Set up Loki or ELK log aggregation for K8s workloads — structured logging, log routing, and log-based alerting.
service-mesh
Implement service mesh for mTLS, traffic management, and observability — Istio and Linkerd patterns for Kubernetes.
slo-implementation
Implement SLOs end-to-end in Prometheus — recording rules, burn rate alerts, error budget dashboards, and Sloth/pyrra integration.
attack-path-architect
Generates strategic attack trees and kill chains from reconnaissance data or domain input. Maps MITRE ATT&CK TTPs, identifies chaining opportunities, trust relationships, and prioritizes attack paths by feasibility and impact. Use when user asks for "attack path", "kill chain", "attack tree", "threat modeling from recon", "attack surface analysis", or "prioritize targets". Requires prior recon data or a domain to analyze. For authorized pentesting and red team engagements only.
grafana-foundation-sdk
Build Grafana dashboards as code with the grafana-foundation-sdk typed builders (TypeScript or Go). Use when creating, modifying, or generating Grafana dashboard JSON programmatically, converting hand-written dashboard JSON to typed code, building monitoring dashboards, or working with Prometheus/Loki queries in dashboards.
magic-mouth
Magic Mouth is trigger → message. The entire craft is specifying the trigger boundary, the message payload, and the suppression/escalation rules. It is NOT: - One-time messages: "Send a Slack message now saying X" (no trigger, no automation) - Full chatbots: "Build a conversational AI that understands context and handles open-ended questions" (requires NLU, dialogue management) - Monitoring dashboards: "Set up Grafana with real-time graphs and trend analysis" (visualization, not message routing) - Silent traps/wards: "Revert changes silently and log who tried" (defensive code manipulation, not messaging) - Encryption: "Encrypt a message so only the recipient can read it" (cryptography, not event-driven delivery) - Multimedia presentations: "Play a 20-slide deck with animations and narration" (media playback, not conditional messaging)
dashboard-design
Use when designing monitoring dashboards — visualization selection, layout principles, observability strategies (RED/USE/Golden Signals), and data storytelling.
observability-sre
Observability and SRE expert. Use when setting up monitoring, logging, tracing, defining SLOs, or managing incidents. Covers Prometheus, Grafana, OpenTelemetry, and incident response best practices.
beacon
Engineering observability and reliability through SLO/SLI design, distributed tracing, alerting, dashboards, capacity planning, toil automation, and reliability review. Use when designing observability instrumentation, defining SLOs/SLIs, building dashboards/alerts, or reviewing reliability posture.
cli-forge-infra
Ops integration assistant — reads service docs, finds the simplest config path (CLI/Helm/Operator/Terraform), builds dependency trees, proposes upgrade paths, and tracks decisions in ADRs. Use when debugging infra, integrating services, bootstrapping platforms, upgrading versions, simplifying config, or reviewing infrastructure code. Triggers on ops tool names (OpenBao, Vault, Consul, Traefik, Gitea, ArgoCD, Prometheus, Grafana, cert-manager, Istio, Linkerd, Terraform, OpenTofu, Podman, Docker, K8s, etc.) or keywords like "bootstrap", "integrate", "simplify config", "upgrade infra", "ops stack", "service mesh", "dependency tree".
logging-observability-standards
Use when setting up telemetry, debugging distributed systems, or standardizing application output.
azure-kubernetes
Plan, create, and configure production-ready Azure Kubernetes Service (AKS) clusters. Covers Day-0 checklist, SKU selection (Automatic vs Standard), networking options (private API server, Azure CNI Overlay, egress configuration), security, and operations (autoscaling, upgrade strategy, cost analysis). WHEN: create AKS environment, provision AKS environment, enable AKS observability, design AKS networking, choose AKS SKU, secure AKS.
devops-troubleshooter
Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability. Masters log analysis, distributed tracing, Kubernetes debugging, performance optimization, and root cause analysis. Handles production outages, system reliability, and preventive monitoring. Use PROACTIVELY for debugging, incident response, or system troubleshooting.
monitoring-observability
Set up monitoring, logging, and observability for applications and infrastructure. Use when implementing health checks, metrics collection, log aggregation, or alerting systems. Handles Prometheus, Grafana, ELK Stack, Datadog, and monitoring best practices.
observability-engineer
Build production-ready monitoring, logging, and tracing systems. Implements comprehensive observability strategies, SLI/SLO management, and incident response workflows. Use PROACTIVELY for monitoring infrastructure, performance optimization, or production reliability.
nav-init
Initialize Navigator documentation structure in a project. Auto-invokes when user says "Initialize Navigator", "Set up Navigator", "Create Navigator structure", or "Bootstrap Navigator".
hunt-cloud-misconfig
Hunt cloud / infrastructure misconfigurations. AWS: public S3 buckets (s3:GetObject anonymous), permissive bucket policies (PutObjectAcl public-write), exposed CloudFront origin, public Lambda function URL, public RDS snapshot, IAM credentials in JS bundles, AWS metadata accessible via SSRF. GCP: public GCS buckets, exposed Cloud Run services, leaked service account JSON. Azure: public blob containers, exposed Function App. (Kubernetes/Docker exposure is owned by hunt-k8s; CI/CD pipeline attacks by hunt-cicd; post-credential IAM escalation by cloud-iam-deep.) Detection: targeted dorking, certificate transparency, JS bundle secret extraction, port scan for known service ports. Validate: actual data read / write / RCE. Use when hunting cloud-native storage and compute misconfig (S3/GCS/Blob, IMDS-via-SSRF, serverless, public managed services).
building-soc-metrics-and-kpi-tracking
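Build SOC performance metrics and KPI tracking dashboards, using SIEM data to measure mean time to detect (MTTD), mean time to respond (MTTR), alert quality ratio, analyst productivity, and detection coverage. Use when SOC leadership needs operational visibility, continuous-improvement tracking, or executive-level reporting on security operations effectiveness.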
devops-automator
Expert DevOps engineer specializing in infrastructure automation, CI/CD pipeline development, and cloud operations
llm-self-loop
Restructure Web-UI / human-triggered tasks into CLI + file-output loops the LLM can iterate alone, with structured logs and addressable scratchpads. Apply trap-or-abandon: if a step cannot be looped, improve the harness rather than babysit. Trigger on iterative grunt-work, "push a button in a web UI to trigger this", monitoring dashboards, or any workflow whose inner loop requires a human in the middle.
observability-audit
Score observability across the four pillars — logs, metrics, traces, and alerts/dashboards — with per-service coverage heatmap. Cross-cutting synthesis. Static, live (Prometheus/Grafana/OTel/Datadog), and runtime (synthetic alert) modes.
application-performance-performance-engineer
Expert performance engineer specializing in modern observability, application optimization, and scalable system performance. Masters OpenTelemetry, distributed tracing, load testing, multi-tier caching, Core Web Vitals, and performance monitoring. Handles end-to-end optimization, real user monitoring, and scalability patterns. Use PROACTIVELY for performance optimization, observability, or scalability challenges.
monitoring-expert
Configures monitoring systems, implements structured logging pipelines, creates Prometheus/Grafana dashboards, defines alerting rules, and instruments distributed tracing. Implements Prometheus/Grafana stacks, conducts load testing, performs application profiling, and plans infrastructure capacity. Use when setting up application monitoring, adding observability to services, debugging production issues with logs/metrics/traces, running load tests with k6 or Artillery, profiling CPU/memory bottlenecks, or forecasting capacity needs.
monitor-scaffold
Drop in supervisor config + /healthz endpoint + restart runbook for each service in profile.monitors.targets, per supervisor (systemd / pm2 / k8s / docker-compose)
promql-cli
CLI for querying Prometheus and PromQL-compatible engines (Thanos, Cortex, VictoriaMetrics, Grafana Mimir, Grafana Tempo...): instant queries, range queries, metric discovery (metrics/labels/meta subcommands), output formats (table/csv/json/graph). Apply when executing PromQL queries, troubleshooting performance issues in software that exposes observability data, investigating latency/error rates/saturation, or analyzing time series data.
grafana-dashboards
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
prometheus-configuration
Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
openstack-monitoring
OpenStack monitoring operations skill for deploying, configuring, and operating the cloud health monitoring stack. Covers Prometheus metric collection and scrape targets, Grafana dashboard provisioning and visualization, Alertmanager notification channels and routing, alerting rules for service health and resource exhaustion, service endpoint health checks, log aggregation strategies, SLA tracking with availability and response time percentiles, and capacity trend analysis from historical metrics. Use when deploying monitoring via Kolla-Ansible, configuring alert thresholds, troubleshooting blank dashboards, tuning noisy alerts, or analyzing cloud performance trends.
data-engineering-data-pipeline
You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.
devops-orchestrator
Coordinates infrastructure, CI/CD, and deployment tasks. Use when provisioning infrastructure, setting up pipelines, configuring monitoring, or managing deployments. Applies devops-standard.md with DORA metrics.
grafana-dashboards
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
prometheus-configuration
Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
service-mesh-observability
Implement comprehensive observability for service meshes including distributed tracing, metrics, and visualization. Use when setting up mesh monitoring, debugging latency issues, or implementing SLOs for service communication.
ccc-devops
Complete DevOps ecosystem: 21 skills in one. Deployments, CI/CD, containers, AWS, monitoring, security, IaC, networking, and runbooks.
k8s-components-checker
Survey an RKE2 community cluster against an embedded compatibility registry of 19 stack components and produce a verdict for upgrade-readiness, drift-review, and version-skew questions. Components: RKE2, Rancher, Harvester, Cilium, Tetragon, cert-manager, Kyverno, KEDA, Argo CD, Harbor, Traefik, Rook, Ceph, OpenEBS, GitLab, ECK, Zalando postgres-operator, Grafana Mimir, NVIDIA GPU Operator. Works air-gapped — compatibility data lives in `references/compat/`. Surveys run via `kubectl` + `helm` + `pluto` + the apiserver `apiserver_requested_deprecated_apis` metric from the operator's workstation. Community editions only — Prime/EE-gated content is ignored. NOT for installing components, NOT for executing upgrades, NOT for tracking per-cluster running state (the registry is methodology, not inventory).
prometheus-mimir-grafana
Query Prometheus and Grafana Mimir, write and debug PromQL, and build or fix Grafana dashboards — for agents solving problems from metrics. Covers the Prometheus HTTP API (`/api/v1/query`, `query_range`, `series`, `labels`, `metadata`), Mimir multi-tenancy (`X-Scope-OrgID`, federation `a|b|c`, per-tenant 422/429 limits), the PromQL surface (selectors, rate family, classic + native histograms, `histogram_quantile`, vector matching `on()`/`group_left`, recording rules), Grafana dashboard JSON (panels, targets, variables + interpolation specifiers, legacy `/api/dashboards/db` vs Grafana-12 `/apis/dashboard.grafana.app/v1beta1/…`), KPI frameworks (RED, USE, Golden Signals, SLO burn-rate), connection recipes, MCP servers vs curl, and the PromQL trap list.
vllm-observability
Observe production vLLM — `/metrics` Prometheus surface (V1 engine), SLO-driven alerting on TTFT/ITL/queue/KV/preemption/aborts/corrupted-logits, shipping Grafana dashboards in `examples/observability/`, OTLP tracing with `--otlp-traces-endpoint` and `--collect-detailed-traces={model,worker,all}`, diagnostic rules to triage from /metrics alone — queue-grows + TPOT-stable means capacity, queue-stable + TPOT-grows means context/model, DCGM `SM_OCCUPANCY` is the real GPU-saturation signal not `GPU_UTIL`. V1 metric names (kv_cache_usage_perc), gpu_→kv_ rename saga, DCGM-exporter pairing, dashboard-lying pitfalls.
grafana-platform-dashboard
Validate OpenShift Grafana dashboards.
backend-developer
Backend Developer (/be, alias: James, /james) - Senior Backend Developer with 10+ years experience. Covers Java/Spring Boot (default), Kotlin, Python/FastAPI, PHP/Laravel, Quarkus, and Kafka/messaging - detects the project's stack and loads the matching reference. Use when implementing server features, REST APIs, business logic, persistence, messaging, or unit/integration tests in any of these stacks.
sre-engineer
SRE / Observability Engineer (/sre) — reliability engineering: SLOs/SLIs & error budgets, monitoring & alerting (Prometheus, Grafana, OpenTelemetry), incident response & runbooks, on-call, capacity & load, chaos/resilience, and post-incident reviews. Use when defining reliability targets, instrumenting observability, setting up alerting, writing runbooks, doing incident response, or reviewing a change for production readiness. Invoke alongside /arch for reliability NFRs and devops-engineer for the underlying infra/CI-CD. NOT for provisioning infra or pipelines (that's devops-engineer) — /sre owns reliability, not the cluster.
docker
Manage the workflow engine's Docker Compose stack. Use when starting, stopping, rebuilding containers, or resetting the database.
golang-observability
Golang everyday observability — the always-on signals in production. Covers structured logging with slog, Prometheus metrics, OpenTelemetry distributed tracing, continuous profiling with pprof/Pyroscope, server-side RUM event tracking, alerting, and Grafana dashboards. Apply when instrumenting Go services for production monitoring, setting up metrics or alerting, adding OpenTelemetry tracing, correlating logs with traces, migrating legacy loggers (zap/logrus/zerolog) to slog, adding observability to new features, or implementing GDPR/CCPA-compliant tracking with Customer Data Platforms (CDP). Not for temporary deep-dive performance investigation (→ See golang-benchmark and golang-performance skills).
kookr-oss-contribution-gate
Rate limiting and blocked-repo enforcement for OSS contributions — hook behavior, oss-gate CLI, ledger format, configuration
pr-contribution-excellence
Patterns for excellent open-source PR contributions, distilled from analyzing real PRs across repositories
devops
DevOps practices, CI/CD, and infrastructure management
monitoring-expert
Use when setting up monitoring systems, logging, metrics, tracing, or alerting. Invoke for dashboards, Prometheus/Grafana, load testing, profiling, capacity planning.
dashboard-designer
Use this skill when designing a data dashboard—choosing KPIs, structuring layout, applying visual hierarchy, or deciding which BI tool to use. Trigger phrases: 'design a dashboard', 'build a KPI dashboard', 'what should my dashboard show', 'help me layout a dashboard', 'dashboard for monitoring'. Not for building chart code from scratch (use chart-builder), writing SQL queries (use sql-analyst), or designing marketing landing pages.
monitor-setup
Set up error tracking, alerting, and health checks. Use AFTER deploy to ensure observability. Step 7 of 7-step workflow. Maps to H7 (Sharpen the Saw).
targeted-debug
Focused debug of a specific production issue — read only files named in the stack trace, error message, or user input; form a hypothesis from observable evidence; do NOT explore the codebase broadly. Use when the user wants to understand a specific bug or error WITHOUT spinning up a full /investigate pipeline.
evan-insight-blog-writer
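Write investment analysis posts for the evan-insight blog. Attention-grabbing, conclusion-first structure, plain language, natural style. Triggers on keywords related to investment analysis, stock analysis, company analysis, blog writing, investment posts, evan-insight, 100x stocks, Next Google, Next NVIDIA.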
hanun-observability-craft
How Hanun wires hardening overlays, secrets ops, and observability (Prometheus / Grafana / Loki / Sentry / GlitchTip / OpenTelemetry) — the always-reapply-on-recreate rule, the metric / log / trace separation, the alerting discipline, and the no-prod-execution boundary. Invoke when observability wiring or hardening setup is in scope.
observability
This skill should be used when the user asks about "observability" or "monitoring", what "metrics, logs, and traces" to collect, "health checks" (liveness/readiness), "alerting" or "on-call", "SLO/SLI" or "error budgets", the "RED" or "USE" method, "dashboards", or names a tool like "Prometheus", "Grafana", or "Datadog". Use it whenever a design has no answer to "how would we know this is broken?" or "what do we alert on?" — i.e. any time failure would be invisible until users complain, even if the user doesn't say "observability".
reject-job
This skill should be used when the user wants to reject, hide, or filter out a remote job from future email digests. Triggers on phrases like "reject this job", "hide [company]", "add to reject list", "don't show [company] again", "remove [company] from results", or when reviewing remote job emails and marking jobs as not relevant.
service-mesh-observability
Implement comprehensive observability for service meshes including distributed tracing, metrics, and visualization. Use when setting up mesh monitoring, debugging latency issues, or implementing SLOs for service communication.
implementing-observability
Monitoring, logging, and tracing implementation using OpenTelemetry as the unified standard. Use when building production systems requiring visibility into performance, errors, and behavior. Covers OpenTelemetry (metrics, logs, traces), Prometheus, Grafana, Loki, Jaeger, Tempo, structured logging (structlog, tracing, slog, pino), and alerting.
couchbase-observability
Monitor, alert on, and observe Couchbase clusters in production. Use whenever the user asks about Couchbase metrics, Prometheus, Grafana, alerting, alert thresholds, memory high watermark, disk usage, replication lag, query latency, index build progress, DCP lag, ops/sec, cache miss ratio, Couchbase Exporter, admin_stats_* tools, log aggregation, SIEM shipping, health checks, or 'how do I know if my Couchbase cluster is healthy.' Distinct from couchbase-mcp (calling the tools) and couchbase-security-hardening (audit log shipping). Use proactively for new production deployments needing an observability stack, incident response setup, and SLO definition.
observability-and-growth
Full instrumentation from day one. PostHog consolidates product analytics + feature flags + error tracking (one platform, one bill). GA4 via GTM (14-step automation, custom dimensions over events, server-side tagging). Sentry (deep error tracking + performance). Stripe (webhook-first with idempotent processing). Listmonk on Coolify (newsletters via Resend SMTP relay). PLG 7-layer framework. Programmatic SEO (5 page types). Incident auto-remediation via Sentry→Inngest pipeline. AI search (GEO) awareness. Local business conversions (phone_click, direction_click, form_submit, booking_click) with CRO patterns for both SaaS and local.
grafana-architect
Grafana dashboards + alerts — dashboards-as-code (Grizzly), per-service folders, one-question-per-panel, unified alerting with runbooks, low-cardinality discipline. Use when designing dashboards, writing alert rules, or auditing.
observability-architect
Application-side observability — structured logs, Prometheus metrics, OTel traces, signal correlation, head sampling, PII discipline, RED+USE. Use when instrumenting code, naming metrics, or auditing what a service emits.
monitoring
Monitoring and alerting
monitoring-observability
Provides monitoring and observability best practices covering the three pillars (logs, metrics, traces), OpenTelemetry instrumentation, Prometheus/Grafana dashboards, SLO-based alerting, and APM strategies. Use when setting up monitoring, observability, prometheus, grafana, opentelemetry, alerting, tracing, logging, metrics, dashboards, SLOs, or APM.
grafana-alert-router
Routes Grafana alerting webhook payloads to Slack, PagerDuty, and OpsGenie channels based on label matching rules. Supports alert grouping and silence management via the Grafana Alerting API.
obs-bootstrap
Step-by-step OpenTelemetry and uFawkesObs setup: SDK init patterns for TypeScript, Python, Go; DORA metric spans; Grafana dashboard spec. Use when adding observability to a service.
pipeline-bootstrap
Step-by-step guide to connect a uFawkesAI project to uFawkesPipe and fawkes platform: Dockerfile, ArgoCD manifest, DORA deployment spans. Use when setting up CI/CD for a new service.
monitoring-patterns
Application monitoring patterns covering Prometheus metrics (Counter, Gauge, Histogram, Summary), the prometheus-client Python library, metric naming conventions, labels, and health check endpoints. Use whenever a Python project instruments metrics, uses prometheus-client, or the user asks about Prometheus, metrics, monitoring, health checks, or observability, even if "Prometheus" is not mentioned by name.
mcp-agent-manager
Route user runtime requests to a scoped MCP and call the best matching tool. Also manages MCP setup, health checks, Teleport sync, and agent rendering.
obs-guardian
Builds observability, monitoring, alerting, and incident visibility for production systems. Covers OpenTelemetry instrumentation for traces, metrics, and logs; structured logging with JSON, correlation IDs, and sampling; Prometheus and Grafana scrape configs, dashboards, and recording rules; distributed tracing with Jaeger and Tempo; SLO/SLA definition, error budgets, burn-rate alerts; PagerDuty and OpsGenie alerting rules; and on-call runbook templates. Use this skill when the user says "set up monitoring," "instrument with OpenTelemetry," "add structured logging," "set up Grafana dashboards," "define SLOs," "no visibility into my app," "tracing across microservices," "alerting rules," or "production incident with no logs."
devops-best-practices
Opinionated production-grade DevOps defaults for Terraform, Kubernetes, CI/CD, Docker, cloud security, observability, cost, and disaster recovery. ALWAYS use when generating, reviewing, or modifying any infrastructure code, Kubernetes manifests (Deployment, Service, StatefulSet, Helm, Kustomize), Terraform (.tf, modules, state), Dockerfiles, docker-compose, CI/CD pipelines (.github/workflows, .gitlab-ci.yml, Jenkinsfile), cloud resources (AWS/GCP/Azure), IAM policies, security groups, observability setup (Prometheus, Grafana, OpenTelemetry), or DNS/TLS/CDN config — even if the user does not explicitly ask for best practices. Prevents the failure modes that hurt production teams most often: missing PDBs, single replicas in prod, latest image tags, public S3 buckets, long-lived credentials, missing observability, and CI/CD supply-chain risks. Apply opinionated defaults by default; surface tradeoffs when the user has reason to deviate.
office-docs
Generate PPTX presentations, DOCX reports, and XLSX spreadsheets from structured data — using python-pptx, python-docx, and openpyxl without requiring Microsoft Office
observability
Backend observability patterns — structured logging, Micrometer metrics, OpenTelemetry tracing, Spring Boot Actuator, Kubernetes health probes, alerting, and dashboards. Use when user mentions logging, metrics, tracing, monitoring, health checks, or Prometheus.
aio-grafana-diagram
Create Grafana diagrams for system visualization — analyzes codebase to auto-generate Mermaid diagrams with metric binding. For standalone Mermaid diagrams use aio-mermaid instead.
spring-microservices-architect
Production-grade governance agent for Spring Boot microservices. Scaffolds projects iteratively using capability-based layering, enforces coding standards, and validates against battle-tested reference patterns. Fully portable — works with any domain. USE FOR: microservice, Spring Boot, scaffold, Docker compose, kubernetes, helm, eureka, gateway, resilience4j, reactive, spring cloud, openapi, persistence, security, oauth, tracing, zipkin, monitoring, prometheus, grafana, native compilation, graalvm, code review, architecture review, quality gate, governance, spring cloud stream, rabbitmq, kafka, testcontainers, mapstruct, service discovery, edge server, config server, circuit breaker, distributed tracing, entity, entities, domain model, generate entity, persistence model, create entity, MongoDB document, JPA entity, MapStruct mapper, repository, test, verify, validate, TDD, test-driven, failing test, integration test, build check, regression test, quality check, security database, MFA, multi-factor, WebAuthn,
managing-context
Discovers and loads relevant project context from markdown documentation before each task. Matches context documents based on keywords, file paths, and task types. Use at task start to access project plans, architecture, and implementation status.
Integration detected automatically from skill content. Some results may be false positives.