air-gapped
OrganizationClaude Code plugin marketplace — 40+ installable reference skills across vLLM/SGLang inference, Kubernetes & Harvester, GPU host bring-up, observability, security, and agent workflows.
Categories
Indexed Skills (51)
autoresearch
Karpathy-pattern autoresearch — autonomous hill-climbing over a measurable metric, deep multi-agent research, or research-then-optimize. Three modes: Optimize (keep/discard ratchet), Research (STORM multi-perspective), Improve.
aiperf
NVIDIA AIPerf — vendor-neutral generative-AI inference benchmarking (genai-perf successor). Covers `aiperf profile` with concurrency / request-rate / fixed-schedule trace replay / user-centric / multi-run confidence, 15 endpoint types (chat, completions, embeddings, rankings, responses, image-gen, video-gen, NIM, HF-TGI, template, etc.), 6 custom dataset formats (single_turn, multi_turn, mooncake_trace, bailian_trace, burst_gpt_trace, random_pool), 40+ public datasets, goodput SLOs, GPU + Prometheus telemetry, plot/analyze-trace/synthesize/service subcommands, plugin extensibility, and reasoning-token TTFT/TTFO split.
ansible-idrac-9-10
Run and debug `dellemc.openmanage` Ansible playbooks against Dell PowerEdge **iDRAC 9** (14G–16G) and **iDRAC 10** (17G — R670, R770, R870, R970, XE9780, XE9785). Covers the iDRAC 10 / iDRAC 9 ≥ 7.30.10.50 `BasicAuthState: Unadvertised` default that silently 401s `ansible.builtin.uri` (Dell KB 000437501), the `idrac_session` + `x_auth_token` lifecycle with `block:/always:`, `force_basic_auth: true` fallback for raw Redfish, OMSDK modules (`idrac_firmware`, `idrac_server_config_profile`) that cannot use tokens, iDRAC 10 attribute deltas (`iDRAC.IPv4Static.*` → `iDRAC.IPv4.Static*`, `iDRAC.NIC.*` → `iDRAC.Network.*`, ACME+SCEP → `iDRAC.ACE`, `BIOS.SysSecurity.AcPwrRcvry*` → `System.ServerPwr.*`), iDRAC 9-only modules (`idrac_network` → `idrac_network_attributes`, `idrac_syslog`, `idrac_timezone_ntp`), iDRAC 10 Redfish Jobs URI under `/Oem/Dell/Jobs/`, WS-MAN removed on 17G, and version pins (collection ≥9.12.3 broad / ≥10.0.2 full; 9.12.1 for iDRAC 8).
argo-cd-apps
Author and maintain Argo CD `Application` and `ApplicationSet` manifests as a GitOps consumer (publisher), targeting Argo CD v3.3 / v3.4 (May 2026). Covers source types (Helm, Kustomize, OCI, multi-source, plugin), sync policies + options + waves + hooks, ApplicationSet generators (List, Cluster, Git, Matrix, Merge, SCMProvider, PullRequest, Plugin, ClusterDecisionResource), Progressive Sync (Beta), Source Hydrator (still Alpha), AppProjects, RBAC, sync impersonation (`destinationServiceAccounts`), GPG/cosign signature verification, GitOps repo layout (mono vs poly, app-of-apps vs ApplicationSet — Argo recommends ApplicationSet first), troubleshooting drift / OutOfSync / sync loops / stuck-deletion / hook failures, and v3.0→v3.4 changes (annotation tracking default, SSA-migration regression, CVE-2026-42880 Secret leak). NOT for installing or operating the Argo CD control plane (HA, Dex, repo-server tuning, UI customization).
jinja-expert
Author, read, and debug Jinja2 templates across the three places Jinja lives in 2026 — HuggingFace `chat_template.jinja` (rendered by `apply_chat_template` for vLLM / sglang), Ansible playbooks + `.j2` files, and Jinja-adjacent Kubernetes workflows (`values.yaml.j2`, `kubernetes.core.k8s + template`, Helm post-renderers). Companion to the `helm` skill — Helm charts are Go `text/template` + Sprig, not Jinja, and this skill makes that disambiguation explicit.
k8s-components-checker
Survey an RKE2 community cluster against an embedded compatibility registry of 19 stack components and produce a verdict for upgrade-readiness, drift-review, and version-skew questions. Components: RKE2, Rancher, Harvester, Cilium, Tetragon, cert-manager, Kyverno, KEDA, Argo CD, Harbor, Traefik, Rook, Ceph, OpenEBS, GitLab, ECK, Zalando postgres-operator, Grafana Mimir, NVIDIA GPU Operator. Works air-gapped — compatibility data lives in `references/compat/`. Surveys run via `kubectl` + `helm` + `pluto` + the apiserver `apiserver_requested_deprecated_apis` metric from the operator's workstation. Community editions only — Prime/EE-gated content is ignored. NOT for installing components, NOT for executing upgrades, NOT for tracking per-cluster running state (the registry is methodology, not inventory).
keda
Configure, operate, and master KEDA (Kubernetes Event-driven Autoscaling) — ScaledObject, ScaledJob, TriggerAuthentication CRDs, 70+ scalers, HPA behavior tuning, scale-to-zero, the KEDA HTTP Add-on, production hardening, multi-trigger semantics, scalingModifiers formulas, GitOps integration, and troubleshooting stuck scalers. Covers the common traps (cooldownPeriod only applies to N→0, CPU/memory cannot drive scale-to-zero alone, activationThreshold vs threshold, multi-trigger max-of semantics, HPA conflicts).
keycloak-iam
Operate, configure, deploy, secure, and integrate with Keycloak (open-source IAM) — the modern Quarkus distribution (24.x–26.6.x), the Keycloak Operator with `Keycloak` and `KeycloakRealmImport` CRDs, and realm/client/identity-provider configuration.
lmcache-mp
LMCache multiprocess (MP) mode — standalone LMCache server in its own pod/process that vLLM connects to over ZMQ. Gives process isolation, no GIL contention on the inference path, one cache shared by multiple vLLM pods per node, and CPU-memory scaling independent of GPU memory. Covers the `LMCacheMPConnector` path (vs the in-process `LMCacheConnectorV1`), the DaemonSet+Deployment K8s pattern and LMCache Operator, the L1 (CPU DRAM) + L2 (NIXL, fs, mooncake_store, s3, Redis) cascade, the `lmcache/standalone` + `lmcache/vllm-openai` image pair, and the production gotchas (`--no-enable-prefix-caching`, `--disable-hybrid-kv-cache-manager`, vLLM/lmcache version pins, hybrid models unsupported, cache_salt fallback bug).
makefile-best-practices
Makefile best practices, patterns, and templates for GNU Make 4.x — dependency graphs, task-runner workflows, parallel-safe recipes, self-documenting help targets, and language-specific patterns (Go, Python, Node, Docker, Helm, POSIX).
nvidia-nixl
NVIDIA Inference Xfer Library (NIXL) operator + developer reference. Point-to-point KV-cache and tensor transport for distributed inference (Dynamo, vLLM, SGLang). Covers the agent API (full Python reference; C++/Rust via upstream pointers), all 13 backend plugins (UCX, GDS, GDS_MT, libfabric, mooncake, posix, hf3fs, obj/S3, azure_blob, gusli, uccl, gpunetio/DOCA, telemetry), build paths (pip nixl-cu12/cu13, meson+ninja from source), ETCD vs side-channel metadata, telemetry (Prometheus + cyclic shared-memory), NIXL-EP elastic MoE device kernels, and Dynamo / vLLM NixlConnector / SGLang integration patterns.
open-webui-embeddings
Wire HuggingFace embedding + reranker models (BGE-M3, BGE-Reranker-v2-m3, etc.) into Open WebUI's RAG pipeline via LiteLLM proxying HuggingFace Text Embeddings Inference (TEI). Covers the exact wire shapes Open WebUI sends (URL auto-append on embed but NOT rerank; payload + response shapes for both modes), the LiteLLM-TEI gotchas (encoding_format=null trap, HF-driver task_type misdetection, openai vs huggingface driver tradeoffs), TEI config cliffs (max-client-batch-size 422 under hybrid search, max-batch-tokens AS the auto-truncate boundary, arch-specific Docker images), and the end-to-end production config. BGE-M3 + BGE-Reranker-v2-m3 are worked examples; patterns generalise to any TEI encoder.
open-webui-valkey-websocket
Deploy Open WebUI multi-pod with WebSockets and Valkey/Redis Sentinel at 1000+ user scale on Kubernetes. Centerpiece is the structural Socket.IO+Redis frame-amplification bug (#23733) that cripples multi-pod streaming, and the maintainer-endorsed mitigation (`CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE`). Covers all multi-pod env vars, the custom-model-icon perf history (base64-in-/api/models, fixed late 2025–Apr 2026), the official helm chart's gaps (bundled Redis is unsuitable for production; no HPA/PDB/probes/sticky sessions), and the catalog of known multi-pod issues with current status.
prometheus-mimir-grafana
Query Prometheus and Grafana Mimir, write and debug PromQL, and build or fix Grafana dashboards — for agents solving problems from metrics. Covers the Prometheus HTTP API (`/api/v1/query`, `query_range`, `series`, `labels`, `metadata`), Mimir multi-tenancy (`X-Scope-OrgID`, federation `a|b|c`, per-tenant 422/429 limits), the PromQL surface (selectors, rate family, classic + native histograms, `histogram_quantile`, vector matching `on()`/`group_left`, recording rules), Grafana dashboard JSON (panels, targets, variables + interpolation specifiers, legacy `/api/dashboards/db` vs Grafana-12 `/apis/dashboard.grafana.app/v1beta1/…`), KPI frameworks (RED, USE, Golden Signals, SLO burn-rate), connection recipes, MCP servers vs curl, and the PromQL trap list.
sglang-hicache
SGLang HiCache (hierarchical KV cache) — three-tier prefix cache: GPU HBM (L1) → pinned host DRAM (L2) → distributed L3 (Mooncake / 3FS / NIXL / AIBrix / EIC / SiMM / file / LMCache). Covers `--enable-hierarchical-cache`, all `--hicache-*` flags, write policies, page_first* layouts, prefetch policy (best_effort / wait_complete / timeout), per-rank sizing, MHA / MLA / DSA / Mamba / SWA support matrix (SWA + 3FS hybrid shipped in v0.5.11), runtime attach/detach HTTP admin, and auto-rewrite startup log lines that silently substitute layout × IO × storage combinations.
sglang-model-gateway
SGLang Model Gateway (`sgl-model-gateway`, formerly `sgl-router`) — Rust router fronting vLLM and SGLang inference workers on Kubernetes. Covers first-class vLLM gRPC backend plus HTTP transparent-proxy for vanilla vLLM, the policy set (six `--policy` values, `cache_aware` default), tokenizer-format dispatch (`tokenizer.json` HF-fast vs `tiktoken.model` BPE — including when neither is required because `cache_aware` is text-based), air-gapped recipe (gateway ignores `HF_ENDPOINT`, mount tokenizer files on PVC only when actually needed), K8s manifests with `model_id` labels and per-model RBAC, three HA mitigations (single + PDB, `sessionAffinity: ClientIP`, `--enable-mesh` CRDT sync), and a pitfall catalog covering the Dec 2025 `sgl-router` → `sgl-model-gateway` rename and over-engineered tokenizer init-container traps.
skill-improver
Autoresearch loop for Claude Code skills — greedy keep/discard hill climbing on a 10-dimension quality rubric, with blind subagent validation for self-scoring bias, plus a `freshen` mode that probes external references (release notes, docs, deprecation signals) and applies verified updates, plus a `trigger` mode that measures and tunes the skill's frontmatter description until it reliably fires when it should and stays silent when it shouldn't (60/40 train/test split, 3 runs/query, blinded test scores).
vllm-benchmarking
Run production vLLM benchmarks — `vllm bench` (serve, throughput, latency, sweep, startup, mm-processor), request-rate vs max-concurrency semantics, TTFT/TPOT/ITL/E2EL percentiles, goodput SLO measurement, prefix-cache workloads, air-gapped operation (HF_ENDPOINT, ModelScope, hf-mirror, offline cache). Methodology split — SLO health checks vs A/B change sweeps — plus pitfalls that produce misleading numbers (no warmup, wrong tokenizer, random-as-prod, `--request-rate inf` alone).
vllm-caching
vLLM tiered KV cache configuration for production H100/H200 clusters. Native CPU offload, LMCache (CPU+NVMe+GDS), NixlConnector (disaggregated prefill), MooncakeConnector (RDMA), MultiConnector composition. Version gates, sizing math (flag total across TP, not per-GPU — opposite of SGLang), KV-vs-weights offload distinction operators most often get wrong.
vllm-chat-templates
vLLM chat-template (prompt-side Jinja) operator reference. Template resolution precedence (`--chat-template` → AutoProcessor → tokenizer default → bundled fallback), `chat_template_kwargs` allowlist silently dropping `add_generation_prompt`/`enable_thinking`/custom kwargs (PR 27622 fix), 27 shipped `tool_chat_template_*.jinja` files, known template-layer bugs for Qwen3/Qwen3-Coder, DeepSeek-R1/V3/V3.1/V3.2, GPT-OSS, Kimi-K2, Llama-4, Mistral (HF vs mistral mode), Gemma-3/4, Phi-4, GLM. Prompt side only — output parsing lives in sibling skills.
vllm-configuration
Configure vLLM completely — YAML config file format, CLI arg precedence, full VLLM_*/HF_*/TRANSFORMERS_* env-var catalog, end-to-end recipe for air-gapped environments (internal HF mirrors, hf-mirror.com, ModelScope, HF_HUB_OFFLINE with pre-seeded cache, gated models offline, trust_remote_code supply-chain implications). VLLM_HOST_IP vs API-host confusion, Kubernetes-service-named-`vllm` env-var poisoning, usage-stats triple opt-out, YAML precedence surprises.
vllm-gemma-4-31b
Operating-point reference for serving Gemma 4 31B on vLLM — TP sizing, max_model_len, max_num_seqs, gpu_memory_utilization, kv_cache_dtype, EAGLE3 spec-dec, chat_template choice.
vllm-input-modalities
vLLM non-chat inference surfaces — text embeddings (`/v1/embeddings`, `/v2/embed`), reranking/scoring (`/rerank`, `/score`), speech-to-text (`/v1/audio/transcriptions`, `/v1/audio/translations`), document OCR via VLMs. Covers 2026 `--runner pooling` (replacing `--task embed`), v0.20 deprecations (`score`→`classify`, multitask pooling, `encode`→`token_embed`+`token_classify`), Matryoshka/MRL, ColBERT/ColPali/ColQwen late-interaction MaxSim, Cohere v2 `/v2/embed`, Jina v3/v4/v5 quirks, cross-encoder score templates, Whisper large-v3-turbo quants, DeepSeek-OCR recipe (NGramPerReqLogitsProcessor, no prefix cache, GUNDAM mode).
vllm-nvidia-hardware
NVIDIA AI-hardware + vLLM-platform reference covering Hopper (H100/H200), Blackwell (B100/B200/B300) and Blackwell Ultra, Grace-Blackwell superchips and NVL72 racks (GB200, GB300), Vera Rubin (R100/R300) with VR200 NVL144 and Kyber NVL576, Dell PowerEdge XE family and IR5000/IR7000/IR9048 racks. Per-SKU HBM, FP4/FP8/FP16 TFLOPs, NVLink5, TDP, rack power/cooling (135 kW GB300, 180-220 kW NVL144, 600 kW Kyber), DLC vs RDHx, 800 VDC HVDC. Memory-wall roofline, HBM3E→HBM4 supply 2026. vLLM attention-backend × SM matrix, FP4/FP8 paths, KV connectors, Blackwell gotchas (SM103 TRTLLM hang, 270 vs 288 GB B300 bin split).
vllm-observability
Observe production vLLM — `/metrics` Prometheus surface (V1 engine), SLO-driven alerting on TTFT/ITL/queue/KV/preemption/aborts/corrupted-logits, shipping Grafana dashboards in `examples/observability/`, OTLP tracing with `--otlp-traces-endpoint` and `--collect-detailed-traces={model,worker,all}`, diagnostic rules to triage from /metrics alone — queue-grows + TPOT-stable means capacity, queue-stable + TPOT-grows means context/model, DCGM `SM_OCCUPANCY` is the real GPU-saturation signal not `GPU_UTIL`. V1 metric names (kv_cache_usage_perc), gpu_→kv_ rename saga, DCGM-exporter pairing, dashboard-lying pitfalls.
vllm-omni
vLLM-Omni output-side multimodal generation — image (FLUX.1/2, Qwen-Image, GLM-Image, BAGEL, SD3.5, HunyuanImage-3.0), video (Wan2.1/2.2, LTX-2, HunyuanVideo-1.5), TTS (Qwen3-TTS, CosyVoice3, Voxtral-TTS), any-to-any omni (Qwen3-Omni, Qwen2.5-Omni, MiMo-Audio) via `vllm serve --omni`. Stage-based disaggregation (OmniConnector + Mooncake + RDMA), `/v1/images/generations`, async+sync `/v1/videos`, `/v1/audio/speech` with voice-upload, PCM16 WebSocket `/v1/realtime`, Ulysses/Ring SP + CFG-parallel, DiT FP8/INT8/GGUF, CUDA/ROCm/NPU/XPU/MUSA matrix, release pitfalls (v0.19.0rc1 FLUX regression, GLM-Image transformers>=5.0, Qwen3-TTS enforce-eager).
vllm-performance-tuning
vLLM performance-tuning operator reference — tuning workflow (baseline → bottleneck → knob → re-bench), fused-MoE kernel autotune (`benchmark_moe.py` generates `E=N,N=M,device_name=X.json` configs), DeepEP all-to-all + expert parallelism + EPLB, CUDA graph modes (FULL_AND_PIECEWISE default), torch.compile AOT + compile cache, scheduler knobs (`--max-num-batched-tokens`, `--max-num-seqs`, `--async-scheduling`), TP/EP/DP/PP decision tree, NCCL/DCGM on H100/H200/B200/GB200, PD disaggregation (Nixl/Mooncake/LMCache), known regressions + vendor quirks (v0.14→0.15.1 MiniMax, MI300X FP8<BF16, DeepGEMM M<128 TTFT).
vllm-quantization
vLLM datacenter-GPU quantization — picking, configuring, troubleshooting NVFP4, FP8, MXFP4, MXFP8, AWQ, GPTQ, INT8, compressed-tensors, modelopt, quark on H100/H200/B200/B300/GB200/GB300. 29 `--quantization` flag values, KV-cache dtypes (fp8_e4m3, nvfp4, per-token-head, turboquant), MoE backend selection (CUTLASS, TRTLLM, FlashInfer, DeepGEMM, Marlin, Qutlass), producing checkpoints with llm-compressor and NVIDIA ModelOpt (NVFP4_DEFAULT_CFG, FP8_DEFAULT_CFG, W4A16, SmoothQuant+GPTQ), online quantization (`fp8_per_tensor`, `fp8_per_block`), training EAGLE-3/dflash drafters on BF16 targets before PTQ, version gates per vLLM release (v0.14 → v0.21).
vllm-reasoning-parsers
vLLM reasoning-parser operator + developer reference. `--reasoning-parser` CLI wiring, `ReasoningParser` contract (non-streaming `extract_reasoning` + per-delta `extract_reasoning_streaming`), `is_reasoning_end` xgrammar gating, `--structured-outputs-config.enable_in_reasoning` bypass, 25 built-in parsers with per-model quirks, 15 production pitfalls, authoring custom parsers via `@ReasoningParserManager.register_module` or plugin.
vllm-speculative-decoding
Pick, configure, tune, monitor vLLM speculative decoding in production. Eleven SpeculativeMethod options (ngram, ngram_gpu, medusa, mlp_speculator, draft_model, suffix, eagle, eagle3, dflash, mtp, extract_hidden_states), `--speculative-config` JSON schema, which methods pair with which target model family, Prometheus acceptance metric surface, version gates (v0.11.1 EAGLE-3 preamble fix, v0.16 parallel drafting, v0.18 ngram_gpu, v0.19 dflash and zero-bubble), composability with chunked prefill / PP / LoRA / FP8 / structured outputs, Arctic Inference plugin, where spec-dec stops paying at high batch.
vllm-tool-parsers
vLLM tool-calling operator reference — picking `--tool-call-parser` per model family, writing custom parsers via `--tool-parser-plugin`, navigating vLLM source + GitHub tracker to debug any specific tool-call question. Pointer map, not source paraphrase. All 40+ built-in parsers (JSON-sentinel, pythonic, XML, harmony grammars), CLI contract, `prev_tool_call_arr`/`streamed_args_for_tool` flush invariants, diagnostic playbook (isolate template-vs-parser via raw `/v1/completions` + unicodedata codepoint decode).
helm
This skill should be used when authoring or maintaining Helm charts — creating charts, writing templates and _helpers.tpl, values.yaml patterns, Chart.yaml, values.schema.json, helm-docs, and library charts. Covers Helm 4 (SSA, WASM, OCI digest), chart CI/CD, OpenShift compatibility, chart security, CRD management, and production templates. NOT for installing or consuming third-party charts.
rancher-upgrade
Plan and sequence COMMUNITY-edition Rancher upgrades across air-gapped multi-cluster fleets — a management/"hosting" Rancher cluster plus the downstream RKE2/K3s clusters it provisions. Covers the community release model (2.11→2.14, community-vs-Prime cadence, EOL), the Kontainer Driver Metadata (KDM) downstream- Kubernetes support matrix that decides which downstream k8s minors each Rancher version can manage (and the stranding risk when a host-Rancher bump outruns its sub-clusters), cross-cluster upgrade ordering, the embedded-CAPI→Rancher-Turtles migration, Fleet coupling, cert-manager/Helm/backup prerequisites, backup- restore-operator + etcd rollback, and the air-gapped upgrade procedure (which images/charts/KDM to mirror). Community editions only; Prime-gated content is flagged and excluded. Companion to k8s-components-checker, which owns the management-cluster k8s compatibility verdict; this skill owns the upgrade methodology and downstream coordination.
patch
Generate candidate fixes for verified security findings. Consumes TRIAGE.json (preferred), VULN-FINDINGS.json, or an execution-harness results directory. Static-analysis input gets a per-finding patch subagent + an independent reviewer and is written as inert diffs for human review; results-directory input from an external execution harness (the defending-code reference pipeline, if installed) is delegated to its verified build→reproduce→regress→re-attack patch ladder. Writes PATCHES/bug_NN/{patch.diff,patch_result.json}, PATCHES.md, and PATCHES.json. Use when asked to "fix the findings", "patch these vulns", "generate fixes", or "close the loop on triage".
threat-model
Build a threat model for a target codebase. Three modes: "interview" walks an application owner through the four-question framework and produces a threat model from their answers; "bootstrap" derives a threat model from the code plus past vulnerabilities (CVEs, git history, pentest reports) when no owner is available; "bootstrap-then-interview" chains the two when both owner and codebase are present. All write THREAT_MODEL.md in a shared schema. Use when asked to "threat model", "build a threat model", "map the attack surface", or "what should we be worried about in this codebase".
triage
Triage a batch of raw security findings. Verify each is real, collapse duplicates, re-rank by derived exploitability, and tag with an owner. Takes a directory or file of scanner output and writes TRIAGE.json + TRIAGE.md sorted by what actually needs engineering attention. Use when asked to "triage findings", "validate scanner output", "prioritize vulns", or "review the backlog". Runs interactively by default; pass --auto to skip the interview.
vuln-scan
Static source-code vulnerability scan. Reads a target directory (and THREAT_MODEL.md if present), spawns parallel review subagents per focus area, and writes VULN-FINDINGS.json + .md for /triage to consume. Read-only — no building, running, or network. For execution-verified crashes (build + run + sanitizer), see HARNESS.md. Use when asked to "scan for vulns", "review this code for security issues", "find bugs in <dir>", "audit this code for vulnerabilities", or as the step between /threat-model and /triage.
harvester-upgrade
Plan and run a controlled, COMMUNITY-edition Harvester HCI upgrade off an EOL line up to latest stable — the no-skip minor ladder (1.5→1.6→1.7→1.8; embedded RKE2/KubeVirt/Longhorn/SLE-Micro ride along), gated at each hop on first upgrading the EXTERNAL Rancher + a matching Harvester UI-extension (1.6↔Rancher 2.12, 1.7↔2.13, 1.8↔2.14). Covers air-gapped version detection, why node-upgrade order is NOT operator-choosable (forced serial; the pause knob is v1.7.0+ only) and how to protect VM-hosted control planes anyway via anti-affinity spread + N+1 live-migration, making self-managed RKE2 guests Harvester-aware (cloud-provider, CSI, qemu-guest-agent), per-hop breaking changes (wicked→NetworkManager, Intel NIC rename, DHCP IP churn), the enforced pre-flight health gates, and the no-downgrade backup/rollback reality. Companion to k8s-components-checker and rancher-upgrade.
secure-boot-cert-rotation
Triage and remediate the Microsoft Secure Boot 2011→2023 UEFI certificate rotation (CAs expiring June/October 2026) across Dell PowerEdge / iDRAC9 bare metal, Ubuntu/Linux servers, and Harvester HCI / KubeVirt guest VMs. Establishes the load-bearing fact that UEFI firmware ignores certificate expiry — nothing stops booting on the deadline; the real risk is forward-compat once a 2023-only-signed shim arrives, plus a dbx/revocation freeze — then routes to the cleanest per-platform fix: iDRAC BIOS-staged keys applied on reboot (Dell), fwupd-free manual `db` append that self-authenticates via the existing 2011 KEK (Linux), and the Harvester virt-launcher OVMF floor (v1.6.0) with ephemeral-vs-persistent NVRAM triage (VMs). Covers the PK→KEK→db trust chain, why no generic Microsoft 2023 KEK payload exists, and audit via mokutil / efi-readvar / racadm bioscert / Redfish.
jira-cli
Drive Atlassian Jira from the terminal with the `jira` CLI (jira-cli, v1.7.0) against ANY Jira — Cloud or on-premise/Data Center. Covers the full command surface (issue / epic / sprint / board / project / release), the non-interactive automation contract (`--no-input` + `--plain`/`--raw`/`--csv` for agent-safe, parseable output), JQL filtering, GitHub/Jira markdown → Atlassian Document Format (ADF) conversion, authentication for every backend (Cloud API token, on-prem basic, PAT/bearer, mTLS), and live-discovery of instance-specific values (project keys, issue types, statuses, priorities, link types, custom fields) instead of guessing them.
confluence-best-practices
Advise on USING Confluence well, not operating it: make the structural call — is this a space, a page, or a child page? — diagnose why a wiki is a dread (can't find anything, content rots, duplicates, hidden by permissions, unreadable), and recommend the lean fix. Built FIRST for an agent that ACTS on Confluence (creates/organises/governs content via REST/CQL or an MCP server) and SECOND for helping humans author readable pages. Self-hosted Server/Data Center first (storage format NOT ADF; no native page archive; REST v1), but works for Cloud too. Adapt to the org's own space conventions and working language; never auto-translate content. Covers ALL content types — knowledge base, docs, intranet, meeting notes, runbooks, decision records.
jira-best-practices
Advise on USING Jira well, not operating it: make the structural call — is this an epic, a story, a task, or a sub-task? — and diagnose why a Jira is a dread, then recommend the lean fix. Adapt to the organisation's OWN hierarchy names, conventions, and working language instead of imposing a methodology. Self-hosted-first: Jira Data Center 10.3/11.x (no Cloud AI; dual Epic Link + Parent Link). Built for an agent that ACTS on Jira through the jira-cli tool or the mcp-atlassian MCP server while advising the user; Jira web-UI and admin-schema guidance is secondary. Covers ALL project types — software AND non-software (operations, engineering, services, business).
jira-confluence-mcp
Install, configure, secure, and troubleshoot the mcp-atlassian MCP server (sooperset/mcp-atlassian) that connects an agent to Jira/Confluence — including AIR-GAPPED setup (mirror the prebuilt image by digest; no PyPI/git mirror) and internal-CA / TLS handling (mount the CA vs JIRA_SSL_VERIFY=false). Self-hosted Data Center first: the #1 gotcha is DC uses JIRA_PERSONAL_TOKEN (a PAT), NOT the Cloud username+API-token pattern. Covers `claude mcp add`, the env-var catalog, hardening (READ_ONLY_MODE, TOOLSETS/ENABLED_TOOLS, project filters, the v0.22 default-toolset change), Cloud-vs-DC tool/format divergence, and 401/403/field/rate-limit/SSL fixes. NOT a catalogue of the 72 tools — those self-document at runtime; this is the setup/ops knowledge invisible at call time.
gpu-host-tuning
Audit AND tune Linux/GPU inference hosts — read-only host snapshot (CPU power state, C-states, NUMA topology, PCIe link state, GPU settings, kernel boot params, sysctl, ulimits, IRQ affinity, container runtime), optional pinned-host↔GPU memcpy bench (torch + numactl), and per-lever cheat-sheets to flip settings (governor, EPP, cpuidle, persistence, ECC, hugepages, intel_iommu, NCCL env, tuned-adm profiles, Dell/Supermicro/HPE BIOS guidance). Sits beneath any inference framework (vLLM, sglang, TensorRT-LLM) — about the host, not the framework.
nvidia-datacenter-bringup
Bring up NVIDIA HGX/DGX datacenter GPU hosts on Ubuntu 24.04 LTS — air-gapped or connected, Secure Boot enabled. Covers B300/B200/H100/A100/L40S/L4 driver+fabricmanager+NVLSM+DOCA-OFED install order and exact package set from NVIDIA CUDA repo + DOCA repo. Triggers on B300/B200/HGX/DGX install, "fabricmanager won't start", "system not yet initialized" / cudaErrorSystemNotReady, NVLSM missing, ib_umad not loading, DOCA-OFED before NVIDIA driver, nvidia-driver-pinning-XXX, nvlink5-XXX, nvidia-open vs cuda-drivers, "Blackwell requires open kernel modules", ConnectX-7/8 bridge device, FM exact-version-match, gpu-operator cuda-validator CrashLoopBackOff, B300 PCI ID 0x3182, air-gap CUDA + DOCA mirror, three-tier DOCA GPG key, MOK enrollment, DKMS sign, Dell PowerEdge XE9780/XE9785 baseboard firmware v1.4.30, iDRAC Redfish virtual AC cycle DellOemChassis.ExtendedReset, generic "install nvidia driver ubuntu 24.04 datacenter".
transformers-config-tokenizers-expert
Preflight reference for HuggingFace snapshots — what vLLM, sglang, and transformers.generate see at runtime. Covers config-file precedence (tokenizer.json, tokenizer_config.json, generation_config.json, chat_template.jinja), transformers v5 tokenizer-class taxonomy (TokenizersBackend, PythonBackend, MistralCommonBackend, TikTokenTokenizer), special-token discovery (all_special_ids, added_tokens_decoder, extra_special_tokens, backend_tokenizer.get_added_tokens_decoder), chat-template Jinja contract (ImmutableSandboxedEnvironment, loopcontrols, raise_exception, strftime_now, tojson, add_generation_prompt), and engine knobs (skip_special_tokens, trust_request_chat_template, chat_template_kwargs allowlist, adjust_request, incremental detokenizer, EOS merge). Ships verified 2026 hall-of-shame for Kimi-K2.6, GLM-5.1, Gemma-4, Qwen3, DeepSeek-V3, plus drop-in Python for resolving markers to IDs, detecting turn-primer-as-EOS leaks, and cross-referencing tokenizer.json vs tokenizer_config.json.
vllm-deployment
Use this skill when authoring, reviewing, or fixing a vLLM Kubernetes manifest, Docker/Podman pod, or OpenShift ServingRuntime — even when the user does not say "vllm". Triggers on: lab cluster performance practices, cache mount + survival across pod restarts (/root/.cache, VLLM_CACHE_ROOT, TORCHINDUCTOR_CACHE_DIR, TRITON_CACHE_DIR, "do we have caches saved"), HF_TOKEN secret in pod env, liveness + readiness probe tuning (initialDelaySeconds, failureThreshold, "pod takes 12 min to boot"), serve_args review, --enforce-eager rationale, MoE deployment ("ep2 dp2", --enable-expert-parallel, expert-parallel sizing), TP/PP sizing, ConfigMap parser-plugin mount, image tag selection, cold-boot reduction, multi-node LWS + Ray, control planes (llm-d, production-stack, AIBrix, NVIDIA Dynamo, KServe), KEDA autoscaling, GAIE routing, disaggregated prefill/decode (Nixl/Mooncake/LMCache/MORI-IO), RHAIIS on OpenShift (SCC, arbitrary UID, Routes 60s, ModelCar, air-gapped). Lead with operator intent, not vendor names.
baml-expert
BAML (Boundary ML) expert for projects defining LLM calls as typed functions in .baml files with a generated Python client. Use whenever the repo contains baml_src/, baml_client/, baml-cli commands, or imports from baml_py / baml_client. Covers .baml syntax (function, class, enum, client, test, retry_policy, attributes), Python integration (baml_client sync/async, streaming, ClientRegistry, Collector, TypeBuilder), Schema-Aligned Parsing, ctx.output_format, @@assert / @@check tests, @stream.done / @stream.not_null / @stream.with_state streaming, multimodal (image/audio/pdf), and debugging via BAML_LOG plus Boundary Studio. Triggers even unnamed — "add an LLM function", "fix a failing parse", "add a test for the prompt", "stream the response" in a project with baml_src/. Prefer over raw LLM-SDK guidance here; defer to jinja-expert for standalone chat-template / .j2 work.
openshift-app
Package applications for OpenShift deployment: container images (UBI, arbitrary UID, multi-stage builds), packaging formats (Helm, Kustomize, Operators, OLM v1), CI/CD (Tekton, ArgoCD, Shipwright, Conforma), security (SCC, PSA, supply chain, image signing, secrets), operations (Routes, probes, scaling, monitoring, storage), disconnected/air-gapped patterns, and critical gotchas. Also when an app "works on Kubernetes but fails on OpenShift" (SCC denied, random/arbitrary UID, permission errors). Covers OCP 4.14-4.21. NOT for cluster installation or infrastructure management.
netbox-best-practices
NetBox 4.2-4.6 deployment and upgrade knowledge that the official netboxlabs/skills marketplace does NOT cover - use for deploying or upgrading NetBox on Kubernetes with the netbox-community helm chart (netbox-chart), external PostgreSQL/valkey wiring, API token bootstrap on 4.5+ (nbt_ v2 tokens), plugin installation in the official image, version-migration planning between NetBox 4.2 and 4.6, module type profiles, and front/rear port (patch panel) API changes. Trigger on "netbox helm", "netbox chart", "netbox kubernetes", "netbox upgrade", "netbox plugin install", "netbox api token bootstrap", "netbox 4.x breaking changes", or seeding/automation that must survive a NetBox version bump. For general NetBox data modeling, IPAM design, Diode, or validation questions, prefer the official netboxlabs/skills marketplace skills - this skill only covers the gaps.
refactor-check
Activate documentation freshness checks during refactoring. Use when renaming files, moving code between files, splitting/merging files, restructuring subsystems, deleting referenced files, or any work that changes file organization.
Bio shown is the top-scored skill's repo description as a fallback — real GitHub bios land in a future update.