castai-common-errors

Diagnose and fix CAST AI agent, API, and autoscaler errors. Use when the CAST AI agent is offline, nodes are not scaling, or API calls return errors. Trigger with phrases like "cast ai error", "cast ai not working", "cast ai agent offline", "cast ai debug", "fix cast ai".

AI & Automation 2,266 stars 315 forks Updated today MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# CAST AI Common Errors

## Overview

Diagnostic guide for the 10 most common CAST AI issues, covering agent connectivity, API errors, autoscaler failures, and node provisioning problems.

## Prerequisites

- `kubectl` access to the cluster
- `CASTAI_API_KEY` configured
- Access to CAST AI console for log correlation

## Error Reference

### 1. Agent Pod CrashLoopBackOff

```bash
kubectl get pods -n castai-agent
kubectl logs -n castai-agent deployment/castai-agent --tail=50
```

**Causes and fixes:**

- **Invalid API key**: Regenerate at console.cast.ai > API
- **Wrong provider**: Set `--set provider=eks|gke|aks` correctly in Helm
- **RBAC missing**: Apply the required ClusterRole and ClusterRoleBinding
- **Network blocked**: Ensure outbound HTTPS to `api.cast.ai` is allowed

### 2. Agent Shows "Disconnected" in Console

```bash
# Check agent heartbeat
kubectl logs -n castai-agent deployment/castai-agent | grep -i "heartbeat\|connect\|error"

# Verify network connectivity from inside the cluster
kubectl run castai-debug --image=curlimages/curl --rm -it --restart=Never -- \
  curl -s -o /dev/null -w "%{http_code}" https://api.cast.ai/v1/kubernetes/external-clusters
```

**Fix**: Restart the agent pod: `kubectl rollout restart deployment/castai-agent -n castai-agent`

### 3. API Returns 401 Unauthorized

```bash
# Test API key
curl -s -o /dev/null -w "%{http_code}" \
  -H "X-API-Key: ${CASTAI_API_KEY}" \
  https://api.cast.ai/v1/kubernetes/external-clusters
# Should return 200, ...
```
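
The `curl` checks above print only an HTTP status code, so it can help to translate that code into a next step. The following is a minimal sketch, not part of the CAST AI tooling: the function name `castai_explain_status` and its messages are hypothetical. It relies only on the 200 and 401 cases mentioned in this guide, plus the fact that `curl` prints `000` for `%{http_code}` when no HTTP response was received.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only): map the status code printed by
# the curl checks in this guide to a suggested next step.
castai_explain_status() {
  local code="$1"
  case "$code" in
    200) echo "ok: API reachable and key accepted" ;;
    401) echo "unauthorized: check CASTAI_API_KEY" ;;
    000) echo "no response: check outbound HTTPS to api.cast.ai" ;;
    *)   echo "unexpected status ${code}: inspect agent logs" ;;
  esac
}

# Usage (needs network access and CASTAI_API_KEY, so it is left commented out):
# code=$(curl -s -o /dev/null -w "%{http_code}" \
#   -H "X-API-Key: ${CASTAI_API_KEY}" \
#   https://api.cast.ai/v1/kubernetes/external-clusters)
# castai_explain_status "$code"
```

Keeping the mapping in a pure function means it can be checked without touching the network; the commented usage shows how it might be wired to the API key test from section 3.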

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

castai-install-auth

Install and configure CAST AI agent on a Kubernetes cluster with API key authentication. Use when onboarding a cluster to CAST AI, setting up Helm charts, or configuring Terraform provider authentication. Trigger with phrases like "install cast ai", "connect cluster to cast ai", "cast ai setup", "cast ai api key", "cast ai helm install".

2,266 Updated today
jeremylongshore
AI & Automation Featured

castai-security-basics

Secure CAST AI API keys, RBAC configuration, and Kvisor security agent. Use when hardening CAST AI cluster access, configuring security scanning, or implementing API key rotation procedures. Trigger with phrases like "cast ai security", "cast ai api key rotation", "cast ai rbac", "cast ai kvisor", "secure cast ai".

2,266 Updated today
jeremylongshore
AI & Automation Featured

castai-prod-checklist

Production readiness checklist for CAST AI cluster onboarding. Use when going live with CAST AI autoscaling, validating Phase 2 setup, or preparing for production cost optimization. Trigger with phrases like "cast ai production", "cast ai go-live", "cast ai checklist", "cast ai launch".

2,266 Updated today
jeremylongshore
AI & Automation Featured

castai-upgrade-migration

Upgrade CAST AI Helm charts, Terraform provider, and agent components. Use when upgrading CAST AI versions, checking for breaking changes, or migrating between CAST AI agent releases. Trigger with phrases like "upgrade cast ai", "update cast ai agent", "cast ai helm upgrade", "cast ai terraform upgrade".

2,266 Updated today
jeremylongshore
AI & Automation Featured

castai-performance-tuning

Optimize CAST AI autoscaler performance, node provisioning speed, and API efficiency. Use when nodes take too long to provision, autoscaler is not reacting fast enough, or optimizing API call patterns for multi-cluster dashboards. Trigger with phrases like "cast ai performance", "cast ai slow", "cast ai node provisioning", "cast ai autoscaler speed".

2,266 Updated today
jeremylongshore