coreweave-core-workflow-a

Featured

Deploy KServe InferenceService on CoreWeave with autoscaling and GPU scheduling. Use when serving ML models with KServe, configuring scale-to-zero, or deploying production inference endpoints on CoreWeave. Trigger with phrases like "coreweave inference service", "coreweave kserve", "coreweave model serving", "deploy model on coreweave".

AI & Automation 2,266 stars 315 forks Updated today MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# CoreWeave Core Workflow: KServe Inference

## Overview

Deploy production inference services on CoreWeave using KServe InferenceService with GPU scheduling, autoscaling, and scale-to-zero. CKS natively integrates with KServe for serverless GPU inference.

## Prerequisites

- Completed `coreweave-install-auth` setup
- KServe available on your CKS cluster
- Model stored in S3, GCS, or HuggingFace

## Instructions

### Step 1: Deploy an InferenceService

```yaml
# inference-service.yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-inference
  annotations:
    autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
    autoscaling.knative.dev/metric: "concurrency"
    autoscaling.knative.dev/target: "1"
    autoscaling.knative.dev/minScale: "1"
    autoscaling.knative.dev/maxScale: "5"
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 5
    containers:
      - name: kserve-container
        image: vllm/vllm-openai:latest
        args:
          - "--model"
          - "meta-llama/Llama-3.1-8B-Instruct"
          - "--port"
          - "8080"
        ports:
          - containerPort: 8080
            protocol: TCP
        resources:
          limits:
            nvidia.com/gpu: "1"
            memory: 48Gi
            cpu: "8"
          requests:
            nvidia.com/gpu: "1"
            memory: 32Gi
            cpu: "4"
        env:
          - name: HUGGING_FACE_HUB_TOKEN
            valueFrom:
              secretKeyRef: ...
```
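
To build intuition for the autoscaling annotations in the Step 1 manifest, here is a rough Python sketch of how a concurrency-based autoscaler picks a replica count. It is a simplified model, not Knative's actual algorithm: it assumes the replica count is the in-flight request count divided by the `target` annotation, rounded up and clamped to `minScale` and `maxScale`, and it ignores the target-utilization percentage, averaging windows, and panic mode. The function name `desired_replicas` and the sample loads are hypothetical, chosen only for illustration.

```python
import math

# Values copied from the autoscaling annotations in the Step 1 manifest.
TARGET = 1      # autoscaling.knative.dev/target
MIN_SCALE = 1   # autoscaling.knative.dev/minScale
MAX_SCALE = 5   # autoscaling.knative.dev/maxScale


def desired_replicas(in_flight: float, target: float,
                     min_scale: int, max_scale: int) -> int:
    """Estimate the replica count for a given number of in-flight requests.

    Simplified model (an assumption, not Knative's real implementation):
    ceil(in_flight / target), clamped to [min_scale, max_scale].
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if min_scale > max_scale:
        raise ValueError("min_scale must not exceed max_scale")
    raw = math.ceil(in_flight / target)
    return max(min_scale, min(max_scale, raw))


if __name__ == "__main__":
    # Hypothetical load levels, in concurrent requests.
    for load in (0, 1, 3, 12):
        replicas = desired_replicas(load, TARGET, MIN_SCALE, MAX_SCALE)
        print(f"in-flight={load:>2} -> replicas={replicas}")
```

Under this model, the manifest as written always keeps at least one GPU replica running, because the lower bound is 1. Passing `min_scale=0` makes the function return 0 at zero load, which corresponds to the scale-to-zero behavior named in the overview.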

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

coreweave-deploy-integration

Deploy inference services on CoreWeave with Helm charts and Kustomize. Use when deploying multi-model inference, managing GPU deployments at scale, or templating CoreWeave manifests. Trigger with phrases like "deploy coreweave", "coreweave helm", "coreweave kustomize", "coreweave deployment patterns".

2,266 Updated today
jeremylongshore
AI & Automation Featured

coreweave-hello-world

Deploy a GPU workload on CoreWeave with kubectl. Use when running your first GPU job, testing inference, or verifying CoreWeave cluster access. Trigger with phrases like "coreweave hello world", "coreweave first deploy", "coreweave gpu test", "run on coreweave".

2,266 Updated today
jeremylongshore
AI & Automation Featured

coreweave-local-dev-loop

Set up local development workflow for CoreWeave GPU deployments. Use when building containers locally, testing YAML manifests, or iterating on model serving configurations before deploying. Trigger with phrases like "coreweave dev setup", "coreweave local testing", "develop for coreweave", "coreweave container build".

2,266 Updated today
jeremylongshore
AI & Automation Solid

coreweave-prod-checklist

Production readiness checklist for CoreWeave GPU workloads. Use when launching inference services, preparing GPU training for production, or validating deployment configurations. Trigger with phrases like "coreweave production", "coreweave go-live", "coreweave checklist", "coreweave launch".

2,266 Updated today
jeremylongshore
AI & Automation Featured

coreweave-install-auth

Configure CoreWeave Kubernetes Service (CKS) access with kubeconfig and API tokens. Use when setting up kubectl access to CoreWeave, configuring CKS clusters, or authenticating with CoreWeave cloud services. Trigger with phrases like "install coreweave", "setup coreweave", "coreweave kubeconfig", "coreweave auth", "connect to coreweave".

2,266 Updated today
jeremylongshore