remote-run-ssh

Solid

Run CVlization examples on the `ssh l1` GPU host by copying only the needed example directory plus the shared `cvlization/` package into `/tmp`, then launching the example’s Docker scripts.

DevOps & Infrastructure 359 stars 65 forks Updated today MIT

Quality Score: 90/100

Stars (20%): 85
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Remote Run over SSH

Operate CVlization examples on the remote GPU host reachable as `ssh l1`. This playbook keeps the remote copy minimal (just the target example folder, e.g. `examples/perception/multimodal_multitask/recipe_analysis_torch`, and the `cvlization/` library) and then relies on the example’s own `build.sh` / `train.sh` Docker helpers. The user’s long-lived checkout on the remote stays untouched.

## When to Use

- Heavy training or evaluation runs that require CUDA (CIFAR10 speed runs, multimodal pipelines, etc.).
- Performance or regression measurements on the remote GPU after local code changes.
- Producing reproducible logs / artifacts for discussions or CI baselines without pushing a branch first.

## Prerequisites

- Local repo state ready to sync (uncommitted changes are acceptable).
- SSH config already maps the GPU machine to `l1`.
- Remote host provides an NVIDIA GPU (currently an A10) and Docker with the GPU runtime enabled.
- Roughly 15 GB or more free under `/tmp` for the slim workspace, Docker context, and caches.
- Hugging Face tokens or other credentials available locally if the example pulls Hub assets.

## Quick Reference

1. Choose the example to run.
2. `rsync` only the example folder, `cvlization/`, and any required helper dirs to `/tmp/cvlization_remote` on `l1`.
3. On `l1`, run `./build.sh` inside the example folder to build the Docker image.
4. Run `./train.sh` (or the example’s equivalent) to launch the job with GPU access.
5. Collect logs / metrics and record the run in `var/s...
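The prerequisites and Quick Reference steps can be sketched as one Bash script. This is a hypothetical helper, not part of CVlization or this playbook: the name `remote_run.sh`, the `DRY_RUN`, `LOG_FILE`, and `MIN_FREE_GB` variables, and the `df -BG` free-space check (which assumes GNU coreutils on the host) are all assumptions. It also assumes, as step 3 says, that each example’s `build.sh` and `train.sh` are meant to be run from inside the example folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CVlization): sync one example plus the
# shared cvlization/ package to /tmp on the GPU host, then run the example's
# own Docker scripts there. Run it from the local repo root.
set -euo pipefail

REMOTE_HOST="${REMOTE_HOST:-l1}"                      # SSH alias from ~/.ssh/config
REMOTE_ROOT="${REMOTE_ROOT:-/tmp/cvlization_remote}"  # slim workspace on the host
MIN_FREE_GB="${MIN_FREE_GB:-15}"                      # rough budget from the prerequisites

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# True if a df value such as " 42G" is at least MIN_FREE_GB gigabytes.
has_enough_space() {
  local avail_gb="${1//[^0-9]/}"
  [[ -n "$avail_gb" ]] && (( 10#$avail_gb >= MIN_FREE_GB ))
}

# Fail early unless the host has a GPU, a reachable Docker daemon, and space in /tmp.
check_remote() {
  ssh "$REMOTE_HOST" 'nvidia-smi -L && docker info > /dev/null'
  local avail
  avail="$(ssh "$REMOTE_HOST" 'df -BG --output=avail /tmp | tail -n 1')"
  if ! has_enough_space "$avail"; then
    echo "only ${avail// /} free under /tmp on ${REMOTE_HOST}; need ${MIN_FREE_GB}G" >&2
    return 1
  fi
}

# Copy only the example folder, cvlization/, and any extra helper dirs.
# -R (--relative) recreates each path relative to the repo root on the remote side.
sync_example() {
  local example="$1"
  shift
  run rsync -azR --exclude '__pycache__/' --exclude '.git/' \
    "${example}/" cvlization/ "$@" "${REMOTE_HOST}:${REMOTE_ROOT}/"
}

# Build the image, then launch the job, from inside the example folder.
run_example() {
  local example="$1" script="${2:-train.sh}"
  run ssh "$REMOTE_HOST" "cd '${REMOTE_ROOT}/${example}' && ./build.sh && ./${script}"
}

# Usage: remote_run.sh <example-dir> [script] [extra-dir...]
main() {
  local example="${1%/}" script="${2:-train.sh}"
  local log="${LOG_FILE:-remote_run_$(date +%Y%m%d_%H%M%S).log}"
  [[ -d "$example" && -d cvlization ]] || {
    echo "run from the repo root; ${example}/ or cvlization/ not found" >&2
    return 1
  }
  [[ "${DRY_RUN:-0}" == 1 ]] || check_remote
  sync_example "$example" "${@:3}"
  # Stream remote output to the terminal and keep a local copy of the log.
  run_example "$example" "$script" 2>&1 | tee "$log"
}

# Only run main when executed directly with arguments, so the functions can be sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  main "$@"
fi
```

For example, `DRY_RUN=1 ./remote_run.sh examples/perception/multimodal_multitask/recipe_analysis_torch` prints the `rsync` and `ssh` commands without touching `l1`. Keeping repo-relative paths with `rsync -R` is deliberate: scripts that locate `cvlization/` relative to the example folder find it at the same place under `/tmp/cvlization_remote` as in the local checkout.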

Details

Author: majiayu000
Repository: majiayu000/claude-skill-registry
Created: 5 months ago
Last Updated: today
Language: HTML
License: MIT

Similar Skills

Semantically similar based on skill content, not just the same category

AI & Automation Solid

run-experiment

Deploy and run ML experiments on local or remote GPU servers. Use when user says "run experiment", "deploy to server", "跑实验" (Chinese for "run experiments"), or needs to launch training jobs.

11,051 Updated today
wanshuiyin
Code & Development Listed

beam

Use this skill to move an active local coding session and directory onto a new remote machine. This remote machine can also have GPUs.

16 Updated 3 months ago
xeophon
Web & Frontend Listed

remote-mac

Control a remote macOS machine from Linux/VPS via SSH

7 Updated 2 months ago
ythx-101
DevOps & Infrastructure Listed

docker-vps-deploy

Use when deploying a Dockerized application to a VPS (Linux server) via SSH without a container registry, generating a GitHub Actions pipeline that uses docker save, gzip compression, and rsync to transfer images. Triggers: "deploy to VPS", "rsync docker image", "docker save and load", "VPS CI/CD", "SSH deploy pipeline", "deploy without registry", "transfer docker image via SSH".

0 Updated 2 days ago
itsgitz
AI & Automation Listed

vllm-deployment

Use this skill when authoring, reviewing, or fixing a vLLM Kubernetes manifest, Docker/Podman pod, or OpenShift ServingRuntime — even when the user does not say "vllm". Triggers on: lab cluster performance practices, cache mount + survival across pod restarts (/root/.cache, VLLM_CACHE_ROOT, TORCHINDUCTOR_CACHE_DIR, TRITON_CACHE_DIR, "do we have caches saved"), HF_TOKEN secret in pod env, liveness + readiness probe tuning (initialDelaySeconds, failureThreshold, "pod takes 12 min to boot"), serve_args review, --enforce-eager rationale, MoE deployment ("ep2 dp2", --enable-expert-parallel, expert-parallel sizing), TP/PP sizing, ConfigMap parser-plugin mount, image tag selection, cold-boot reduction, multi-node LWS + Ray, control planes (llm-d, production-stack, AIBrix, NVIDIA Dynamo, KServe), KEDA autoscaling, GAIE routing, disaggregated prefill/decode (Nixl/Mooncake/LMCache/MORI-IO), RHAIIS on OpenShift (SCC, arbitrary UID, Routes 60s, ModelCar, air-gapped). Lead with operator intent, not vendor names.

3 Updated today
air-gapped