evaluating-cosmos-policy

Solid

Evaluates NVIDIA Cosmos Policy on LIBERO and RoboCasa simulation environments. Use when setting up cosmos-policy for robot manipulation evaluation, running headless GPU evaluations with EGL rendering, or profiling inference latency on cluster or local GPU machines.

AI & Automation · 9,609 stars · 724 forks · Updated 1 month ago · MIT

Quality Score: 94/100

Stars (20%): 100
Recency (20%)
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Cosmos Policy Evaluation

Evaluation workflows for NVIDIA Cosmos Policy on LIBERO and RoboCasa simulation environments from the public `cosmos-policy` repository. Covers blank-machine setup, headless GPU evaluation, and inference profiling.

## Quick start

Run a minimal LIBERO evaluation using the official public eval module:

```bash
uv run --extra cu128 --group libero --python 3.10 \
  python -m cosmos_policy.experiments.robot.libero.run_libero_eval \
    --config cosmos_predict2_2b_480p_libero__inference_only \
    --ckpt_path nvidia/Cosmos-Policy-LIBERO-Predict2-2B \
    --config_file cosmos_policy/config/config.py \
    --use_wrist_image True \
    --use_proprio True \
    --normalize_proprio True \
    --unnormalize_actions True \
    --dataset_stats_path nvidia/Cosmos-Policy-LIBERO-Predict2-2B/libero_dataset_statistics.json \
    --t5_text_embeddings_path nvidia/Cosmos-Policy-LIBERO-Predict2-2B/libero_t5_embeddings.pkl \
    --trained_with_image_aug True \
    --chunk_size 16 \
    --num_open_loop_steps 16 \
    --task_suite_name libero_10 \
    --num_trials_per_task 1 \
    --local_log_dir cosmos_policy/experiments/robot/libero/logs/ \
    --seed 195 \
    --randomize_seed False \
    --deterministic True \
    --run_id_note smoke \
    --ar_future_prediction False \
    --ar_value_prediction False \
    --use_jpeg_compression True \
    --flip_images True \
    --num_denoising_steps_action 5 \
    --num_denoising_steps_future_state 1 \
    --num_denoising_steps_va...
```
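For the headless GPU evaluation mentioned above, a minimal environment sketch might look like the following. It assumes the LIBERO simulator renders through MuJoCo and PyOpenGL, which read the `MUJOCO_GL` and `PYOPENGL_PLATFORM` variables; whether the `cosmos-policy` repository needs any further settings is not confirmed by this page, so treat this as a starting point rather than the repository's documented procedure.

```shell
# Assumption: rendering goes through MuJoCo + PyOpenGL, which honor these variables.
export MUJOCO_GL=egl            # request an offscreen EGL context instead of a window
export PYOPENGL_PLATFORM=egl    # make PyOpenGL load EGL rather than GLX

# Optional sanity check: confirm a GPU is visible before starting a long run.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader \
    || echo "nvidia-smi failed; check the NVIDIA driver" >&2
else
  echo "nvidia-smi not found; check that the NVIDIA driver is installed" >&2
fi
```

Exporting these variables in the same shell before the `uv run` command means the evaluation process inherits them without any display server being present.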

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT

Integrates with

Hugging Face · AI

Similar Skills

Semantically similar based on skill content — not just same category

DevOps & Infrastructure Solid

nemo-evaluator-sdk

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.

9,609 stars · Updated 1 month ago

Orchestra-Research

DevOps & Infrastructure Featured

nemo-evaluator-sdk

27,984 Updated today

davila7

AI & Automation Solid

fine-tuning-serving-openpi

Fine-tune and serve Physical Intelligence OpenPI models (pi0, pi0-fast, pi0.5) using JAX or PyTorch backends for robot policy inference across ALOHA, DROID, and LIBERO environments. Use when adapting pi0 models to custom datasets, converting JAX checkpoints to PyTorch, running policy inference servers, or debugging norm stats and GPU memory issues.

9,609 stars · Updated 1 month ago

Orchestra-Research