ar0cket1
User
Online RL for Hermes Agent: self-improving LoRA adapters from human feedback using MIS-PO
1 indexed · 0 Featured · 13 stars · avg score 84
Indexed Skills (1)
Bio shown is the top-scored skill's repo description as a fallback — real GitHub bios land in a future update.