content-moderation-patterns
Content moderation with Claude: pre-filter vs. LLM-classify, categories, thresholds, and human-in-the-loop (HITL) review. Triggers: moderation, safety filter, policy enforcement, content classifier.
Quality Score: 93/100
Details
- Author: softspark
- Repository: softspark/ai-toolkit
- Created: 2 months ago
- Last Updated: 2 days ago
- Language: Python
- License: MIT
Similar Skills
Semantically similar based on skill content — not just same category
content-moderator
AI-powered content moderation with multi-category classification, severity scoring, and policy enforcement. Based on Anthropic's Claude Cookbooks.
llamaguard
Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories: violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, SageMaker. Integrates with NeMo Guardrails.