Engineering
advanced-evaluation
Implement production-grade LLM-as-a-judge pipelines for model evaluation, including pairwise comparison, direct scoring, bias mitigation, and rubric generation.
Installation
Agent type: Claude Code
Install Command (macOS)
curl -fsSL "https://mentalok.io/api/v1/skills/advanced-evaluation/install?os=mac&agent=claude" | bash
Install Command (Windows, Command Prompt)
curl -L "https://mentalok.io/api/v1/skills/advanced-evaluation/install?os=windows&agent=claude" -o install-advanced-evaluation.bat && install-advanced-evaluation.bat