
evaluation

Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
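To make the three pillars above concrete, here is a minimal sketch of how a weighted rubric, an LLM-as-a-judge call, and a regression check fit together. Everything in it is illustrative: the dimension names, weights, and the `judge` stub are assumptions, not part of this skill's actual API, and a real judge would call out to a model instead of returning canned scores.

```python
# Hypothetical rubric: dimension names and weights are illustrative only.
RUBRIC = {
    "correctness": 0.5,
    "context_use": 0.3,
    "clarity": 0.2,
}

def judge(response: str, dimension: str) -> float:
    """Stub for an LLM-as-a-judge call.

    In practice this would prompt a grader model to score `response`
    on `dimension` in [0, 1]; fixed values here keep the sketch offline.
    """
    fake_scores = {"correctness": 0.9, "context_use": 0.7, "clarity": 0.8}
    return fake_scores[dimension]

def evaluate(response: str) -> float:
    """Weighted score across all rubric dimensions."""
    return sum(w * judge(response, dim) for dim, w in RUBRIC.items())

def regression_check(score: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a regression if the new score drops below baseline - tolerance."""
    return score >= baseline - tolerance

score = evaluate("example agent answer")
print(round(score, 2))                           # → 0.82
print(regression_check(score, baseline=0.80))    # → True
```

Keeping the judge behind a single function makes it easy to swap grader models or prompts, and the baseline-with-tolerance check is what lets the same rubric double as a regression test in CI.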

Installation

Agent type: Claude Code

Install Command (macOS)
curl -fsSL "https://mentalok.io/api/v1/skills/evaluation/install?os=mac&agent=claude" | bash
Install Command (Windows)
curl -L "https://mentalok.io/api/v1/skills/evaluation/install?os=windows&agent=claude" -o install-evaluation.bat && install-evaluation.bat

Download Skill Project: /agent-skill/evaluation