Evaluating Code Models
Evaluate code generation models using the BigCode Evaluation Harness. Supported benchmarks include HumanEval, MBPP, and MultiPL-E, with pass@k metrics for multi-language coding models.
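The pass@k numbers these benchmarks report are usually computed with the unbiased estimator introduced in the HumanEval paper: generate n samples per problem, count the c that pass the tests, and estimate the probability that at least one of k drawn samples is correct. A minimal sketch (the function name and example counts are illustrative, not part of the harness API):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generations of which
    c are correct, passes the unit tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 200 generations per problem, 50 correct.
print(pass_at_k(200, 50, 1))   # pass@1 reduces to c/n = 0.25
print(pass_at_k(200, 50, 10))  # pass@10 is much higher
```

Averaging this value over all problems in a benchmark gives the headline pass@k score; generating more samples per problem (larger n) lowers the variance of the estimate.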
Installation
Agent type: Claude Code
Install Command (macOS)
curl -fsSL "https://mentalok.io/api/v1/skills/evaluating-code-models/install?os=mac&agent=claude" | bash
Install Command (Windows)
curl -L "https://mentalok.io/api/v1/skills/evaluating-code-models/install?os=windows&agent=claude" -o install-evaluating-code-models.bat && install-evaluating-code-models.bat