
evaluating-code-models

Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, with pass@k metrics for multi-language coding models.
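The pass@k metric reported by these benchmarks is the probability that at least one of k sampled completions for a problem passes its unit tests. A minimal sketch of the standard unbiased estimator, 1 - C(n-c, k)/C(n, k), given n generated samples of which c pass (the function name here is illustrative, not part of the harness API):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    computed as a numerically stable running product."""
    if n - c < k:
        # Fewer than k failing samples exist, so every size-k
        # draw must contain at least one passing sample.
        return 1.0
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))
```

For example, with 4 samples of which 2 pass, `pass_at_k(4, 2, 2)` gives 1 - C(2,2)/C(4,2) = 1 - 1/6 ≈ 0.83. The product form avoids the overflow that direct binomial coefficients can hit for large n.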

Installation

Agent type: Claude Code

Install Command (macOS)
curl -fsSL "https://mentalok.io/api/v1/skills/evaluating-code-models/install?os=mac&agent=claude" | bash
Install Command (Windows)
curl -L "https://mentalok.io/api/v1/skills/evaluating-code-models/install?os=windows&agent=claude" -o install-evaluating-code-models.bat && install-evaluating-code-models.bat

Download Skill Project

/agent-skill/evaluating-code-models