evaluating-code-models

Evaluate code generation models using the BigCode Evaluation Harness. Supported benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics, for multi-language coding models.
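
For readers unfamiliar with pass@k, here is a minimal sketch of the unbiased estimator commonly used with HumanEval-style benchmarks: given n generated samples per problem, of which c pass the unit tests, pass@k is estimated as 1 - C(n-c, k) / C(n, k). The function name `pass_at_k` and its signature are illustrative assumptions, not an API taken from the harness itself.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (hypothetical helper, not a harness API).

    n: total samples generated for the problem
    c: number of those samples that passed the tests
    k: sample budget being scored
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A benchmark-level score is then the mean over problems, for example `mean_pass_at_k([(10, 3), (10, 0)], k=1)`.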

Daily Activity

Views and downloads trend for the last 30 days.

Date     Views  Downloads
Jun 14   0      1
Jun 13   0      0
Jun 12   0      0
Jun 11   0      0
Jun 10   0      0
Jun 9    1      0
Jun 8    0      0
Jun 7    0      0
Jun 6    0      0
Jun 5    0      0
Jun 4    0      0
Jun 3    0      0
Jun 2    0      0
Jun 1    0      0
May 31   0      0
May 30   2      0
May 29   0      0
May 28   0      0
May 27   1      0
May 26   0      0
May 25   0      0
May 24   0      0
May 23   0      0
May 22   4      0
May 21   0      0
May 20   0      0
May 19   0      0
May 18   0      1
May 17   1      0
May 16   0      0