evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E with pass@k metrics for multi-language coding models.
Generate a structured academic paper outline from research narrative, experiment data, and review conclusions.
P9 Tech Lead mode: manages P8 agent teams through six-element Task Prompts without coding directly. Orchestrates 3+ parallel agents for project management, task decomposition, and architecture.
Systematically extract insights, decisions, and constraints from research documents, technical papers, and architectural design files.
Intelligent orchestration for dispatching tasks to specialized background agents with performance-based routing and execution tracking.
Expert consultant for designing and building high-quality, consistent AI agent skills. Guides you through discovery, architecture, and creation phases to ensure reliable, composable, and efficient skill delivery.
Automated WeChat article writing workflow including web research, viral title generation, drafting, and professional layout optimization.
Guided statistical analysis with test selection, assumption checking, power analysis, and APA-formatted reporting for academic and experimental research.
Master KPI dashboard design with proven metrics frameworks, SMART goals, and hierarchy patterns to drive business performance from executive insights to operational monitoring.
AI-driven GitHub Actions automation featuring swarm-based workflow orchestration, intelligent CI/CD pipeline management, and autonomous repository maintenance.
Box Least Squares (BLS) periodogram tool for detecting transiting exoplanets and eclipsing binaries in photometric light curves. An astropy-based implementation for period, duration, and depth analysis.
A multi-paradigm ETL pipeline agent supporting batch and streaming data processing, schema inference, and configurable DAG-based transformations for heterogeneous data sources.