evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics for multi-language coding models.
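The pass@k metric mentioned above can be illustrated with a short sketch. It uses the widely cited unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. The function name `pass_at_k` and its argument names are illustrative assumptions, not the harness's own API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (hypothetical helper, not harness code).

    n: total samples generated for the problem
    c: number of those samples that passed all tests
    k: sample budget being scored
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 3 of 10 samples pass: estimate the chance that at least one of k draws passes.
    for k in (1, 5, 10):
        print(k, pass_at_k(n=10, c=3, k=k))
```

A benchmark-level score would typically be the mean of this per-problem estimate over all problems in the benchmark.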
Orchestrates complex programming tasks by analyzing available skills, generating structured execution plans, and managing manual or delegated multi-step workflows.
Generate structured development plans, checklists, and file contexts compatible with the IntelliJ coding-aider plugin.
Discover, analyze, and summarize trending GitHub repositories, project health, and technical stacks to stay updated on open-source ecosystems.
Perform comprehensive technical analysis for stocks and ETFs using indicators like RSI, MACD, and Bollinger Bands to generate actionable trading signals and comparative reports.
Development CLI for the Multigres project: automate unit tests, integration tests, and environment coordination for Vitess-for-Postgres.
Aggressively prune grammatical scaffolding and filler text from inputs to optimize LLM token usage while retaining core semantic content.
Pre-execution security guardrails for AI agents. Validates shell commands and file reads against 400+ security patterns to block destructive operations, credential theft, and unauthorized system access.
Universal CLI tool to convert and synchronize AI agent skills between Claude Code and Gemini CLI extensions.
An advanced development guide for Claude Code, covering REPL environments, MCP integration, development workflows, and best practices for AI-assisted coding.
Shared memory and collaboration layer for AI coding agents to track actions, manage sessions, detect conflicts, and preserve project context across tools.
Generate a structured academic paper outline from research narrative, experiment data, and review conclusions.