evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics for multi-language coding models.
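For readers unfamiliar with pass@k, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. The function names `pass_at_k` and `mean_pass_at_k` are illustrative only and are not taken from the harness; the harness's own implementation may differ in detail.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: the k in pass@k (must satisfy 1 <= k <= n)
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts: three problems with 10 samples each.
    print(mean_pass_at_k([(10, 0), (10, 5), (10, 10)], k=1))
```

With k = 1 the estimator reduces to the fraction of passing samples, c / n, which is a quick sanity check on any implementation.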
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
460 skills found
Automates the lifecycle management of ephemeral Neon PostgreSQL databases for testing, CI/CD, and rapid prototyping workflows.
Connect to the Notion API to create, manage, and query pages, databases, and blocks for your AI-powered knowledge management.
Generates a random lucky number between 0 and 9999 for games, decision-making, or entertainment.
Process massive files and large codebases (10M+ tokens) by recursively chunking, sub-querying, and aggregating results to overcome LLM context limits.
Structured problem-framing tool for design sprints and product strategy. Facilitates collaborative or individual sessions to define goals, stakeholders, constraints, and pain points before solution generation.
Manage AWS Lambda serverless functions: deploy code, configure event triggers, debug invocations, optimize cold starts, and maintain layers.
Build no-code MCP servers that orchestrate tools as directed graphs using YAML for data transformation, conditional routing, and automated workflows.
Implementation patterns for MERIDIAN autonomous AI agents using the Claude API, including the BaseAgent lifecycle, structured tool use, token budget enforcement, and cron scheduling.
Optimize Node.js performance via Redis caching, clustering, profiling, and monitoring to build fast, scalable, and efficient backend services.
Perform comprehensive code reviews with a focus on security vulnerabilities, performance optimization, maintainability, and code correctness.
Multi-model LLM integration patterns for Claude, GPT, Gemini, and Ollama. Features API handling, prompt engineering, token management, and model-agnostic orchestration.