evaluating-code-models
Evaluate code generation models using BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E with pass@k metrics for multi-language coding models.
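For readers unfamiliar with the pass@k metric named above, here is a minimal sketch of the commonly used unbiased estimator: given n generated samples per problem, of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k). The function name and argument checks are illustrative assumptions, not the harness's own code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed the tests
    k: budget of samples the metric is defined over (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 10 samples, 3 of which passed.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

A benchmark-level score would typically be the mean of this value over all problems in the benchmark.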
Analyze and identify codebase patterns (naming, architecture, testing) to maintain consistency and enforce standards during development.
Read and navigate external documentation efficiently using llms.txt, MCP search, and smart parsing strategies.
An autonomous AI agent loop that executes Claude Code repeatedly to build features from structured PRDs until completion.
Implement production-ready AI chat interfaces using OpenAI ChatKit React components. Features include hook configuration, streaming, theming, conversation history, and custom tool integration for Next.js applications.
Generate personalized, verified daily news briefings tailored to your interests, projects, and competitive landscape with strict 7-day source freshness.
An all-in-one Chinese daily utility toolkit: weather, currency exchange, news, and package tracking. Zero configuration, no API keys required.
Extract and document authentic writing voice from samples. Create comprehensive voice guides for AI training, ghostwriting, and brand consistency.
Expert guidance for building production-ready applications with Anthropic's Claude API. Covers SDKs, prompt caching, batch processing, streaming, tool use, and cost optimization strategies.
Queen-led multi-agent orchestration for Claude Code, featuring Byzantine consensus, persistent collective memory, and adaptive task distribution for complex software projects.
Production-ready reinforcement learning using Stable Baselines3. Train agents, design custom environments, implement training callbacks, and optimize workflows with a scikit-learn-style API.
Advanced context engineering system for orchestrating AI agents, memory management, and token optimization to improve long-term persistence and project intelligence.