evaluating-code-models
Evaluate code generation models using BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E with pass@k metrics for multi-language coding models.
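As a rough illustration of the pass@k metric mentioned above, the sketch below uses the widely used unbiased estimator: for a problem with n generated samples of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from the BigCode Evaluation Harness, whose own implementation may differ.

```python
import math
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Hypothetical helper for illustration; not the harness's API.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are incorrect, every size-k draw
    # must contain at least one correct sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset holds no correct sample,
    # subtracted from 1.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs, one per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("no problems supplied")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three made-up problems, 10 samples each, with 0, 3, and 10 passing.
    print(mean_pass_at_k([(10, 0), (10, 3), (10, 10)], k=1))
```

With k = 1 the estimate reduces to the fraction of passing samples per problem, which is a quick sanity check on any implementation.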
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
SPARC methodology for multi-agent development: systematic Specification, Pseudocode, Architecture, Refinement, and Completion workflows via Claude Flow orchestration.
A Notion-based tracking system for tweet performance to enable data-driven content experimentation using reinforcement learning principles.
Build, audit, and iterate high-converting landing pages using React, Vite, TypeScript, Tailwind, and shadcn/ui. Expert in CRO, hero structures, and conversion-focused design with Iconify icons.
Guidance and operational tips for identifying, reviewing, and managing pull requests created by the GitHub Copilot coding agent within your repository.
A runtime skill discovery engine for AI agents. Search and retrieve specialized agent skills (SKILL.md) on-demand via REST API or MCP to inject procedural knowledge into your agent's context.
Create high-converting email sequences for sales, launches, and lead nurturing. Expertly crafted drip campaigns tailored to your business voice, audience, and offer goals.
Master workflow controller for Lovable-style, AI-driven development. Instantly generates premium, multi-page, animated applications by routing to specialized sub-agents. No prompts needed—just build.
Automates the release preparation process for MassGen by generating CHANGELOG entries, creating announcement drafts, and validating documentation integrity before git tagging.
A structured workflow for co-authoring documentation, technical specs, and proposals, guiding users through context gathering, collaborative refinement, and reader verification.
An automated memory middleware for AI agents, implementing a Retrieve-Respond-Save loop to maintain long-term persistent context across conversations.
Enhance fuzzer effectiveness by providing domain-specific tokens, magic bytes, and protocol-specific keywords to reach deep code paths.