evaluating-code-models
Evaluate code generation models with the BigCode Evaluation Harness. Supported benchmarks include HumanEval, MBPP, and MultiPL-E (for multi-language coding models), all scored with pass@k metrics.
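pass@k is usually computed with the unbiased estimator introduced alongside HumanEval (Chen et al., 2021): generate n samples per problem, count the c samples that pass the unit tests, and estimate pass@k = 1 - C(n-c, k) / C(n, k), then average over problems. The sketch below is a minimal, standalone illustration of that formula. The function names `pass_at_k` and `benchmark_pass_at_k` and the `(n, c)` input layout are hypothetical and are not the harness's own API.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (hypothetical helper).

    n: number of samples generated for the problem
    c: number of those samples that passed all unit tests
    k: the k in pass@k (must satisfy k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains a correct one.
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1. math.comb keeps the binomials exact.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return mean(pass_at_k(n, c, k) for n, c in results)


if __name__ == "__main__":
    # Three hypothetical problems, 10 samples each, with 3, 0 and 10 passing.
    per_problem = [(10, 3), (10, 0), (10, 10)]
    print(benchmark_pass_at_k(per_problem, k=1))
```

With k = 1 the estimator reduces to c / n per problem, so the example averages 0.3, 0.0 and 1.0. Sampling n > k and using this estimator gives a lower-variance result than drawing exactly k samples per problem.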
Discover, analyze, and summarize trending GitHub repositories, project health, and technical stacks to stay updated on open-source ecosystems.
Multi-perspective AI consultation for technical architecture, complex refactoring, and structured debugging.
Foundational architectural principles for MoAI-ADK, featuring TRUST 5, SPEC-First TDD, delegation patterns, and token-efficient agent orchestration workflows.
Master React Native styling, navigation, and Reanimated animations. Build performant, cross-platform mobile apps with native-quality UX.
Autonomous multi-agent orchestration framework for Claude Code with memory-driven workflows, parallel-first task execution, Aristotle-based deconstruction, and multi-stage quality gates.
Guided, systematic feature development agent that orchestrates codebase exploration, architectural design, implementation, and automated testing.
Design and implement microinteractions, motion design, and transitions. Use to add UI polish, define loading states, and create delightful, intuitive user feedback patterns.
Executes Gradle-based Java tests, filters results for failures and key statistics, and provides concise reports to streamline backend development and debugging.
Implement production-grade AI agents with LangGraph, tool-calling guardrails, SSE streaming, and episodic memory. Includes anti-patterns, fix pairs, and stateful architecture patterns.
Master REST and GraphQL API design principles to build intuitive, scalable, and maintainable APIs that delight developers.
Enforces structured self-assessment checkpoints to validate approach, mitigate risks, and ensure quality before, during, and after task execution.