evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, with pass@k metrics for multi-language coding models.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Autonomous multi-agent LinkedIn system using LangGraph and Claude Opus 4.5 for trend research, content creation, voice profiling, and analytics-driven optimization.
Reference for generating Apple Final Cut Pro FCPXML files, covering timeline structures, media assets, effects, and project automation for FCP 10.4+.
A runtime skill discovery engine for AI agents. Search and retrieve specialized agent skills (SKILL.md) on-demand via REST API or MCP to inject procedural knowledge into your agent's context.
Manage a project's single source of truth (SSOT), memory, and cross-tool search. Guardian of decisions.md and patterns.md for Claude Code. Use for context retention, memory synchronization, and decision tracking.
Anthropic Claude integration patterns: streaming, RAG with pgvector, tool use, model selection (Haiku/Sonnet/Opus), prompt caching, and cost management for AI-powered engineering.
React Native best practices for Expo and bare workflow. Supports project structure, navigation, NativeWind styling, platform-specific code, and TypeScript integration.
Analyze and implement purposeful UI animations for Next.js, Tailwind CSS, and React applications with a focus on UX, performance, and accessibility.
Master iOS Human Interface Guidelines and SwiftUI for native app development. Expert guidance for UI design, component implementation, and Apple platform design principles.
Generates GitHub-compatible Mermaid diagrams with tested color palettes, local SVG/PNG preview, and gist-based rendering support.
Frontend coding conventions for Preact and Tailwind. Use for web UI components in cluster applications.
A suite of professional tools for auditing, evaluating, chunking, and scaffolding production-ready RAG pipelines within Claude Code.