evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Supported benchmarks include HumanEval, MBPP, and MultiPL-E, scored with the pass@k metric, for evaluating multi-language coding models.
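pass@k is commonly computed with the unbiased estimator introduced alongside HumanEval (Chen et al., 2021). For each problem, you generate n ≥ k samples and count the c samples that pass the unit tests. The estimate is 1 − C(n−c, k) / C(n, k), which is the probability that a random subset of k samples contains at least one passing sample. The benchmark score is the mean over problems. The sketch below is a standalone Python version of that formula, not the harness's own code. The names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (hypothetical helper).

    n: number of samples generated for the problem
    c: number of those samples that passed the unit tests
    k: attempt budget being scored (must satisfy k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few failing samples to fill k slots, so every subset contains a pass.
        return 1.0
    # 1 - C(n-c, k) / C(n, k): chance that a random size-k subset has >= 1 pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n. Larger k rewards models that solve a problem in at least one of several attempts, which is why it is only meaningful when n ≥ k samples were actually generated.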