evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics, for multi-language coding models.
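pass@k is commonly computed with the unbiased estimator introduced alongside HumanEval. You generate n ≥ k samples per problem and count the c samples that pass the unit tests. The per-problem estimate is 1 - C(n-c, k) / C(n, k), and the benchmark score is the mean over problems. The Python below is a minimal standalone sketch that assumes per-problem (n, c) counts are already available. It is not the harness's own code, and the names `pass_at_k` and `benchmark_pass_at_k` are hypothetical.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (illustrative sketch).

    n: number of samples generated for the problem
    c: number of those samples that passed all unit tests
    k: the k in pass@k (must satisfy 1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - probability that a random size-k subset contains only failing samples.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, e.g. one pair per task."""
    return mean(pass_at_k(n, c, k) for n, c in results)
```

For example, `pass_at_k(10, 3, 1)` is 1 - 7/10 = 0.3 (up to float rounding). Using `math.comb` on Python integers keeps the binomials exact, so large n does not overflow before the final division.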
Advanced workflow orchestration for AI agents, featuring multi-model routing, Codex sandbox iteration, parallel swarm execution, and persistent memory across complex pipelines.
Full-stack automated paper writing pipeline from research narrative to polished LaTeX/PDF.
Cascading goal tracking system connecting 3-year vision to daily tasks. Automates progress calculation, stalled goal detection, and project-to-goal alignment for Obsidian vaults.
Fetch and aggregate the latest Posit news, blog posts, podcast episodes, video content, and event announcements using automated sub-agents.
Automated global intelligence aggregator for market, geopolitical, and AI news. Features RSS feed integration, real-time alert systems for critical events, and structured report generation with intelligence inference.
Enforce strict code quality, correctness, and Rust design patterns for the Turso database, prioritizing data integrity, performance, and maintainable, idiomatic code.
Initializes a development session with environmental health checks, task status synchronization, and contextual memory restoration for Claude Code.
Multi-model LLM integration patterns for Claude, GPT, Gemini, and Ollama. Features API handling, prompt engineering, token management, and model-agnostic orchestration.
Manage, search, and extract technical insights from a local paper database. Ideal for developers implementing academic research, verifying code against math, and grounding coding agents in scientific papers.
Fast-reference guide and utility skill for Helm chart development, template syntax, and Kubernetes application deployment.
Create curated news, tech, and research paper digests using high-quality sources. Perfect for daily roundups, topic tracking, and filtering noise.