evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Supported benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics, for evaluating multi-language coding models.
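pass@k is the probability that at least one of k sampled completions passes a problem's unit tests. The sketch below shows the standard unbiased estimator introduced alongside HumanEval: generate n samples per problem, count the c that pass, and compute 1 - C(n-c, k) / C(n, k). For example, with n = 5 and c = 1, pass@2 = 1 - C(4,2)/C(5,2) = 0.4. This is an illustrative sketch, not the harness's own code. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import prod


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (illustrative sketch).

    n: number of samples generated for the problem
    c: number of those samples that passed the unit tests
    k: the k in pass@k (must satisfy 1 <= k <= n)
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Any k-subset must contain at least one passing sample.
        return 1.0
    # C(n-c, k) / C(n, k) equals the product of (1 - k/i) for i in n-c+1..n.
    # Computing it this way avoids huge binomial coefficients for large n.
    prob_all_fail = prod(1.0 - k / i for i in range(n - c + 1, n + 1))
    return 1.0 - prob_all_fail


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over a benchmark of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

The product form is a deliberate design choice. It is algebraically identical to the ratio of binomial coefficients, but it stays in floating point, so it is safe to use for large n (for example, 200 samples per problem).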
Implement adaptive learning with ReasoningBank for pattern recognition, strategy optimization, and continuous improvement in AI agents.
Generate high-quality images via a local ComfyUI instance. Perfect for private workflows and professional-grade AI image synthesis.
Resume a paused experimental loop by restoring branch context, loading configuration, reading history, and identifying optimization patterns for continued iteration.
Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production applications.
Automates the submission workflow for lading performance optimizations, including branch management, git commits, and PR creation.
Self-modify your Milady agent by managing plugins. Edit code, rebuild, and restart the runtime to develop new capabilities or improve agent workflows locally.
Transforms content to match specific voice profiles, tones, or styles using configurable YAML templates for consistent brand and narrative output.
Debugging guide for AReaL distributed training issues, including hangs, NCCL errors, OOM, and numerical consistency in FSDP2/TP/CP/EP.
Expert guidance for building production-ready applications with Anthropic's Claude API. Covers SDKs, prompt caching, batch processing, streaming, tool use, and cost optimization strategies.
Generate high-quality visual content, characters, and scenes using structured JSON prompts and automated Python execution for guided image synthesis.
A suite of professional tools for auditing, evaluating, chunking, and scaffolding production-ready RAG pipelines within Claude Code.