evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics for multi-language coding models.
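For readers unfamiliar with pass@k, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. The function name pass_at_k and its argument names are illustrative assumptions, not taken from the harness, whose own implementation may differ in detail.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the tests
    k: sample budget being scored (1 <= k <= n)

    Hypothetical helper for illustration only.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset contains no passing sample
    # is C(n - c, k) / C(n, k); math.comb returns 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A benchmark-level score is then the mean over problems, for example mean_pass_at_k([(10, 3), (10, 0)], k=1) for two problems with 10 samples each.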
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
AI-assisted version control for code agents. Track prompts, context, and diffs automatically with MemoV to ensure full traceability without polluting your git history.
Architect and optimize production-grade RAG systems. Master embedding models, vector databases, chunking strategies, and retrieval pipelines for high-accuracy LLM applications.
Reverse engineer web APIs by capturing browser traffic (HAR files) and generating production-ready Python API clients for automation and data extraction.
An all-in-one Chinese daily utility toolkit: weather, currency exchange, news, and package tracking. Zero configuration, no API keys required.
Comprehensive API test automation suite supporting REST/GraphQL. Features functional, performance, and contract testing with integrated Mock services.
Automated generation of project documentation from codebase analysis, ensuring accuracy, consistency, and alignment with VilnaCRM architecture patterns.
Implement Google Gemini API vision capabilities for image/document analysis including captioning, object detection, segmentation, and multi-image comparison.
Read and navigate external documentation efficiently using llms.txt, MCP search, and smart parsing strategies.
Structured, template-driven workflow for end-to-end feature development including coding, automated testing, verification, and session-based improvement.
Google Gemini Image Generation API interface for text-to-image, editing, style templates, and automated retry workflows.
Manages free AI models from OpenRouter for OpenClaw. Ranks models by quality, configures fallbacks for rate-limit handling, and updates openclaw.json automatically.