evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics for multi-language coding models.
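As a rough illustration of the pass@k metric mentioned above, the sketch below computes the commonly used unbiased estimator: given n sampled completions per problem, of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k). The function name `pass_at_k` is hypothetical and is not taken from the harness itself; the harness's own implementation may differ in detail.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (hypothetical helper, not the harness API).

    n: total completions sampled for the problem
    c: number of those completions that passed the tests
    k: sample budget being scored (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as an (n, c) pair."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A benchmark score is then the mean over all problems, for example `mean_pass_at_k([(20, 5), (20, 0), (20, 20)], k=1)`.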
Automate Android device operations using AI AutoGLM Phone Agent. Enables natural language control for app testing, data collection, and UI interactions like tapping, scrolling, and inputting text.
Standardizes Markdown content with active voice, precise heading hierarchies, and WCAG AA accessibility compliance for documentation, websites, and repository files.
An expert-level CTF solver agent that automates reconnaissance, vulnerability analysis, and exploit generation for web, pwn, crypto, reverse, and forensic challenges.
Fetch and parse Feishu (Lark) cloud documents into Markdown, with support for media handling and Wiki space navigation.
Generate and edit images using Google's Nano Banana 2 via WaveSpeed AI. Supports text-to-image, natural language editing, multi-image composition, 4K resolution, and various aspect ratios.
macOS visual automation tool for precise window capture, video recording, UI mockup annotation, Excalidraw wireframing, and automated visual regression testing.
Automate high-quality screenshot generation for MicroSim visualizations using Chrome headless mode. Ideal for documentation, social media previews, and quality assessment.
Perform a structured 8-factor conversion rate optimization (CRO) audit of any landing page to identify friction points and opportunities for growth.
A comprehensive moderation toolkit for Civitai, providing automated user management, strike systems, image review, content regulation, and CSAM reporting via tRPC API.
Create aesthetically beautiful interfaces using systematic design principles, AI-driven evaluation, and automated inspiration analysis.
Advanced visual regression testing with pixel-perfect and AI-powered diff analysis, cross-browser validation, and responsive design checks to prevent UI regressions in CI/CD pipelines.