evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Supported benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics for multi-language coding models.
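As a rough illustration of the pass@k metric mentioned above, the following is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. The function name `pass_at_k` and its arguments are hypothetical and chosen for this example; this is not the harness's own code, which may compute the value differently.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (hypothetical helper, not harness code).

    n: total samples generated for the problem
    c: number of those samples that passed all tests
    k: sample budget being scored (k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("expected 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs for a benchmark."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `mean_pass_at_k([(10, 3), (10, 0)], k=1)` averages the per-problem estimates for two problems with 10 samples each. Using exact binomial coefficients keeps the sketch simple; for very large n a product-based form may be preferable.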
Perform comprehensive trading comparables analysis using peer multiples, operational KPIs, and valuation modeling to assess company relative value.
Master workflow controller for Lovable-style, AI-driven development. Instantly generates premium, multi-page, animated applications by routing to specialized sub-agents. No prompts needed—just build.
Conduct systematic literature reviews across PubMed, arXiv, and Semantic Scholar with AI-driven synthesis, verified citations, and mandatory schematic visualization.
Comprehensive UI testing, visual fidelity analysis, and browser debugging using Chrome DevTools MCP and AI-driven vision models.
Implement production-grade AI agents with LangGraph, tool-calling guardrails, SSE streaming, and episodic memory. Includes anti-patterns, fix pairs, and stateful architecture patterns.
Bootstrap CISO Assistant environments by guiding users through organizational structure setup, framework selection, and initial risk assessment configuration using MCP tools.
Generate publication-quality figures, charts, and LaTeX tables from experiment data for academic papers.
Command-line interface for Gemini AI, enabling one-shot model inference, text generation, and JSON-formatted data extraction for OpenClaw users.
Sage MCP protocol implementation for integrating external tool servers and standardized AI model context.
A comprehensive configuration toolkit for Claude Code featuring battle-tested agents, skills, hooks, and automation workflows for software development.
Build comprehensive 3-5 year startup financial models, including revenue projections, cost structures, cash flow analysis, and scenario planning for fundraising and operations.