evaluating-code-models
Evaluate code generation models with the BigCode Evaluation Harness. Supported benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics, for multi-language coding models.
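pass@k is usually reported with the unbiased estimator introduced alongside HumanEval. For each problem, generate n ≥ k samples, count the c samples that pass all unit tests, and compute pass@k = 1 − C(n−c, k) / C(n, k). Then average this value over all problems. The sketch below is a minimal, standalone illustration of that formula. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not the harness's own API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (illustrative helper).

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: sampling budget being scored (k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k draw contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples fail); Python ints keep comb() exact.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs (hypothetical helper)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two problems, 10 samples each: 3 passing and 0 passing.
    print(mean_pass_at_k([(10, 3), (10, 0)], k=1))
```

Averaging per-problem estimates, instead of pooling all samples, keeps each problem equally weighted. Using exact integer binomials avoids the overflow that a naive floating-point factorial would hit for large n.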
Transforms vague or poorly structured prompts into optimized, high-performance instructions using proven prompt engineering principles for better AI model execution.
Nonlinear optimization toolkit using CasADi and IPOPT. Ideal for building complex NLP models, defining symbolic variables, constraints, and solvers, with specialized support for power systems optimization patterns.
Bayesian modeling and probabilistic programming with PyMC. Build hierarchical models, perform MCMC sampling (NUTS), variational inference, and conduct rigorous model comparison using LOO and WAIC.
An end-to-end video processing pipeline that transforms raw recordings into transcripts, key insights, short clips, and polished articles.
Applies cognitive science frameworks for creative thinking to generate genuinely novel research directions in computer science and AI.
Retrieve current, source-backed technical information using MCP tools to resolve queries about libraries, APIs, SDKs, and evolving tech ecosystems.
Perform internet searches using the Zhipu AI web search API to retrieve real-time information, news, and current data.
Unified local ML inference server for ASR, TTS, Translation, Image Generation, and Vision on Apple Silicon, powered by MLX.
Connect your AI agent to the Hugging Face Hub via MCP. Search models, datasets, and papers, manage repos, run cloud compute jobs, and invoke Gradio Spaces as functional AI tools.
Implement ReasoningBank adaptive learning with AgentDB's ultra-fast vector backend. Features trajectory tracking, verdict judgment, memory distillation, and pattern recognition for self-learning autonomous agents.
Advanced context engineering system for orchestrating AI agents, memory management, and token optimization to improve long-term persistence and project intelligence.