evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics for multi-language coding models.
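For readers unfamiliar with pass@k, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated per problem and c is the number that pass the unit tests. The function names `pass_at_k` and `mean_pass_at_k` are illustrative assumptions and are not taken from the harness, which may compute the metric differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed the tests,
    k: sampling budget being scored (hypothetical helper, not a harness API).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, any draw of k contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random draw of k samples contains no passing one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A benchmark-level score is the mean over problems, for example `mean_pass_at_k([(10, 3), (10, 0)], k=1)` averages the per-problem estimates for two problems with 10 samples each.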
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
A unified interface for integrating and managing LLM chat providers like OpenAI, Anthropic, Google, Azure, and Bedrock within LangChain applications.
Transcribe audio files (wav, mp3, ogg) to text using the Qwen ASR model. Fast, local-friendly, and requires no API keys.
An AI-powered skill that automatically retrieves relevant project context from your RAG knowledge base for complex coding tasks.
Standardized detective skill integration for agent roles. Maps agents to code-analysis skills and enforces claudemem usage for memory-indexed code investigation.
Fetch real-time financial signals, transmission-chain reasoning, and market confidence metrics directly from the DeepEar Lite platform.
Manage, search, and extract technical insights from a local paper database. Ideal for developers implementing academic research, verifying code against math, and grounding coding agents in scientific papers.
Master multi-agent orchestration with LangGraph. Build stateful, fault-tolerant AI workflows using supervisor-worker patterns, conditional routing, and advanced state management.
Architect production-grade LLM applications using LangChain 1.x and LangGraph. Implement stateful AI agents, multi-step workflows, and custom memory systems for complex conversational and automation tasks.
Advanced Google search using a real, JavaScript-rendered Chrome browser. Ideal for scraping full page content, site-specific queries, and time-filtered results.
Unified AI gateway for 100+ LLMs with OpenAI-compatible API, model fallbacks, load balancing, and enterprise-grade tools.
Build production-grade AI agents using LangGraph, Anthropic/OpenAI/vLLM, and structured outputs. Features streaming, A2A protocol, Pydantic validation, vector memory, and guardrails for resilient, multi-agent workflows.