evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics for multi-language coding models.
Sends debugging data, logs, and visual output to the Ray desktop application via its local API for real-time developer feedback.
Epistemic safety analysis for JSON data in prompts to prevent LLM hallucinations and reasoning errors when handling incomplete or large-scale datasets.
Comprehensive UI testing, visual fidelity analysis, and browser debugging using Chrome DevTools MCP and AI-driven vision models.
Manage automatic model routing for Higress AI Gateway via CLI. Configure triggers for intelligent model selection based on request content.
Perform network protocol reverse engineering, including packet capture, traffic analysis, protocol dissection, and custom format documentation.
Implement production-grade observability for Istio and Linkerd service meshes, including distributed tracing, metric dashboards, and golden signal monitoring.
Base ecosystem skill for Refly. Creates, discovers, and runs domain-specific skills, routes user intent to workflows via symlinks, and automates multi-step pipelines via the Refly CLI.
Automate regulatory compliance testing for GDPR, CCPA, HIPAA, SOC2, and PCI-DSS to ensure legal adherence, prepare for audits, and secure sensitive data.
Directly interface with RagCode MCP via SSE protocol without complex configuration files or binary dependencies.
Expert automated code review for Go CLI applications, focusing on Cobra/urfave patterns, security, performance, idiomatic Go, and robust error handling.
Manage database orchestration sessions, state snapshots, and system-level operations for the BAZINGA-DB core engine.