evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics for multi-language coding models.
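As a rough illustration of the pass@k metric named above, here is a minimal sketch of the commonly used unbiased estimator: for a problem with n sampled completions of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from the BigCode Evaluation Harness, which may compute the metric differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (hypothetical helper).

    n: number of completions sampled for the problem
    c: number of those completions that passed the unit tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `mean_pass_at_k([(10, 3), (10, 0)], k=1)` averages a problem solved in 3 of 10 samples with one never solved. Sampling more completions than k (n > k) reduces the variance of the estimate.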
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Extract and document authentic writing voice from samples. Create comprehensive voice guides for AI training, ghostwriting, and brand consistency.
Retrieve current weather conditions, forecasts, and temperature data for global locations to assist with daily planning and travel.
Expert guidance for Claude Messages API: structured outputs, prompt caching, tool use, and migration from deprecated Claude 3.x models to 4.5. Prevents common API errors.
Perform advanced video analysis using Google's Gemini API: summarize content, transcribe audio, extract timestamps, clip segments, and analyze YouTube URLs or local files with support for multiple models and long contexts.
Enforce epistemic quality in RAG systems with pre-ingestion verification. Ensures documents are properly qualified and structured before knowledge base entry.
Intelligent Apple Mail inbox scanner that categorizes unread, actionable, and priority emails using automated keyword analysis.
Process and generate multimedia with Google Gemini. Analyze audio, images, videos, and PDFs with high-context windows. Supports transcription, visual QA, OCR, and AI-driven image creation.
Automated text-to-image rendering engine for social media posts, article covers, and long-form threads. Supports X-style, WeChat, and poster templates with high-precision text formatting and highlights.
Advanced Python security vulnerability scanner for Flask, Django, and FastAPI projects. Audits OWASP Top 10, dependencies, hardcoded secrets, and framework-specific flaws.
Analyze meeting transcripts to uncover behavioral patterns, communication insights, and leadership feedback. Identify conflict avoidance, filler words, speaking ratios, and active listening to improve your professional presence.
Streamline technical documentation by generating, updating, and refining README files. Tailors content for specific audiences including OSS contributors, internal teams, and personal projects.