evaluating-code-models
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, scored with pass@k metrics for multi-language coding models.
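For readers unfamiliar with pass@k, the following is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. The function name `pass_at_k` and its parameters are illustrative assumptions and are not taken from the harness itself.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (illustrative helper, not the harness API).

    n: total samples generated for the problem
    c: number of those samples that passed the tests
    k: budget of samples the metric is evaluated at
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A benchmark-level score is then the mean over problems, for example `mean_pass_at_k([(10, 3), (10, 0)], k=1)`.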
An automated memory middleware for AI agents, implementing a Retrieve-Respond-Save loop to maintain long-term persistent context across conversations.
A comprehensive Python library for querying, parsing, and analyzing SEC EDGAR filings, financial statements, and institutional holdings as structured data objects.
A comprehensive aphorism and quote management system for thematic content enrichment, research, and newsletter curation.
Intelligent pattern selection for Fabric CLI, automatically choosing from 242+ specialized prompts for threat modeling, data analysis, summarization, and content creation.
SEO-optimized content brief template and creation methodology for writers and content planners.
Reliably read and extract content from publicly shared Google Docs using curl for full document retrieval.
Persistent state management and workflow analytics using DuckDB for task dependency tracking, historical metrics, and context checkpointing.
Convert various documents, media, and web content into Markdown using markitdown, ideal for LLM processing and text analysis.
Process and generate multimedia with Google Gemini. Analyze audio, images, videos, and PDFs with high-context windows. Supports transcription, visual QA, OCR, and AI-driven image creation.
6-phase read-only Python analysis workflow that identifies design principle violations, code smells, and modernization opportunities based on specific project types (POC to Open Source).
Fetch real-time financial signals, transmission-chain reasoning, and market confidence metrics directly from the DeepEar Lite platform.