evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Fetch and parse transcripts from YouTube and Bilibili videos for summarization, QA, and content extraction using yt-dlp.
Manage client relationships, track follow-ups, and automate personalized email drafts using Obsidian-based client profiles.
Private skill distribution system for managing agents, skills, and prompts across devices and teams. Install, sync, add, and update them via a central library catalog.
Reliably rotate images by 90-degree increments using a deterministic Python script. Supports PNG, JPG, GIF, BMP, and TIFF, preserving quality with automated file handling.
Dialectical reasoning and adversarial coding agent for MCP-enabled editors, forcing LLMs to resolve internal contradictions for higher-quality outputs.
Comprehensive AI-generated text detection framework. Features multi-layer analysis of vocabulary, structural patterns, model-specific fingerprints, and technical metadata artifacts to identify AI authorship.
A high-performance Liquid template engine that compiles templates into optimized Ruby and machine code via an intermediate language (IL).
Architectural governance and project standards for React 19 SPA development, ensuring consistency in stack integration, project structure, and agent execution rules.
Self-maintaining skill for OpenCode agents to update documentation, capture learnings, and extend tool/agent capabilities dynamically.
One-click publishing of Markdown articles to WeChat Official Account drafts, featuring automated image hosting, multi-theme styling, and code syntax highlighting.
Fetches expert perspectives from OpenAI Codex and Google Gemini for architecture, code reviews, and debugging, with transparent LLM synthesis.