Agent Skills Hub

Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.

137 skills found

evaluation

Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.

Views: 23 ★ 15,339

Engineering, Automation

eval

Evaluate Deca agent prompts and behavioral consistency through automated test runners, manual LLM judgment, and structured reporting.

Views: 17 ★ 1

Productivity, Content, Education

prompt-rewriter

Advanced prompt rewriting and optimization service. Analyzes prompts for clarity, specificity, and structure, providing actionable improvements, variations for testing, and prompt engineering best practices.

Views: 20 ★ 4,453

Engineering, Automation

eval-harness

Official evaluation framework for AI agent sessions, implementing Evaluation-Driven Development (EDD) principles to ensure reliability.

Views: 30 ★ 169,888

Research, Education, Content

peer-review

Structured manuscript and grant review assistant utilizing checklist-based evaluation for methodology, statistical validity, and compliance with reporting standards like CONSORT and STROBE.

Views: 27 ★ 19,688

Research, Education, Productivity

scholar-evaluation

Systematically evaluate scholarly work using the ScholarEval framework, providing structured, quantitative, and qualitative assessment across research quality dimensions with actionable feedback.

Views: 8 ★ 19,706

Engineering, Data Analysis, Automation

trulens-evaluation-workflow

A systematic workflow to instrument, evaluate, and monitor LLM applications using TruLens, supporting frameworks like LangChain, LangGraph, and LlamaIndex.

Views: 11 ★ 3,286 #trulens #llm #evaluation #workflow

Research, Content, Engineering

ai-writing-detection

Comprehensive AI-generated text detection framework. Features multi-layer analysis of vocabulary, structural patterns, model-specific fingerprints, and technical metadata artifacts to identify AI authorship.

Views: 12 ★ 1,108

Engineering, Productivity

context-compression

Optimize agent performance and token usage through advanced context compression, structured summarization, and task-oriented state management for long-running sessions.

Engineering, Research

evaluating-code-models

Evaluate code generation models using BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E with pass@k metrics for multi-language coding models.

Views: 19 ★ 7,624 #Evaluation #Code Generation #HumanEval #MBPP

Productivity, Engineering, Data Analysis, Content, Research

ai-multimodal

Process and generate multimedia with Google Gemini. Analyze audio, images, videos, and PDFs using large context windows. Supports transcription, visual QA, OCR, and AI-driven image creation.

Views: 14 ★ 9