Agent Skills Hub

Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.

123 skills found

Engineering, Research

evaluating-code-models

Evaluate code generation models using BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E with pass@k metrics for multi-language coding models.

Views: 197,624 #Evaluation #Code Generation #HumanEval #MBPP
Engineering, Data Analysis, Research

evaluation

Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.

Views: 2315,339
Engineering, Automation

eval

Evaluate Deca agent prompts and behavioral consistency through automated test runners, manual LLM judgment, and structured reporting.

Views: 171
Engineering, Data Analysis, Automation

trulens-evaluation-workflow

A systematic workflow to instrument, evaluate, and monitor LLM applications using TruLens, supporting frameworks like LangChain, LangGraph, and LlamaIndex.

Views: 113,286 #trulens #llm #evaluation #workflow
Data Analysis, Research, Engineering

pymc

Bayesian modeling and probabilistic programming with PyMC. Build hierarchical models, perform MCMC sampling (NUTS), variational inference, and conduct rigorous model comparison using LOO and WAIC.

Views: 819,798
Data Analysis, Research, Engineering

statsmodels

Statistical modeling and econometrics library for Python. Performs OLS, GLM, mixed models, ARIMA, diagnostics, and inference for rigorous scientific analysis.

Views: 1119,783
Data Analysis, Engineering, Research

scikit-learn

Classical machine learning with scikit-learn. Use for classification, regression, clustering, dimensionality reduction, preprocessing, model evaluation, and building robust ML pipelines in Python.

Views: 719,694
Content, Research, Productivity

generate-image

Generate or edit images using AI models like FLUX and Gemini. Ideal for photos, illustrations, concept art, and visual assets, excluding technical diagrams and schematics.

Views: 411,655
Research, Education, Productivity

scholar-evaluation

Systematically evaluate scholarly work using the ScholarEval framework, providing structured, quantitative, and qualitative assessment across research quality dimensions with actionable feedback.

Views: 819,706
Data Analysis, Productivity, Engineering

creating-financial-models

A comprehensive financial modeling suite for investment analysis, featuring DCF valuation, sensitivity testing, Monte Carlo simulations, and scenario planning.

Views: 7709
Engineering, Automation

eval-harness

Official evaluation framework for AI agent sessions, implementing Evaluation-Driven Development (EDD) principles to ensure reliability.

Views: 30169,888
Engineering, Automation, Data Analysis

claude-rag-skills

A suite of professional tools for auditing, evaluating, chunking, and scaffolding production-ready RAG pipelines within Claude Code.

Views: 2631