evaluation
Build systematic evaluation frameworks for AI agents using multi-dimensional rubrics, LLM-as-a-judge, and regression testing to measure performance, quality, and context engineering effectiveness.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
Python toolkit for mass spectrometry data processing. Enables spectral file importing (mzML, MGF, MSP), metadata harmonization, peak filtering, and calculating spectral similarity scores (cosine, modified cosine) for metabolomics.
Diagnose, isolate, and mitigate LLM context failures such as lost-in-the-middle, context poisoning, distraction, and context clash to improve agent reliability.
Systematically detects a project's technology stack, auto-loads framework-specific skills, and analyzes multi-stack full-stack projects such as React + Go.
Automates the creation of isolated git worktree environments for parallel feature development and environment setup.
Automates invoice and receipt organization for tax preparation by parsing files, extracting financial data, renaming documents, and filing them into a structured directory system.
Automatically keeps README files synced with codebase changes including dependencies, new features, and configuration updates.
Essential guide to llmemory for document storage and search: installation, database setup with pgvector, document ingestion, hybrid/semantic retrieval, and building RAG systems with multi-tenant support.
Systematically improve marketing copy through a 7-pass editing framework to boost clarity, tone, and conversion impact.
Automated GitHub issue analysis, triage, and resolution planning tool integrated with Specification Driven Development (SDD) workflows.
Research technical documentation and automatically generate ready-to-use software agent skills in markdown format.
Captures session learnings into Reusable Intelligence Infrastructure (RII). Converts one-time bug fixes and pattern discoveries into permanent agent-executable knowledge to prevent recurrence and accelerate future development.