trulens-evaluation-workflow
A systematic workflow to instrument, evaluate, and monitor LLM applications using TruLens, supporting frameworks like LangChain, LangGraph, and LlamaIndex.
Generate professional equity research snapshots using consensus estimates, company fundamentals, historical pricing, and macroeconomic indicators to build investment theses.
Framework for multi-agent collaboration using the Google A2A protocol. Enables messaging, task delegation, and cross-agent coordination for CLI-based AI tools.
Drafts LaTeX research papers section-by-section using paper plans and research narratives with multi-model reviewer validation.
Development guide for Arma Reforger EnforceScript, covering component architecture, network replication, persistence, and memory management.
Generate production-ready Cloudscape Design System React + TypeScript UI code, components, and scaffolds with accessibility, responsive patterns, and robust state handling.
Explains complex concepts using master teaching frameworks like Feynman, Socratic, and Cognitive Load theory to ensure deep, clear understanding.
React component development guide for LobeHub, including styling with antd-style, layout building with @lobehub/ui, and routing management.
Queen-led multi-agent orchestration for Claude Code, featuring Byzantine consensus, persistent collective memory, and adaptive task distribution for complex software projects.
GitHub workflow assistant with integrated git and gh CLI support for managing repositories, branches, pull requests, and issues.
A robust verification and QA system for software agents featuring real-time truth scoring, automated code validation, and instant rollback capabilities to maintain high reliability.
A systematic code auditing framework for identifying technical debt, security vulnerabilities, dead code, and code quality issues in software projects.