trulens-evaluation-workflow
A systematic workflow to instrument, evaluate, and monitor LLM applications using TruLens, supporting frameworks like LangChain, LangGraph, and LlamaIndex.
Search, analyze, and audit GeminiClaw session logs and memory. Use to investigate past interactions, track token usage, debug tool calls, and monitor agent performance.
Advanced QE reporting, quality dashboards, and predictive analytics for test metrics, code coverage, and deployment readiness to drive data-informed quality decisions.
Manage, run, and update JS framework benchmarks for the Gea framework, including reporting, HTML result generation, and performance comparisons.
Orchestrate parallel Claude Code worker swarms with protocol-based behavioral governance for complex features, multi-step refactors, and long-running autonomous coding sessions.
Evidence-based debugging for Python, Node.js, and Java applications using runtime execution traces and diagnostic MCP tools.
Privacy-preserving transactions on Base using Veil Cash. Deposit into shielded pools, perform ZK-based withdrawals/transfers, and manage private balances. Supports ETH/USDC via local ZK proofs and Bankr-signed deposits.
Decision framework for choosing between MCP tools and direct API skills to optimize agent performance, cost, and efficiency.
Automate quality observability with DORA metrics, defect density tracking, and intelligent quality gate configuration for continuous delivery pipelines.
React and Vite performance optimization guidelines. Use when writing, reviewing, or optimizing React components built with Vite.
Virtual machine development expert focusing on bytecode design, stack-based/register-based VM implementation, memory management, and garbage collection.
Elasticsearch DBA skill for cluster architecture, mapping design, performance tuning, and production operations including ILM, shard strategy, and troubleshooting.