trulens-evaluation-workflow
A systematic workflow to instrument, evaluate, and monitor LLM applications using TruLens, supporting frameworks like LangChain, LangGraph, and LlamaIndex.
Intelligently migrate existing brownfield projects to the AgenticDev structure using AI-powered analysis to reorganize documentation, generate rich frontmatter, and preserve git history.
AI language tutor for personalized learning through conversation, grammar lessons, vocabulary drills, and flashcards. Supports 100+ languages including Spanish, French, Japanese, and Mandarin.
Automates the generation and synchronization of localized translation strings for Payload CMS core packages and plugins.
A meta-skill for building robust AI agent skills using a TDD approach: define failure (RED), implement the skill (GREEN), and plug rationalization loopholes (REFACTOR).
Discover and install agent skills to extend your DeerFlow capabilities. Use this to find tools, workflows, or specialized knowledge for tasks like coding, testing, and deployment.
A reinforcement learning-inspired tracker for YouTube performance, using systematic logging to optimize thumbnails, titles, and hooks.
Diagnose, isolate, and mitigate LLM context failures such as lost-in-the-middle, poisoning, distraction, and context clash to improve agent reliability.
Migrate existing OpenAI Apps SDK applications to the MCP Apps SDK, including step-by-step guidance, API mapping tables, and CSP investigation workflows.
Analyze and identify codebase patterns (naming, architecture, testing) to maintain consistency and enforce standards during development.
Evaluate code generation models using the BigCode Evaluation Harness. Benchmarks include HumanEval, MBPP, and MultiPL-E, with pass@k metrics for multi-language coding models.
Build production-grade RAG systems using vector databases, semantic search, and LangGraph to ground LLMs in external knowledge.