debug-distributed
Debugging guide for AReaL distributed training issues, including hangs, NCCL errors, out-of-memory (OOM) errors, and numerical consistency in FSDP2/TP/CP/EP.
A meta-skill for building robust AI agent skills using a TDD approach: define failure (RED), implement the skill (GREEN), and plug rationalization loopholes (REFACTOR).
Extract and document authentic writing voice from samples. Create comprehensive voice guides for AI training, ghostwriting, and brand consistency.
Classify and group meteorological and environmental variables into specific driver categories for consistent attribution analysis and environmental modeling.
Comprehensive Python healthcare AI toolkit for clinical data processing, medical coding translation, and developing deep learning models such as RETAIN and Transformer-based architectures for EHR, physiological signals, and clinical prediction tasks.
Tools for deploying, managing, and monitoring DataRobot models, including prediction environment configuration, champion/challenger workflows, and deployment operations.
Build comprehensive 3-5 year startup financial models, including revenue projections, cost structures, cash flow analysis, and scenario planning for fundraising and operations.
Comprehensive UI testing, visual fidelity analysis, and browser debugging using Chrome DevTools MCP and AI-driven vision models.
Maintain and update the MassGen model registry, including backend capabilities, model metadata, pricing structures, and context window configurations for new and existing AI models.
A Notion-based tracking system for tweet performance to enable data-driven content experimentation using reinforcement learning principles.
Multi-source research tool for customer inquiries, bug investigations, and account history synthesis with source attribution and confidence scoring.
Audit and synchronize the supported LLM model list in assets.py against the authoritative litellm registry.