pytorch-lightning
PyTorch Lightning skill for scalable deep learning: automates model training, multi-GPU orchestration, data pipelines, and distributed training strategies like DDP, FSDP, and DeepSpeed.
Debugging guide for AReaL distributed training issues, including hangs, NCCL errors, OOM, and numerical consistency in FSDP2/TP/CP/EP.
Train and manage neural networks in distributed E2B sandboxes using the Flow Nexus platform, supporting custom architectures like Transformers, LSTMs, and GANs.
Provides resiliency, health monitoring, and fault tolerance utilities for NVIDIA GPU-accelerated distributed applications, including process management and API key handling.
Build stateful AI agents on Cloudflare Workers using the Agents SDK. Features real-time WebSockets, persistent state management, scheduled background tasks, and native tool integration for production-ready deployments.
Production-ready reinforcement learning using Stable Baselines3. Train agents, design custom environments, implement training callbacks, and optimize workflows with a scikit-learn-style API.
Orchestrate multi-agent swarms using agentic-flow for parallel task execution, dynamic topology, and intelligent coordination. Ideal for building distributed AI systems and scaling complex development workflows.
Architect multi-agent systems to overcome context limits, using patterns like supervisor, swarm, and hierarchical models to manage complex workflows.
Parallelize independent debugging or development tasks by delegating to specialized subagents with isolated context.
Orchestrate complex multi-agent swarms with topologies like mesh, hierarchical, and star for research, development, and testing workflows.
Orchestrate complex workflows by coordinating multiple specialized AI agents for multi-perspective code analysis, feature implementation, and system-wide reviews.
P9 Tech Lead mode: manages teams of P8 agents through six-element Task Prompts instead of coding directly. Orchestrates 3+ parallel agents for project management, task decomposition, and architecture.