debug-distributed
Debugging guide for AReaL distributed training issues, including hangs, NCCL errors, OOM, and numerical consistency in FSDP2/TP/CP/EP.
Expert technical support for the Litestream disaster recovery tool, covering WAL monitoring, LTX replication, cloud storage backends, and SQLite page management.
Design and implement robust, scalable event stores for event-sourced systems, covering architectural patterns, technology selection, and persistence strategies.
Orchestrate multi-agent swarms using agentic-flow for parallel task execution, dynamic topology, and intelligent coordination. Ideal for building distributed AI systems and scaling complex development workflows.
Implementation patterns for MERIDIAN autonomous AI agents using Claude API, including BaseAgent lifecycle, structured tool use, token budget enforcement, and cron scheduling.
Development guide for lemline-core, the stateless Serverless Workflow engine. Manage workflow execution, node navigation, state transitions, JQ expression evaluation, error handling, and parallel fork logic.
Build stateful AI agents on Cloudflare Workers using the Agents SDK. Features real-time WebSockets, persistent state management, scheduled background tasks, and native tool integration for production-ready deployments.
Interface directly with RagCode MCP over the SSE protocol, without complex configuration files or binary dependencies.
Parallelize independent debugging or development tasks by delegating to specialized subagents with isolated context.
Development guide for Arma Reforger EnforceScript, covering component architecture, network replication, persistence, and memory management.
Expert guidance for Django asynchronous task processing with Celery. Best practices for task design, worker configuration, error handling, periodic tasks, and production monitoring.
Enforce strict UI adherence to your project's design system tokens, components, and layout patterns for consistent frontend implementation.