debug-cuda-crash
Tutorial for identifying and resolving CUDA runtime crashes using FlashInfer's API logging framework.
Discover reusable agent skills, browse implementation details, and find the right skill for your workflow.
171 skills found
Tutorial for identifying and resolving CUDA runtime crashes using FlashInfer's API logging framework.
Unified local ML inference server for ASR, TTS, Translation, Image Generation, and Vision on Apple Silicon, powered by MLX.
Build RAG systems to ground LLMs in proprietary data. Includes vector database integration, embedding strategies, hybrid search, and advanced retrieval patterns for FastAPI backends.
Fast-reference guide and utility skill for Helm chart development, template syntax, and Kubernetes application deployment.
Detects timing side channels in cryptographic code to prevent secret data leakage. Essential for auditing sensitive implementations.
Provides resiliency, health monitoring, and fault tolerance utilities for NVIDIA GPU-accelerated distributed applications, including process management and API key handling.
Expert prompt engineering for Jimeng Seedance 2.0. Generate high-quality multimodal video prompts using text, images, video references, and @ syntax for camera control, rhythm, and VFX.
Implement LlamaExtract for robust structured data extraction from PDF, DOCX, and PPTX files using Pydantic schemas.
Process massive files and large codebases (10M+ tokens) by recursively chunking, sub-querying, and aggregating results to overcome LLM context limits.
Build production-grade RAG systems using vector databases, semantic search, and LangGraph to ground LLMs in external knowledge.
Guided statistical analysis with test selection, assumption checking, power analysis, and APA-formatted reporting for academic and experimental research.
Transform AI agents into proactive partners using WAL Protocol, persistent memory buffers, and autonomous cron scheduling to anticipate needs and improve performance.