ai-llm-patterns
Anthropic Claude integration patterns: streaming, RAG with pgvector, tool use, model selection (Haiku/Sonnet/Opus), prompt caching, and cost management for AI-powered engineering.
Introduction
This skill provides a practical framework for integrating Anthropic Claude into complex software systems. Designed for AI engineers and full-stack developers, it focuses on production-ready patterns for building scalable RAG pipelines, autonomous agents, and cost-effective LLM features, balancing performance, user experience, and operational efficiency through architectural best practices.
- Advanced RAG Architecture: Implementation of chunking strategies, vector search using pgvector with cosine similarity, and embedding pipelines leveraging text-embedding-3-small.
- Anthropic SDK & Streaming: Best practices for implementing Server-Sent Events (SSE) streaming to reduce perceived latency and improve real-time user feedback.
- Strategic Model Selection: A decision framework for choosing between Haiku, Sonnet, and Opus based on task complexity, latency requirements, and cost per token.
- Tool Use & Agent Loops: Designing secure function-calling interfaces where the LLM orchestrates operations while safe boundary controls guard database writes and other sensitive actions.
- Context Optimization: Implementing prompt caching for frequently accessed documents, large system prompts, and RAG context windows to optimize token spend and responsiveness.
- Structured Data Extraction: Utilizing Zod for schema enforcement, ensuring LLM outputs are validated deterministically before programmatic consumption.
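The RAG bullet above can be sketched as follows: a fixed-size chunker with overlap, plus the cosine similarity measure that pgvector's `<=>` operator exposes as a distance. The chunk size, overlap, and table/column names in the example query are illustrative assumptions, not values this skill prescribes.

```typescript
// Fixed-size chunking with overlap (sizes are illustrative defaults).
function chunkText(text: string, size = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached
  }
  return chunks;
}

// Cosine similarity between two embedding vectors. In SQL, pgvector's
// `embedding <=> $1` returns the cosine *distance* (1 - similarity), e.g.:
//   SELECT id, content FROM chunks ORDER BY embedding <=> $1 LIMIT 5;
// (hypothetical table/column names)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```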
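For the SSE streaming bullet, a minimal sketch of the parsing side: accumulating raw response text and extracting complete `event:`/`data:` frames. The event name shown mirrors Anthropic's Messages streaming format (`content_block_delta`), but the parser itself is generic SSE handling, and the buffering strategy is an assumption.

```typescript
// Minimal SSE frame parser: splits buffered response text into complete
// frames (separated by a blank line) and returns any trailing partial
// frame so the caller can prepend it to the next network chunk.
type SseEvent = { event?: string; data: string };

function parseSseChunk(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // possibly incomplete final frame
  for (const frame of frames) {
    let event: string | undefined;
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return { events, rest };
}
```

In practice the official `@anthropic-ai/sdk` handles this framing for you; the sketch shows what happens underneath so perceived-latency tuning (flushing deltas to the UI as they arrive) is easier to reason about.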
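The model-selection bullet can be expressed as a small routing function. The tiers are the document's Haiku/Sonnet/Opus split; the specific decision rules (complexity buckets, latency flag) are hypothetical defaults, not a policy this skill mandates.

```typescript
// Illustrative routing heuristic: Haiku for cheap classification-style
// work, Sonnet as the balanced default, Opus for deep multi-step
// reasoning where latency is not the binding constraint.
type ModelTier = "haiku" | "sonnet" | "opus";

interface TaskProfile {
  complexity: "low" | "medium" | "high"; // reasoning depth required
  latencySensitive: boolean;             // e.g. interactive UI path
}

function selectModel(task: TaskProfile): ModelTier {
  if (task.complexity === "low") return "haiku";
  if (task.complexity === "high" && !task.latencySensitive) return "opus";
  return "sonnet"; // balanced default, including latency-sensitive hard tasks
}
```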
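The safe-boundary idea in the tool-use bullet can be sketched as a deterministic execution gate: the LLM proposes tool calls, but only registered handlers run, and mutating handlers additionally require approval decided outside the model. Tool names and the approval flag are hypothetical examples.

```typescript
// Deterministic gate between LLM-proposed tool calls and real execution.
type ToolCall = { name: string; input: Record<string, unknown> };

interface ToolDef {
  mutates: boolean; // does this tool write to the database?
  run: (input: Record<string, unknown>) => unknown;
}

function executeToolCall(
  registry: Map<string, ToolDef>,
  call: ToolCall,
  approved: boolean, // decided by application logic, never by the LLM
): { ok: boolean; result?: unknown; error?: string } {
  const tool = registry.get(call.name);
  if (!tool) return { ok: false, error: `unknown tool: ${call.name}` };
  if (tool.mutates && !approved) {
    return { ok: false, error: "write blocked: approval required" };
  }
  return { ok: true, result: tool.run(call.input) };
}
```

The design choice here is that read-only tools flow freely to keep the agent loop fast, while every write shares one choke point that can log, validate, or require human review.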
- Use for building production-grade AI features, document retrieval systems, or autonomous agent workflows.
- Follow the core constraint: never trust LLM outputs directly for database mutations; always implement deterministic validation.
- Inputs include target document datasets and function schemas; outputs are typically optimized API responses, retrieved context chunks, or tool execution plans.
- Adhere to token budget management; always cache prompts exceeding 1024 tokens to maximize efficiency.
- Refer to the provided documentation in references/ for specific implementation guides on SSE, RAG pipelines, and LLM-ops error handling.
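The 1024-token caching rule above can be sketched as a request-builder step that tags large system blocks with Anthropic's `cache_control: { type: "ephemeral" }` marker. The 4-characters-per-token estimate is a rough heuristic, not an exact tokenizer, so a real implementation would count tokens properly.

```typescript
// Attach the prompt-caching marker only to blocks that clear the
// 1024-token threshold this skill uses as its caching cutoff.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

const CACHE_MIN_TOKENS = 1024;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // crude heuristic, not a tokenizer
}

function withCaching(blocks: SystemBlock[]): SystemBlock[] {
  return blocks.map((block) =>
    estimateTokens(block.text) > CACHE_MIN_TOKENS
      ? { ...block, cache_control: { type: "ephemeral" } }
      : block,
  );
}
```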
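The never-trust-LLM-output constraint can be illustrated with a deterministic parse step that must succeed before any mutation runs. A hand-rolled type guard stands in for a Zod schema here to keep the sketch dependency-free; the `UserUpdate` shape is a hypothetical example.

```typescript
// Validate raw LLM text into a known shape, or reject it outright.
// Only a non-null return value may ever reach a database write.
interface UserUpdate {
  id: number;
  email: string;
}

function parseUserUpdate(raw: string): UserUpdate | null {
  let value: unknown;
  try {
    value = JSON.parse(raw); // LLM output arrives as untrusted text
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  if (typeof v.id !== "number" || !Number.isInteger(v.id)) return null;
  if (typeof v.email !== "string" || !v.email.includes("@")) return null;
  return { id: v.id, email: v.email }; // copy only the validated fields
}
```

With Zod the same gate is `schema.safeParse(JSON.parse(raw))`; either way, the mutation path sees only values that passed deterministic validation.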
Repository Stats
- Stars: 11
- Forks: 1
- Open Issues: 1
- Language: Shell
- Default Branch: main
- Sync Status: Idle
- Last Synced: May 4, 2026, 12:58 AM