ai-llm-patterns

Anthropic Claude integration patterns: streaming, RAG with pgvector, tool use, model selection (Haiku/Sonnet/Opus), prompt caching, and cost management for AI-powered engineering.

Introduction

This skill provides a framework for integrating Anthropic Claude into complex software systems. Aimed at AI engineers and fullstack developers, it focuses on production-ready patterns for building scalable RAG pipelines, autonomous agents, and cost-effective LLM features, balancing performance, user experience, and operational efficiency through sound architectural practices.

  • Advanced RAG Architecture: Chunking strategies, vector search using pgvector with cosine similarity, and embedding pipelines built on text-embedding-3-small (retrieval query sketched below).
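
A minimal sketch of the retrieval step, assuming a `chunks` table with an `embedding vector(1536)` column and the `pg` and `openai` Node packages; the table and column names are placeholders:

```ts
import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const pool = new Pool();     // connection settings come from PG* env vars

async function retrieveChunks(query: string, limit = 5) {
  // Embed the query with the same model used at indexing time.
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });
  const embedding = res.data[0].embedding;

  // pgvector's `<=>` operator is cosine distance; lower means more similar.
  const { rows } = await pool.query(
    `SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
       FROM chunks
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [JSON.stringify(embedding), limit],
  );
  return rows;
}
```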

  • Anthropic SDK & Streaming: Server-Sent Events (SSE) streaming to cut perceived latency and deliver real-time feedback to users (handler sketched below).
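
A minimal streaming handler sketch using `@anthropic-ai/sdk` with Express; the route, model alias, and payload shape are illustrative:

```ts
import express from "express";
import Anthropic from "@anthropic-ai/sdk";

const app = express();
app.use(express.json());
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

app.post("/chat", async (req, res) => {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");

  const stream = anthropic.messages.stream({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 1024,
    messages: [{ role: "user", content: req.body.prompt }],
  });

  // Forward each text delta to the client as an SSE event as it arrives.
  stream.on("text", (text) => {
    res.write(`data: ${JSON.stringify({ text })}\n\n`);
  });

  await stream.finalMessage();
  res.write("data: [DONE]\n\n");
  res.end();
});

app.listen(3000);
```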

  • Strategic Model Selection: A decision framework for choosing between Haiku, Sonnet, and Opus based on task complexity, latency requirements, and cost (routing helper sketched below).
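
By way of example, an illustrative routing helper; the task labels and tier mapping are assumptions, not fixed rules:

```ts
type Task = "classification" | "summarization" | "complex-reasoning";

// Hypothetical routing policy: the cheapest model that plausibly handles the task.
function pickModel(task: Task, latencySensitive: boolean): string {
  if (latencySensitive || task === "classification") {
    return "claude-3-5-haiku-latest"; // lowest cost and latency
  }
  if (task === "complex-reasoning") {
    return "claude-3-opus-latest"; // highest capability, highest cost
  }
  return "claude-3-5-sonnet-latest"; // balanced default
}
```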

  • Tool Use & Agent Loops: Designing function-calling interfaces where the LLM orchestrates operations while the application retains control over database writes and other sensitive actions (loop sketched below).
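
A minimal agent-loop sketch with a single read-only tool; the `lookup_order` tool and its backing function are hypothetical:

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

const tools: Anthropic.Tool[] = [{
  name: "lookup_order",
  description: "Fetch an order record by ID (read-only).",
  input_schema: {
    type: "object",
    properties: { orderId: { type: "string" } },
    required: ["orderId"],
  },
}];

// Hypothetical read-only data access; a real implementation would query the DB.
async function lookupOrder(orderId: string) {
  return { orderId, status: "shipped" };
}

async function runAgent(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];

  while (true) {
    const response = await anthropic.messages.create({
      model: "claude-3-5-sonnet-latest",
      max_tokens: 1024,
      tools,
      messages,
    });

    // No pending tool calls means the model produced its final answer.
    if (response.stop_reason !== "tool_use") return response;

    // Execute each requested tool call and feed the results back to the model.
    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        const { orderId } = block.input as { orderId: string };
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: JSON.stringify(await lookupOrder(orderId)),
        });
      }
    }
    messages.push({ role: "assistant", content: response.content });
    messages.push({ role: "user", content: results });
  }
}
```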

  • Context Optimization: Prompt caching for frequently accessed documents, large system prompts, and RAG context windows to reduce token spend and improve responsiveness (cache breakpoint sketched below).
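
A minimal sketch of a cache breakpoint on a large system prompt, using the `cache_control` field from Anthropic's prompt-caching feature; the document-QA framing is illustrative:

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

async function askWithCachedContext(document: string, question: string) {
  return anthropic.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: `You answer questions about this document:\n\n${document}`,
        // Marks the prefix up to here as cacheable; subsequent calls with an
        // identical prefix read it from the cache at a reduced token rate.
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: question }],
  });
}
```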

  • Structured Data Extraction: Using Zod for schema enforcement, so LLM outputs conform to a known shape before programmatic consumption (extraction sketched below).
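
A minimal extraction sketch with Zod; the `Invoice` shape and prompt are illustrative:

```ts
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";

const Invoice = z.object({
  vendor: z.string(),
  total: z.number().nonnegative(),
  dueDate: z.string(), // ISO date string
});

const anthropic = new Anthropic();

async function extractInvoice(text: string) {
  const response = await anthropic.messages.create({
    model: "claude-3-5-haiku-latest",
    max_tokens: 512,
    messages: [{
      role: "user",
      content: `Extract the invoice as JSON with keys vendor, total, dueDate. Reply with JSON only.\n\n${text}`,
    }],
  });

  const raw = response.content[0].type === "text" ? response.content[0].text : "";
  // Invoice.parse throws on any shape mismatch, so malformed outputs never
  // reach downstream code.
  return Invoice.parse(JSON.parse(raw));
}
```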

  • Use for building production-grade AI features, document retrieval systems, or autonomous agent workflows.

  • Follow the core constraint: never trust LLM outputs directly for database mutations; always apply deterministic validation first (guard sketched below).
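
To make the constraint concrete, an illustrative guard: the model may propose an action, but only an allowlisted, schema-validated action reaches the database. All names here are hypothetical:

```ts
import { Pool } from "pg";
import { z } from "zod";

const pool = new Pool();

// Allowlisted action shape -- the model cannot emit free-form SQL.
const ProposedAction = z.object({
  action: z.literal("update_status"),
  orderId: z.string().uuid(),
  status: z.enum(["pending", "shipped", "cancelled"]),
});

async function applyModelAction(rawModelOutput: string) {
  const parsed = ProposedAction.safeParse(JSON.parse(rawModelOutput));
  if (!parsed.success) {
    throw new Error(`Rejected model action: ${parsed.error.message}`);
  }
  // Only deterministic code performs the write, with bound parameters.
  await pool.query("UPDATE orders SET status = $1 WHERE id = $2", [
    parsed.data.status,
    parsed.data.orderId,
  ]);
}
```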

  • Inputs include target document datasets and function schemas; outputs typically include optimized API responses, retrieved context chunks, or tool execution plans.

  • Adhere to token budget management; cache prompt prefixes exceeding 1024 tokens to cut cost and latency.

  • Refer to the provided documentation in references/ for specific implementation guides on SSE, RAG pipelines, and LLM-ops error handling.

Repository Stats

Stars: 11
Forks: 1
Open Issues: 1
Language: Shell
Default Branch: main
Sync Status: Idle
Last Synced: May 4, 2026, 12:58 AM