llmintegration
Multi-model LLM integration patterns for Claude, GPT, Gemini, and Ollama. Features API handling, prompt engineering, token management, and model-agnostic orchestration.
Introduction
This skill provides a comprehensive framework for integrating Large Language Models (LLMs) into the Golden Armada AI Agent Fleet Platform. It is designed for software engineers and AI developers who need production-grade LLM connectivity; it abstracts the complexities of different provider APIs into a unified interface. The system supports major commercial models including Anthropic Claude, OpenAI GPT, and Google Gemini, alongside local model execution via Ollama, enabling flexible infrastructure strategies that span cloud-based reasoning and private, local deployment.
- Unified Provider Abstraction: Implements an abstract base class (LLMProvider) and a factory pattern to standardize generate and stream methods across LLM backends (see the first sketch after this list).
- Multi-Model Orchestration: Supports swapping models to match task requirements, such as using high-reasoning models for complex logic and cost-effective models for simple text processing (see the routing sketch below).
- Prompt Engineering Toolkit: Includes modular structures for system prompts, few-shot learning, and chain-of-thought sequences to improve agent output quality (see the prompt-assembly sketch below).
- Native API Integration: Provides pre-configured client patterns for the Anthropic, OpenAI, Google Generative AI, and Ollama Python SDKs (see the provider sketches below).
- Token Management: Offers patterns for tracking max_tokens, managing context windows, and optimizing input/output limits during streaming and batch operations (see the trimming sketch below).
- Input: User prompts, system instructions, and provider-specific configuration (API keys, model version tags).
- Output: Generated text streams, structured tool-use/function-calling payloads, and formatted completion responses.
- Usage Notes: Always use environment variables for sensitive credentials such as API keys. Ensure that local Ollama models are pulled and verified before execution in the agent environment. The abstraction layer is designed to be extensible: new providers can be added by implementing the LLMProvider interface and registering them with the LLMFactory.
- Constraints: Reliability depends on the uptime of third-party APIs. For local Ollama deployment, performance is gated by available hardware (GPU/VRAM). Handle network timeouts and rate-limiting responses from commercial providers appropriately (see the retry sketch below).
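The provider abstraction could take roughly the following shape. LLMProvider and LLMFactory are named by this skill's description, but the method signatures, defaults, and registry mechanism below are illustrative assumptions, not the skill's exact code.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class LLMProvider(ABC):
    """Common interface every backend (Claude, GPT, Gemini, Ollama) implements."""

    @abstractmethod
    def generate(self, prompt: str, system: str | None = None,
                 max_tokens: int = 1024) -> str:
        """Return a single completed response."""

    @abstractmethod
    def stream(self, prompt: str, system: str | None = None,
               max_tokens: int = 1024) -> Iterator[str]:
        """Yield response text incrementally as chunks arrive."""


class LLMFactory:
    """Registry that maps provider names to LLMProvider subclasses."""

    _registry: dict[str, type[LLMProvider]] = {}

    @classmethod
    def register(cls, name: str, provider_cls: type[LLMProvider]) -> None:
        cls._registry[name] = provider_cls

    @classmethod
    def create(cls, name: str, **kwargs) -> LLMProvider:
        try:
            return cls._registry[name](**kwargs)
        except KeyError:
            raise ValueError(f"Unknown provider: {name!r}") from None
```

Keeping the factory a plain registry means adding a provider is a two-step change: implement the interface, then register the class, with no edits to calling code.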
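As one concrete backend, a Claude provider built on the official anthropic Python SDK might look like this. The messages.create and messages.stream calls are the SDK's real entry points; the class name, constructor arguments, and default model tag are assumptions for illustration.

```python
import os

import anthropic  # official Anthropic Python SDK


class ClaudeProvider(LLMProvider):
    """Claude backend; the default model tag below is a placeholder."""

    def __init__(self, model: str = "claude-3-5-sonnet-latest"):
        # API key comes from the environment, per the usage notes above.
        self.client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
        self.model = model

    def generate(self, prompt, system=None, max_tokens=1024):
        response = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            system=system or anthropic.NOT_GIVEN,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    def stream(self, prompt, system=None, max_tokens=1024):
        with self.client.messages.stream(
            model=self.model,
            max_tokens=max_tokens,
            system=system or anthropic.NOT_GIVEN,
            messages=[{"role": "user", "content": prompt}],
        ) as events:
            yield from events.text_stream


LLMFactory.register("claude", ClaudeProvider)
```

Equivalent adapters for OpenAI, Google Generative AI, and Ollama would implement the same two methods against their respective SDKs.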
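A local Ollama backend and a simple task-based router could then sit alongside it. The ollama.chat call is the SDK's real entry point; the model names and routing table are hypothetical, and the daemon must be running with the model already pulled.

```python
import ollama  # official Ollama Python SDK


class OllamaProvider(LLMProvider):
    """Local backend; run `ollama pull <model>` before first use."""

    def __init__(self, model: str = "llama3.1"):
        self.model = model

    def _messages(self, prompt, system):
        messages = [{"role": "system", "content": system}] if system else []
        messages.append({"role": "user", "content": prompt})
        return messages

    def generate(self, prompt, system=None, max_tokens=1024):
        response = ollama.chat(
            model=self.model,
            messages=self._messages(prompt, system),
            options={"num_predict": max_tokens},  # Ollama's output-token cap
        )
        return response["message"]["content"]

    def stream(self, prompt, system=None, max_tokens=1024):
        for chunk in ollama.chat(
            model=self.model,
            messages=self._messages(prompt, system),
            options={"num_predict": max_tokens},
            stream=True,
        ):
            yield chunk["message"]["content"]


LLMFactory.register("ollama", OllamaProvider)

# Hypothetical routing table: send complex reasoning to a commercial model
# and cheap text processing to the local model.
TASK_ROUTES = {
    "reasoning": ("claude", {}),
    "summarize": ("ollama", {"model": "llama3.1"}),
}


def provider_for(task: str) -> LLMProvider:
    name, kwargs = TASK_ROUTES.get(task, TASK_ROUTES["reasoning"])
    return LLMFactory.create(name, **kwargs)
```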
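For the prompt-engineering toolkit, a minimal few-shot prompt builder gives the flavor; the Input/Output template and the sentiment example are assumed formats, not the skill's actual templates.

```python
def build_few_shot_prompt(task: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Assemble a few-shot prompt: task description, worked examples, query."""
    parts = [task, ""]
    for example_input, example_output in examples:
        parts += [f"Input: {example_input}", f"Output: {example_output}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


# Usage: pair the few-shot body with a system prompt when calling a provider.
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great service!", "positive"), ("Never again.", "negative")],
    "The food was cold.",
)
```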
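Context-window management can be sketched as trimming the oldest turns until the prompt fits while reserving the max_tokens output budget. The 4-characters-per-token heuristic is a rough assumption; a real implementation would use the provider's tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); swap in the provider's
    # tokenizer (e.g. tiktoken for GPT models) for accurate counts.
    return max(1, len(text) // 4)


def trim_history(messages: list[dict],
                 context_limit: int,
                 reserve_output: int) -> list[dict]:
    """Drop the oldest messages until the prompt fits the context window,
    keeping room for the model's reply."""
    budget = context_limit - reserve_output
    kept, used = [], 0
    for message in reversed(messages):  # walk from the most recent turn back
        cost = approx_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))
```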
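For the constraints above, a standard exponential-backoff wrapper handles transient timeouts and rate limits. The exception handling here is deliberately generic and should be narrowed to your provider SDK's timeout and rate-limit error types.

```python
import random
import time


def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry a zero-argument provider call with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # narrow to e.g. anthropic.RateLimitError in practice
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt + random.random())


# Usage with the factory sketch above:
# provider = LLMFactory.create("claude")
# text = with_retries(lambda: provider.generate("Summarize the fleet status."))
```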
Repository Stats
- Stars: 1
- Forks: 0
- Open Issues: 0
- Language: HTML
- Default Branch: main