llmfit-advisor
Analyze local system hardware (RAM, CPU, GPU/VRAM) to get expert recommendations for local LLMs, quantization settings, and performance estimates.
Introduction
llmfit-advisor is a hardware-aware agent that bridges the gap between local model requirements and a user's system specifications. It provides an automated interface to the llmfit engine, letting users determine which Large Language Models (LLMs) can run effectively on their machine based on real-time diagnostics of CPU, RAM, and GPU memory (VRAM). It is aimed at developers, AI enthusiasts, and researchers who want to run models such as Llama, Mistral, Gemma, or Qwen locally without trial-and-error guesswork about what fits. By weighing hardware capacity against model parameter counts and architecture (including Mixture of Experts), it promotes efficient resource utilization and good inference speed.
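As a rough illustration of the arithmetic involved, a model's memory footprint can be approximated from its parameter count and quantization bit-width. The sketch below uses assumed constants for bits-per-weight and overhead; it is not llmfit's actual sizing model.

```rust
/// Rough memory estimate for a quantized model:
/// parameters (billions) * bits per weight / 8 bits-per-byte,
/// plus a flat allowance for KV cache and runtime overhead.
/// The overhead constant is an illustrative assumption.
fn estimated_mem_gb(params_billions: f64, bits_per_weight: f64, overhead_gb: f64) -> f64 {
    params_billions * bits_per_weight / 8.0 + overhead_gb
}

fn main() {
    // A 7B model at Q4_K_M (~4.5 effective bits/weight) with ~1.5 GB
    // of overhead needs about 5.4 GB, so it fits in an 8 GB GPU.
    let need = estimated_mem_gb(7.0, 4.5, 1.5);
    println!("~{need:.1} GB required");
}
```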
- Automatically detects system hardware, including NVIDIA and AMD GPUs and Apple Silicon unified memory.
- Scores models on a composite of quality, speed, fit level, and context window size (a weighted-sum sketch follows this list).
- Recommends the optimal quantization (e.g., Q4_K_M, Q5_K_M) to maximize model capability within available memory constraints.
- Integrates directly with local inference providers such as Ollama, vLLM, and LM Studio for seamless setup.
- Supports multi-GPU setups and hardware-simulated planning.
- Filters recommendations by use case, including coding, reasoning, chat, multimodal, and embedding tasks.
- Uses commands such as `llmfit --json system` to assess hardware and `llmfit recommend` to retrieve prioritized model lists (see the CLI sketch below).
- Reports clear fit levels (Perfect, Good, Marginal, TooTight) to prevent memory-related crashes during inference (a classification sketch follows below).
- Helps configure `models.providers.ollama` and other backend environments by mapping HuggingFace repository names to local provider tags (see the mapping sketch after this list).
- Helps maximize tokens-per-second (TPS) throughput through informed quantization and offloading choices (GPU, CPU+GPU offload, or CPU-only).
- Supports interactive TUI workflows for real-time adjustments, plus advanced hardware performance benchmarks sourced from community data.
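The composite scoring mentioned above could, for example, be a weighted sum of the four normalized metrics. The struct, field names, and weights below are illustrative assumptions, not llmfit's actual formula.

```rust
/// Hypothetical per-model metrics, each normalized to 0.0..=1.0.
struct ModelMetrics {
    quality: f64, // benchmark-derived capability
    speed: f64,   // estimated tokens/sec, normalized
    fit: f64,     // derived from fit level (Perfect = 1.0, TooTight = 0.0)
    context: f64, // context window size, normalized
}

/// Weighted sum; the weights here are assumptions for illustration.
fn composite_score(m: &ModelMetrics) -> f64 {
    0.40 * m.quality + 0.25 * m.speed + 0.25 * m.fit + 0.10 * m.context
}

fn main() {
    let m = ModelMetrics { quality: 0.82, speed: 0.70, fit: 1.0, context: 0.50 };
    println!("score = {:.2}", composite_score(&m)); // 0.80
}
```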
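The two commands named in the list can also be driven from a script. This minimal sketch shells out to `llmfit --json system` and prints the raw JSON hardware report; it assumes only that the binary is on `PATH` and makes no assumptions about the report's field names.

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Run the hardware probe named in the feature list above.
    let output = Command::new("llmfit").args(["--json", "system"]).output()?;
    if !output.status.success() {
        eprintln!("llmfit exited with {}", output.status);
    }
    // Print the raw JSON report; parse it with serde_json if needed.
    println!("{}", String::from_utf8_lossy(&output.stdout));
    Ok(())
}
```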
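The four fit levels could map to a simple ratio of required to available memory. The thresholds below are illustrative assumptions, not llmfit's actual cut-offs.

```rust
#[derive(Debug)]
enum FitLevel {
    Perfect,  // ample headroom
    Good,     // fits with modest headroom
    Marginal, // fits, but risks OOM at long context
    TooTight, // would not fit
}

/// Classify by the ratio of estimated requirement to available memory.
/// Threshold values are assumptions for illustration only.
fn classify(required_gb: f64, available_gb: f64) -> FitLevel {
    match required_gb / available_gb {
        r if r <= 0.70 => FitLevel::Perfect,
        r if r <= 0.85 => FitLevel::Good,
        r if r <= 1.00 => FitLevel::Marginal,
        _ => FitLevel::TooTight,
    }
}

fn main() {
    // 5.4 GB needed on an 8 GB GPU -> ratio 0.68 -> Perfect.
    println!("{:?}", classify(5.4, 8.0));
}
```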
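Finally, the provider mapping might resemble a simple lookup from HuggingFace repository name to a local provider tag. Both the repository names and the Ollama-style tags here are illustrative examples, not a catalog shipped with llmfit.

```rust
use std::collections::HashMap;

fn main() {
    // Hypothetical HF-repo -> Ollama-tag mapping of the kind the advisor
    // produces when configuring models.providers.ollama.
    let mut providers: HashMap<&str, &str> = HashMap::new();
    providers.insert(
        "meta-llama/Llama-3.1-8B-Instruct",
        "llama3.1:8b-instruct-q4_K_M",
    );
    providers.insert(
        "mistralai/Mistral-7B-Instruct-v0.3",
        "mistral:7b-instruct-q4_K_M",
    );

    if let Some(tag) = providers.get("meta-llama/Llama-3.1-8B-Instruct") {
        println!("ollama pull {tag}");
    }
}
```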
Repository Stats
- Stars: 25,116
- Forks: 1,493
- Open Issues: 56
- Language: Rust
- Default Branch: main