🤖 Ollama Setup: Free Local AI
Ollama is the recommended way to run AgentOS. It runs Large Language Models entirely on your machine, so your prompts and data stay local and there are no recurring costs.
🏗️ Installation
macOS
Download the macOS app or install via Homebrew:
brew install ollama
Linux
One-line installer:
curl -fsSL https://ollama.com/install.sh | sh
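After the installer finishes (this applies equally to a Homebrew install on macOS), it can help to confirm that the binary is on your PATH. The helper below is a minimal sketch; `check_ollama_installed` is a hypothetical name used only for illustration.

```shell
# Report whether the ollama binary is on PATH and, if so, print its version.
# check_ollama_installed is an illustrative helper, not part of Ollama or AgentOS.
check_ollama_installed() {
  if command -v ollama >/dev/null 2>&1; then
    ollama --version
  else
    echo "ollama not found on PATH"
    return 1
  fi
}
```

Call `check_ollama_installed` in a new terminal session so that any PATH changes made by the installer are picked up.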
Windows
Download the .exe installer from the official site and follow the prompts.
🧠 Choice of Models
After installing, you need to "pull" a model. We've optimized Jean-Pierre for the Llama 3 family.
# Recommended for most users
ollama pull llama3.2
# For powerful workstations (M3 Max / RTX 4090)
ollama pull llama3.1:70b
# For ultra-lightweight performance
ollama pull llama3.2:1b
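Pulling a model downloads several gigabytes, so a setup script may want to skip the download when the model is already present. The sketch below wraps `ollama list` and `ollama pull`; `ensure_model` is a hypothetical helper name, and it assumes `ollama list` prints a header row followed by one `name:tag` per line in the first column.

```shell
# Pull a model only if `ollama list` does not already show it.
# ensure_model is an illustrative helper, not an Ollama or AgentOS command.
ensure_model() {
  model="$1"
  # A name without an explicit tag is listed by Ollama as name:latest.
  case "$model" in
    *:*) want="$model" ;;
    *)   want="$model:latest" ;;
  esac
  if ollama list 2>/dev/null |
     awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'; then
    echo "$model already present"
  else
    ollama pull "$model"
  fi
}
```

For example, `ensure_model llama3.2` would download the default model on first run and do nothing on later runs.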
Hardware Recommendations
| Model | RAM | Quality | Best For |
|---|---|---|---|
| Llama 3.2 (3b) | 8GB | ⭐⭐⭐⭐ | Universal standard |
| Mistral (7b) | 16GB | ⭐⭐⭐⭐ | Alternative logic |
| Llama 3.1 (70b) | 48GB | ⭐⭐⭐⭐⭐ | Executive reasoning |
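The table can be turned into a small lookup for provisioning scripts. This is a sketch under stated assumptions: `recommend_model` is a hypothetical helper, the 8 GB and 48 GB thresholds come from the table, and falling back to the 1B model below 8 GB is an assumption based on the "ultra-lightweight" pull command rather than a documented requirement.

```shell
# Map installed RAM (whole GB) to a model tag, following the table's thresholds.
# recommend_model is an illustrative helper; the sub-8 GB fallback is an assumption.
recommend_model() {
  ram_gb="$1"
  if [ "$ram_gb" -ge 48 ]; then
    echo "llama3.1:70b"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "llama3.2"        # the default tag is the 3B model
  else
    echo "llama3.2:1b"
  fi
}
```

With 16 GB or more, `mistral:7b` is the alternative the table lists; the helper keeps to the Llama family for simplicity.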
⚡ Connecting to AgentOS
Once Ollama is running (it stays in the menu bar on macOS and the system tray on Windows), tell AgentOS which model to use.
Web UI
Open Settings (++cmd+comma++), select Ollama as the provider, and pick your model from the dropdown.
CLI
Run agentos serve. The engine will auto-detect the local Ollama instance at http://localhost:11434.
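If auto-detection does not seem to work, it may help to check that the Ollama HTTP API is answering at that address before starting AgentOS. The sketch below queries Ollama's `/api/tags` endpoint (which lists locally available models) with `curl`; `ollama_reachable` is a hypothetical helper name, and the two-second timeout is an arbitrary choice.

```shell
# Probe the Ollama HTTP API and report whether it answers.
# ollama_reachable is an illustrative helper; the default URL is the one AgentOS auto-detects.
ollama_reachable() {
  base_url="${1:-http://localhost:11434}"
  if curl -fsS --max-time 2 "$base_url/api/tags" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
    return 1
  fi
}
```

A typical use would be `ollama_reachable && agentos serve`, so the engine only starts once the local instance responds.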
🛠️ Performance Tuning
If the agent feels sluggish, check the following:
1. Model Matching: Don't run a 70B model on a MacBook Air. Stick to 3B or 7B.
2. GPU Acceleration: Ollama automatically uses Metal on macOS. On Linux/Windows, ensure your NVIDIA/AMD drivers are up to date.
3. Dedicated Resources: Close high-memory browser tabs or development servers if you're pushing larger models.
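To see whether a loaded model is actually using the GPU, `ollama ps` lists the models currently in memory along with a processor column. The helper below is a rough sketch: `gpu_status` is a hypothetical name, and it assumes a header row and that a row contains the word `GPU` whenever any part of the model is offloaded, which may differ between Ollama versions.

```shell
# Summarize, per loaded model, whether `ollama ps` reports GPU usage.
# gpu_status is an illustrative helper; the output format of `ollama ps` is assumed.
gpu_status() {
  ollama ps 2>/dev/null |
    awk 'NR > 1 { print $1 ": " ($0 ~ /GPU/ ? "uses GPU" : "CPU only") }'
}
```

The helper prints nothing when no model is loaded, so send a prompt to the agent first and then run `gpu_status`.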