# Ollama

Ollama is a local LLM hosting platform for running models on your own hardware.
## Configuration

```
PROVIDER=ollama
MODEL=llama3.2:3b
BASE_URL=http://localhost:11434
TEMPERATURE=0.7
ENABLE_FUNCTIONS=true
```
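`BASE_URL` must point at a reachable Ollama server. As a quick sanity check (assuming the default port), you can ask the server which models it currently has available:

```bash
# Lists locally available models; an empty "models" array means nothing has been pulled yet
curl http://localhost:11434/api/tags
```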
## Key Parameters

| Parameter | Description | Default |
|---|---|---|
| `MODEL` | Model name (required) | - |
| `BASE_URL` | Ollama server URL | `http://localhost:11434` |
| `TEMPERATURE` | Randomness (0.0-1.0) | 0.7 |
| `TOP_P` | Nucleus sampling | 1.0 |
| `TIMEOUT` | Request timeout (seconds) | 60 |
| `THINK` | Enable reasoning mode | false |
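On the wire, sampling settings such as `TEMPERATURE` and `TOP_P` correspond to fields in the `options` object of Ollama's native chat API. How your application maps its config onto the request is up to the provider implementation (not shown here), but a raw request looks roughly like this:

```bash
# Non-streaming chat request; temperature and top_p travel in the "options" object
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [{"role": "user", "content": "Summarize Ollama in one sentence."}],
  "stream": false,
  "options": {"temperature": 0.7, "top_p": 1.0}
}'
```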
## Advanced Options

| Parameter | Description | Default |
|---|---|---|
| `SEED` | Random seed | - |
| `NUM_PREDICT` | Max tokens to generate | - |
| `NUM_CTX` | Context window size | - |
| `NUM_BATCH` | Batch size | - |
| `NUM_GPU` | GPU layers | - |
| `NUM_THREAD` | CPU threads | - |
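These advanced settings match Ollama runtime options of the same names in lowercase (`seed`, `num_predict`, `num_ctx`, `num_batch`, `num_gpu`, `num_thread`). Assuming the provider forwards them unchanged, they end up in the same `options` map, for example:

```bash
# Generate request pinning the seed and raising the context window and GPU offload
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {"seed": 42, "num_predict": 256, "num_ctx": 4096, "num_gpu": 32}
}'
```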
## Features
- ✅ Streaming responses
- ✅ Function calling (example below)
- ✅ Local execution
- ✅ No API costs
- ✅ Privacy preservation
- ✅ Custom models
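Function calling uses OpenAI-style tool definitions on Ollama's chat endpoint, provided the chosen model supports tools. A minimal sketch, with a hypothetical `get_weather` tool, looks like this:

```bash
# The model may reply with a "tool_calls" entry instead of plain text;
# executing the tool and sending back its result is the caller's job
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "stream": false,
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
```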
## Installation

Install Ollama first:

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows
# Download from https://ollama.ai/download
```
Pull models:

```bash
ollama pull llama3.2:3b
ollama pull qwen3:7b
```
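To confirm a model is installed and working, you can list the local models and run a one-off prompt from the command line:

```bash
# Show all locally available models and their sizes
ollama list

# Send a single prompt to a pulled model
ollama run llama3.2:3b "Say hello in one sentence."
```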
## Important Notes
- Requires Ollama server running locally
- Models must be pulled before use
- Performance depends on hardware specs
- Supports OpenAI-compatible tool calling
- Reasoning mode available with `THINK=true` in config (see the sketch below)
- GPU acceleration recommended for larger models
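Two of these notes deserve a concrete sketch. If the server is not already running as a background service, it can be started manually, and reasoning mode corresponds to the `think` flag that newer Ollama releases accept on the chat endpoint for reasoning-capable models (the model name below is only an example and must be pulled first):

```bash
# Start the Ollama server (omit the & to keep it attached to the terminal)
ollama serve &

# Ask a reasoning-capable model to think before answering; newer Ollama versions
# return the reasoning trace separately from the final answer
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{"role": "user", "content": "Is 9.11 larger than 9.9?"}],
  "stream": false,
  "think": true
}'
```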