# Ollama

Ollama is a local LLM hosting platform for running models on your own hardware.
## Configuration

```
PROVIDER=ollama
MODEL=llama3.2:3b
BASE_URL=http://localhost:11434
TEMPERATURE=0.7
ENABLE_FUNCTIONS=true
```
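`BASE_URL` must point at a reachable Ollama server. As a quick sanity check (assuming the default port), you can ask the server which models it currently has available:

```bash
# Lists locally available models; an empty "models" array means nothing has been pulled yet
curl http://localhost:11434/api/tags
```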
## Key Parameters

| Parameter | Description | Default |
|---|---|---|
| `MODEL` | Model name (required) | - |
| `BASE_URL` | Ollama server URL | `http://localhost:11434` |
| `TEMPERATURE` | Randomness (0.0-1.0) | 0.7 |
| `TOP_P` | Nucleus sampling | 1.0 |
| `TIMEOUT` | Request timeout (seconds) | 60 |
| `THINK` | Enable reasoning mode | false |
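On the wire, sampling settings such as `TEMPERATURE` and `TOP_P` correspond to fields in the `options` object of Ollama's native chat API. How your application maps its config onto the request is up to the provider implementation (not shown here), but a raw request looks roughly like this:

```bash
# Non-streaming chat request; temperature and top_p travel in the "options" object
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [{"role": "user", "content": "Summarize Ollama in one sentence."}],
  "stream": false,
  "options": {"temperature": 0.7, "top_p": 1.0}
}'
```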
## Advanced Options

| Parameter | Description | Default |
|---|---|---|
| `SEED` | Random seed | - |
| `NUM_PREDICT` | Max tokens to generate | - |
| `NUM_CTX` | Context window size | - |
| `NUM_BATCH` | Batch size | - |
| `NUM_GPU` | GPU layers | - |
| `NUM_THREAD` | CPU threads | - |
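These advanced settings match Ollama runtime options of the same names in lowercase (`seed`, `num_predict`, `num_ctx`, `num_batch`, `num_gpu`, `num_thread`). Assuming the provider forwards them unchanged, they end up in the same `options` map, for example:

```bash
# Generate request pinning the seed and raising the context window and GPU offload
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {"seed": 42, "num_predict": 256, "num_ctx": 4096, "num_gpu": 32}
}'
```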
## Features
- ✅ Streaming responses
- ✅ Function calling (example below)
- ✅ Local execution
- ✅ No API costs
- ✅ Privacy preservation
- ✅ Custom models
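Function calling uses OpenAI-style tool definitions on Ollama's chat endpoint, provided the chosen model supports tools. A minimal sketch, with a hypothetical `get_weather` tool, looks like this:

```bash
# The model may reply with a "tool_calls" entry instead of plain text;
# executing the tool and sending back its result is the caller's job
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "stream": false,
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
```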
## Installation

Install Ollama first:

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows
# Download from https://ollama.ai/download
```
Pull models:

```bash
ollama pull llama3.2:3b
ollama pull qwen3:7b
```
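To confirm a model is installed and working, you can list the local models and run a one-off prompt from the command line:

```bash
# Show all locally available models and their sizes
ollama list

# Send a single prompt to a pulled model
ollama run llama3.2:3b "Say hello in one sentence."
```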
## Important Notes
- Requires Ollama server running locally
- Models must be pulled before use
- Performance depends on hardware specs
- Supports OpenAI-compatible tool calling
- Reasoning mode available with `THINK=true` in config (see the sketch below)
- GPU acceleration recommended for larger models
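Two of these notes deserve a concrete sketch. If the server is not already running as a background service, it can be started manually, and reasoning mode corresponds to the `think` flag that newer Ollama releases accept on the chat endpoint for reasoning-capable models (the model name below is only an example and must be pulled first):

```bash
# Start the Ollama server (omit the & to keep it attached to the terminal)
ollama serve &

# Ask a reasoning-capable model to think before answering; newer Ollama versions
# return the reasoning trace separately from the final answer
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{"role": "user", "content": "Is 9.11 larger than 9.9?"}],
  "stream": false,
  "think": true
}'
```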