Groq

Groq's low-latency inference platform, serving a range of open-source models through an OpenAI-compatible API.

Configuration

PROVIDER=groq
API_KEY=gsk_your-groq-api-key
MODEL=llama-3.1-70b-versatile
TEMPERATURE=0.3
MAX_TOKENS=1024
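As a rough sketch of how these environment settings map onto an OpenAI-compatible chat-completions request body (the `build_request` helper and the exact mapping are illustrative, not this tool's actual implementation):

```python
# Hypothetical helper: maps the env settings above onto an
# OpenAI-compatible chat-completions request body.
def build_request(env: dict) -> dict:
    request = {
        "model": env["MODEL"],
        "temperature": float(env.get("TEMPERATURE", 0.3)),
        "max_tokens": int(env.get("MAX_TOKENS", 1024)),
        # Groq only supports one completion per request, so n is pinned to 1.
        "n": 1,
    }
    return request

env = {
    "PROVIDER": "groq",
    "API_KEY": "gsk_your-groq-api-key",
    "MODEL": "llama-3.1-70b-versatile",
    "TEMPERATURE": "0.3",
    "MAX_TOKENS": "1024",
}
print(build_request(env))
```

The API key itself goes into the request's `Authorization: Bearer` header rather than the body, which is why it does not appear in the dict above.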

Key Parameters

| Parameter        | Description                   | Default                        |
|------------------|-------------------------------|--------------------------------|
| API_KEY          | Groq API key (required)       | -                              |
| MODEL            | Model to use                  | -                              |
| TEMPERATURE      | Randomness (0.0-1.0)          | 0.3                            |
| TOP_P            | Nucleus sampling              | 1.0                            |
| MAX_TOKENS       | Max response tokens           | 1024                           |
| TIMEOUT          | Request timeout (seconds)     | 60                             |
| BASE_URL         | Custom API endpoint           | https://api.groq.com/openai/v1 |
| REASONING_EFFORT | Reasoning effort level        | -                              |
| EXTRA_BODY       | Additional request parameters | {}                             |
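EXTRA_BODY can be sketched as a JSON object whose keys are merged into the outgoing request body. The `apply_extra_body` helper below is an assumption about how such a merge might work (extra keys overriding the defaults), not the tool's documented behavior:

```python
import json

# Sketch: assume EXTRA_BODY is a JSON object merged into the request
# body, with its keys overriding any defaults already present.
def apply_extra_body(request: dict, extra_body: str) -> dict:
    extra = json.loads(extra_body or "{}")  # "{}" is the documented default
    return {**request, **extra}

base = {"model": "llama-3.1-70b-versatile", "temperature": 0.3}
print(apply_extra_body(base, '{"seed": 42}'))
```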

Features

  • ✅ Streaming responses
  • ✅ Function calling
  • ✅ MCP support
  • ✅ OpenAI-compatible API
  • ✅ Ultra-fast inference
  • ✅ Reasoning capabilities (qwen3 models)

Important Notes

  • Uses an OpenAI-compatible client implementation
  • The N parameter is always set to 1, since Groq only returns one completion per request
  • Reasoning effort is only supported for qwen3 models
  • Valid REASONING_EFFORT values: null, default
  • Optimized for speed, with high tokens-per-second throughput
  • Multiple open-source models are available
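The qwen3-only gating of reasoning effort can be sketched as follows; the function name and the substring check on the model name are illustrative assumptions, not this tool's actual code:

```python
from typing import Optional

# Sketch: forward reasoning_effort only for qwen3 models, per the notes
# above; the gating logic here is an illustrative assumption.
def reasoning_kwargs(model: str, effort: Optional[str]) -> dict:
    if effort is not None and "qwen3" in model:
        return {"reasoning_effort": effort}  # e.g. "default"
    return {}

print(reasoning_kwargs("qwen3-32b", "default"))               # included
print(reasoning_kwargs("llama-3.1-70b-versatile", "default"))  # dropped
```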