LLM Sampling Methods Comparison

Comprehensive Guide to LLM Sampling Parameters

Large Language Models (LLMs) like those used in Ollama don't generate text deterministically; they use probabilistic sampling to select the next token based on the model's predicted probabilities. How those probabilities are filtered and adjusted before sampling significantly impacts the quality of the generated text. This guide explains the key sampling parameters and how they affect your model's outputs, along with recommended settings for different use cases.

[Figure: Ollama sampling diagram, comparing sampling methods]

Example Ollama Sampling Settings

| Setting        | General | Coding | Coding Alt | Factual/Precise | Creative Writing | Creative Chat |
|----------------|---------|--------|------------|-----------------|------------------|---------------|
| min_p          | 0.05    | 0.05   | 0.9        | 0.1             | 0.05             | 0.05          |
| temperature    | 0.7     | 0.2    | 0.2        | 0.3             | 1.0              | 0.85          |
| top_p          | 0.9     | 0.9    | 1.0        | 0.8             | 0.95             | 0.95          |
| mirostat       | 0       | 0      | 0          | 0               | 0                | 0             |
| repeat_penalty | 1.1     | 1.05   | 1.05       | 1.05            | 1.0              | 1.15          |
| top_k          | 40      | 40     | 0*         | 0*              | 0                | 0             |

*Some guides recommend top_k = 40 for factual/precise use cases, but min_p generally provides better adaptive filtering. Consider using min_p alone with a higher value (0.1) for most factual use cases. ...
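The pipeline the excerpt describes can be sketched in a few lines: temperature scaling of the logits, then min-p and top-p filtering, then renormalisation. This is an illustrative sketch only; real inference engines such as Ollama/llama.cpp implement these samplers in optimised native code, and the function name here is hypothetical.

```python
import math

def filter_probs(logits, temperature=0.7, min_p=0.05, top_p=0.9):
    """Return a renormalised {token_id: probability} map after filtering.

    Illustrative sketch of temperature + min-p + top-p sampling;
    not Ollama's actual implementation.
    """
    # Temperature scaling: lower values sharpen the distribution.
    scaled = [l / temperature for l in logits]

    # Softmax (shifted by the max logit for numerical stability).
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # Min-p: drop tokens whose probability is below min_p * p(top token).
    threshold = min_p * max(probs.values())
    probs = {i: p for i, p in probs.items() if p >= threshold}

    # Top-p (nucleus): keep the smallest set of tokens whose
    # cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for i, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[i] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise the surviving tokens so they sum to 1.
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}
```

A sampler would then draw the next token from the returned distribution, e.g. with `random.choices(list(dist), weights=dist.values())`. Note how min-p adapts to the model's confidence: when the top token is very likely, the threshold rises and more of the tail is cut.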

April 25, 2025 · 19 min · 3960 words · Sam McLeod

LLM Parameter Playground

Here's a fun little tool I've been hacking on to explore the effects of different inference parameters on LLMs. You can find the code and instructions for running it locally on GitHub. It started as a fork of rooben-me's tone-changer-open, which was itself a "fork" of Figma's tone generator. I've made quite a few changes to make it more focused on local LLMs and advanced parameter exploration.

July 20, 2024 · Sam McLeod