LLM Sampling Methods Comparison

Comprehensive Guide to LLM Sampling Parameters

Large Language Models (LLMs) like those used in Ollama don't generate text deterministically; they use probabilistic sampling to select the next token based on the model's predicted probabilities. How those probabilities are filtered and adjusted before sampling significantly impacts the quality of the generated text. This guide explains the key sampling parameters and how they affect your model's outputs, along with recommended settings for different use cases.

[Figure: Ollama sampling diagram, comparing sampling methods]

Example Ollama Sampling Settings

| Setting        | General | Coding | Coding Alt | Factual/Precise | Creative Writing | Creative Chat |
|----------------|---------|--------|------------|-----------------|------------------|---------------|
| min_p          | 0.05    | 0.05   | 0.9        | 0.1             | 0.05             | 0.05          |
| temperature    | 0.7     | 0.2    | 0.2        | 0.3             | 1.0              | 0.85          |
| top_p          | 0.9     | 0.9    | 1.0        | 0.8             | 0.95             | 0.95          |
| mirostat       | 0       | 0      | 0          | 0               | 0                | 0             |
| repeat_penalty | 1.1     | 1.05   | 1.05       | 1.05            | 1.0              | 1.15          |
| top_k          | 40      | 40     | 0*         | 0*              | 0                | 0             |

*Some guides recommend top_k = 40 for factual/precise use cases, but min_p generally provides better adaptive filtering. Consider using min_p alone with a higher value (0.1) for most factual use cases. ...
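The pipeline the excerpt describes can be sketched in a few lines: temperature scaling of the logits, then min-p and top-p filtering, then renormalisation. This is an illustrative sketch only; real inference engines such as Ollama/llama.cpp implement these samplers in optimised native code, and the function name here is hypothetical.

```python
import math

def filter_probs(logits, temperature=0.7, min_p=0.05, top_p=0.9):
    """Return a renormalised {token_id: probability} map after filtering.

    Illustrative sketch of temperature + min-p + top-p sampling;
    not Ollama's actual implementation.
    """
    # Temperature scaling: lower values sharpen the distribution.
    scaled = [l / temperature for l in logits]

    # Softmax (shifted by the max logit for numerical stability).
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # Min-p: drop tokens whose probability is below min_p * p(top token).
    threshold = min_p * max(probs.values())
    probs = {i: p for i, p in probs.items() if p >= threshold}

    # Top-p (nucleus): keep the smallest set of tokens whose
    # cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for i, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[i] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise the surviving tokens so they sum to 1.
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}
```

A sampler would then draw the next token from the returned distribution, e.g. with `random.choices(list(dist), weights=dist.values())`. Note how min-p adapts to the model's confidence: when the top token is very likely, the threshold rises and more of the tail is cut.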

April 25, 2025 · 19 min · 3960 words · Sam McLeod

LLM Parameter Playground

Here's a fun little tool I've been hacking on to explore the effects of different inference parameters on LLMs. You can find the code and instructions for running it locally on GitHub. It started as a fork of rooben-me's tone-changer-open, which was itself a "fork" of Figma's tone generator. I've made quite a few changes to make it more focused on local LLMs and advanced parameter exploration.

July 20, 2024 · Sam McLeod