Ollama

Agentic Coding Workflow & Cline Demo

Square Peg hosted event on June 20, 2025 where I demonstrated a basic version of my daily Agentic Coding workflow using Cline and MCP tools. What does it take to write enterprise-grade code in the AI-native era? Join Square Peg investors James Tynan and Grace Dalla-Bona for a live demo and Q&A session with three leading AI-native developers - Grant Gurvis, Listiarso Wastuargo, and Sam McLeod - and get a behind-the-curtain look at the workflows that enable them to ship faster, smarter, and cleaner code using tools like Cursor, Cline, and smolagents. ...

Vibe Coding vs Agentic Coding

Picture this: A business leader overhears their engineering team discussing “vibe coding” and immediately imagines developers throwing prompts at ChatGPT until something works, shipping whatever emerges to production. The term alone-“vibe coding”-conjures images of seat-of-the-pants development that would make any CTO break out in a cold sweat. This misunderstanding is creating a real problem. Whilst vibe coding represents genuine creative exploration that has its place, the unfortunate terminology is causing some business leaders to conflate all AI-assisted / accelerated development with haphazard experimentation. I fear that engineers using sophisticated AI coding agents be it with advanced agentic coding tools like Cline to deliver production-quality solutions are finding their approaches questioned or dismissed entirely. ...

LLM Sampling Parameters Guide

Large Language Models don’t generate text deterministically - they use probabilistic sampling to select the next token based on prediction probabilities. How these probabilities are filtered and adjusted before sampling significantly impacts output quality. This guide explains the key sampling parameters, how they interact, and provides recommended settings for different use cases. Framework Reference Last updated: November 2025 Parameter Comparison Parameter llama.cpp Default Ollama Default MLX Temperature temp 0.8 temperature 0.8 temp Top P top_p 0.9 top_p 0.9 top_p Min P min_p 0.1 min_p 0.0 min_p Top K top_k 40 top_k 40 top_k Repeat Penalty repeat_penalty 1.0 repeat_penalty 1.1 Unsupported Repeat Last N repeat_last_n 64 repeat_last_n 64 Unsupported Presence Penalty presence_penalty 0.0 presence_penalty - Unsupported Frequency Penalty frequency_penalty 0.0 frequency_penalty - Unsupported Mirostat mirostat 0 mirostat 0 Unsupported Mirostat Tau mirostat_ent 5.0 mirostat_tau 5.0 Unsupported Mirostat Eta mirostat_lr 0.1 mirostat_eta 0.1 Unsupported Top N Sigma top_nsigma -1.0 Unsupported - Unsupported Typical P typical_p 1.0 typical_p 1.0 Unsupported XTC Probability xtc_probability 0.0 Unsupported - xtc_probability XTC Threshold xtc_threshold 0.1 Unsupported - xtc_threshold DRY Multiplier dry_multiplier 0.0 Unsupported - Unsupported DRY Base dry_base 1.75 Unsupported - Unsupported Dynamic Temp dynatemp_range 0.0 Unsupported - Unsupported Seed seed -1 seed 0 - Context Size ctx_size 2048 num_ctx 2048 - Max Tokens n_predict -1 num_predict -1 - Notable Default Differences Parameter llama.cpp Ollama Note min_p 0.1 0.0 Ollama disables Min P by default repeat_penalty 1.0 1.1 Ollama applies penalty by default seed -1 (random) 0 Different random behaviour Feature Support Feature llama.cpp Ollama MLX Core (temp, top_p, top_k, min_p) ✓ ✓ ✓ Repetition penalties ✓ ✓ ✗ Presence/frequency penalties ✓ ✓ ✗ Mirostat ✓ ✓ ✗ Advanced (DRY, XTC, typical, dynatemp) ✓ ✗ Partial Custom sampler ordering ✓ ✗ ✗ Core Sampling Parameters Temperature Controls the randomness of token selection by modifying the probability distribution before sampling. ...

Illustration of the components of a LLM's memory usage

Bringing K/V Context Quantisation to Ollama

Explaining the concept of K/V context cache quantisation, why it matters and the journey to integrate it into Ollama. Why K/V Context Cache Quantisation Matters The introduction of K/V context cache quantisation in Ollama is significant, offering users a range of benefits: • Run Larger Models: With reduced VRAM demands, users can now run larger, more powerful models on their existing hardware. • Expand Context Sizes: Larger context sizes allow LLMs to consider more information, leading to potentially more comprehensive and nuanced responses. For tasks like coding, where longer context windows are beneficial, K/V quantisation can be a game-changer. • Reduce Hardware Utilisation: Freeing up memory or allowing users to run LLMs closer to the limits of their hardware. Running the K/V context cache at Q8_0 quantisation effectively halves the VRAM required for the context compared to the default F16 with minimal quality impact on the generated outputs, while Q4_0 cuts it down to just one third (at the cost of some noticeable quality reduction). ...

Gollama: Ollama Model Manager

Gollama on Github Gollama is a client for Ollama for managing models. It provides a TUI for listing, filtering, sorting, selecting, inspecting (coming soon!) and deleting models and can link Ollama models to LM-Studio. The project started off as a rewrite of my llamalink project, but I decided to expand it to include more features and make it more user-friendly. ...

Confuddlement: Download Confluence Spaces as Markdown, Summarise with Ollama

Confuddlement on Github I was tired of manually downloading Confluence pages and converting them to Markdown, so I wrote a small command-line tool designed to simplify this process. Confuddlement is a Go-based tool that uses the Confluence REST API to fetch page content and convert it to Markdown files. It can fetch pages from multiple spaces, skip pages that have already been fetched, and summarise the content of fetched pages using the Ollama API. $ go run ./main.go Confuddlement 0.3.0 Spaces: [COOLTEAM, MANAGEMENT] Fetching content from space COOLTEAM COOLTEAM (Totally Cool Team Homepage) Retrospectives Decision log Development Onboarding Saved page COOLTEAM - Feature List to ./confluence_dump/COOLTEAM - Feature List.md Skipping page 7. Support, less than 300 characters MANAGEMENT (Department of Overhead and Bureaucracy) Painful Change Management Illogical Diagrams Saved page ./confluence_dump/Painful Change Management.md Saved page Illogical Diagrams to ./confluence_dump/Ilogical Diagrams.md Done! $ go run ./main.go summarise Select a file to summarise: 0: + COOLTEAM - Feature List 1: + Painful Change Management 2: + Illogical Diagrams Enter the number of the file to summarise: 1 Summarising Painful Change Management... "Change management in the enterprise is painful and slow. It involves many forms and approvals." go run main.go -q 'who is the CEO?' -s 'management' -r 2 Querying the LLM with the prompt 'who is the CEO?'... "The CEO of the company is Peewee Herman." Usage Running the Program Copy .env.template to .env and update the environment variables. Run the program using the command go run main.go or build the program using the command go build and run the resulting executable. The program will fetch Confluence pages and save them as Markdown files in the specified directory. Querying the documents with AI You can summarise the content of a fetched page using the Ollama API by running the program with the summarise argument: go run main.go summarise To perform a custom query, you can use the query argument: -q: The query to to provide to the LLM. -s: The search term to match documents against. -r: The number of lines before and after the search term to include in the context to the LLM. go run main.go -q 'who is the CEO?' -s 'management' -r 2 Querying the LLM with the prompt 'who is the CEO?'... "The CEO of the company is Peewee Herman." ...

LLM FAQ

“Should I run a larger parameter model, or a higher quality smaller model of the same family?” TLDR; Larger parameter model [lower quantisation quality] > Smaller parameter model [higher quantisation quality] E.g: Qwen2.5 32B Q3_K_M > Qwen2.5 14B Q8_0 Caveats: Don’t go lower than Q3_K_M, or IQ2_M, especially if the model is under 30B~ parameters. This is in the context of two models of the same family and version (e.g. Qwen2.5 Coder). Longer answer: Check out the Code Chaos and Copilots slide deck. ...