LLM | smcleod.net

Agentic Coding Workflow & Cline Demo

Square Peg hosted an event on June 20, 2025, where I demonstrated a basic version of my daily Agentic Coding workflow using Cline and MCP tools. What does it take to write enterprise-grade code in the AI-native era? Join Square Peg investors James Tynan and Grace Dalla-Bona for a live demo and Q&A session with three leading AI-native developers - Grant Gurvis, Listiarso Wastuargo, and Sam McLeod - and get a behind-the-curtain look at the workflows that enable them to ship faster, smarter, and cleaner code using tools like Cursor, Cline, and smolagents. ...

Vibe Coding vs Agentic Coding

Picture this: A business leader overhears their engineering team discussing “vibe coding” and immediately imagines developers throwing prompts at ChatGPT until something works, shipping whatever emerges to production. The term alone, “vibe coding”, conjures images of seat-of-the-pants development that would make any CTO break out in a cold sweat. This misunderstanding is creating a real problem. Whilst vibe coding represents genuine creative exploration that has its place, the unfortunate terminology is causing some business leaders to conflate all AI-assisted or AI-accelerated development with haphazard experimentation. I fear that engineers who use sophisticated AI coding agents, such as advanced agentic coding tools like Cline, to deliver production-quality solutions are finding their approaches questioned or dismissed entirely. ...

My Plan, Document, Act, Review flow for Agentic Software Development

The following is my workflow for agentic coding. The basic flow is Setup -> Plan -> Act -> Review and Iterate.
• Setup - Ensure the right rules and tools are enabled, optionally gather important documentation or examples.
• Plan - Build a detailed plan based on your goals, requirements and ideation with the coding agent.
• Act - Perform the development tasks, in phases.
• Review and Iterate - Review the work, update the plan and iterate as required.
🕵 Setup
Ensure any directories or files you don’t want Cline to read are excluded by adding them to a .clineignore file in the root of your project.
🛠️ Tools
The effective use of tools is critical to the success and cost-effectiveness of agentic coding. The MCP Servers (tools) I frequently use are available here: sammcj/agentic-coding#mcp-servers ...

Comprehensive Guide to LLM Sampling Parameters

Large Language Models (LLMs) like those used in Ollama don’t generate text deterministically - they use probabilistic sampling to select the next token based on the model’s prediction probabilities. How these probabilities are filtered and adjusted before sampling significantly impacts the quality of generated text. This guide explains the key sampling parameters and how they affect your model’s outputs, along with recommended settings for different use cases. [Figure: Ollama Sampling Diagram] ...
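To make the idea of "filtering and adjusting probabilities before sampling" concrete, here is a minimal Python sketch of temperature scaling followed by top-k and top-p filtering. It is a generic illustration rather than Ollama's actual implementation: the function name `sample_next_token`, the default values, and the order in which the filters are applied are all assumptions made for the example.

```python
import math
import random


def sample_next_token(logits, temperature=0.8, top_k=40, top_p=0.9, rng=None):
    """Pick a token id from raw logits (hypothetical helper, not an Ollama API).

    Requires temperature > 0. Defaults are illustrative only.
    """
    rng = rng or random.Random()

    # Temperature: divide the logits; values below 1 sharpen the distribution.
    scaled = [logit / temperature for logit in logits]

    # Softmax, subtracting the max for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [(token_id, e / total) for token_id, e in enumerate(exps)]

    # Top-k: keep only the k most likely tokens.
    probs.sort(key=lambda pair: pair[1], reverse=True)
    probs = probs[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in probs:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Sample from the survivors; random.choices renormalises the weights.
    tokens = [token_id for token_id, _ in kept]
    weights = [p for _, p in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With `top_k=1` the function always returns the most likely token, which is one way to see why tight filters make output more repeatable while looser ones allow more variety.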

Getting Started with Agentic Systems - Developer Learning Paths

As agentic systems become increasingly central to modern software development, many engineers are looking to build practical skills but don’t know where to start. This guide provides a short list of pre-reading/watching and hands-on training resources to help you get started with developing with AI agents. The focus is on practical implementation for tools and methods you’re likely to use in the workplace, so you can quickly gain experience and confidence in building AI-powered and agentic systems. ...

The Cost of Agentic Coding

Don’t ask yourself “What if my high performing engineers spent $2k/month on agentic coding?” …ask yourself why they (and others) aren’t - and what opportunities they’re missing as a result. ...

The Democratisation Paradox: What History Teaches Us About AI

Every technological revolution has triggered waves of anxiety about the obsolescence of human skills and professions. The current fears that AI will replace artists, eliminate writing jobs, render illustrators obsolete, and devalue creative work follow a well-established historical pattern that’s worth examining critically. The Democratisation Paradox: When photography emerged in the 19th century, painters predicted the death of portraiture. When home cameras became accessible, professional photographers feared obsolescence. When smartphones put cameras in everyone’s pockets, the same concerns resurfaced [1][2]. Yet professional photography hasn’t vanished; it has evolved. What actually occurred was a democratisation of image creation that simultaneously elevated the appreciation for truly skilled work [3]. ...

The effects of prompt caching on Agentic coding

Prompt caching is a feature that Anthropic first offered on their API in 2024. It adds a cache for the tokens used. Why it matters: without prompt caching, every token in and out of the API must be processed and paid for in full. This is bad for your wallet, bad for the LLM hosting provider’s bottom line and bad for the environment. This is especially important when it comes to Agentic coding, where there are a lot of tokens in/out and, importantly, a lot of token reuse, which makes it a perfect use case for prompt caching. ...
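A rough way to see why token reuse matters is to compare the billable input for a multi-turn agentic session with and without a cache. The sketch below is a back-of-the-envelope model only: the function name `session_input_cost`, the price, and the cache write/read multipliers are illustrative assumptions rather than quoted pricing, and it ignores output tokens and cache expiry.

```python
def session_input_cost(turns, prefix_tokens, new_tokens_per_turn, price_per_mtok,
                       cached=True, cache_write_mult=1.25, cache_read_mult=0.1):
    """Estimate input-token cost for a session that resends its whole context each turn.

    Hypothetical model: the first turn sends a shared prefix (system prompt,
    tool definitions, files); every turn appends new_tokens_per_turn.
    The multipliers are assumed values, not a provider's published rates.
    """
    billable = 0.0
    seen = 0  # tokens already sent in earlier turns
    for turn in range(turns):
        fresh = new_tokens_per_turn + (prefix_tokens if turn == 0 else 0)
        if cached:
            # Previously seen tokens are read from cache; fresh ones are written to it.
            billable += seen * cache_read_mult + fresh * cache_write_mult
        else:
            # Without a cache, the full context is billed at the base rate every turn.
            billable += seen + fresh
        seen += fresh
    return billable * price_per_mtok / 1_000_000


# Illustrative numbers only: 3 turns, 10k-token prefix, 1k new tokens per turn.
uncached = session_input_cost(3, 10_000, 1_000, price_per_mtok=3.0, cached=False)
cached = session_input_cost(3, 10_000, 1_000, price_per_mtok=3.0, cached=True)
print(f"uncached: ${uncached:.5f}, cached: ${cached:.5f}")
```

Under these assumed numbers the cached session costs roughly half as much, and the gap widens as the number of turns grows, because the uncached cost scales with the whole context on every turn.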

Agentic Coding - Live Demo / Brownbag

Apologies for the video quality, Google Meet/Hangouts records in very low resolution and bitrate. Links mentioned in the video:
• Cline
• Roo Code (Cline fork with some experimental features)
• MCP: https://modelcontextprotocol.io/introduction
• The package-version MCP server I created: https://github.com/sammcj/mcp-package-version
• https://smithery.ai (index of MCP servers)
• https://mcp.so (index of MCP servers)
• https://glama.ai/mcp/servers (index of MCP servers)

Bringing K/V Context Quantisation to Ollama

Explaining the concept of K/V context cache quantisation, why it matters and the journey to integrate it into Ollama.
Why K/V Context Cache Quantisation Matters
The introduction of K/V context cache quantisation in Ollama is significant, offering users a range of benefits:
• Run Larger Models: With reduced VRAM demands, users can now run larger, more powerful models on their existing hardware.
• Expand Context Sizes: Larger context sizes allow LLMs to consider more information, leading to potentially more comprehensive and nuanced responses. For tasks like coding, where longer context windows are beneficial, K/V quantisation can be a game-changer.
• Reduce Hardware Utilisation: Freeing up memory or allowing users to run LLMs closer to the limits of their hardware.
Running the K/V context cache at Q8_0 quantisation effectively halves the VRAM required for the context compared to the default F16 with minimal quality impact on the generated outputs, while Q4_0 cuts it down to just one third (at the cost of some noticeable quality reduction). ...
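The "roughly half" and "about one third" figures can be sanity-checked with some simple arithmetic. The sketch below estimates K/V cache size from model dimensions, assuming the llama.cpp-style block layouts for Q8_0 (34 bytes per 32 values) and Q4_0 (18 bytes per 32 values). The helper name `kv_cache_bytes` and the model dimensions are hypothetical, and real VRAM use includes other buffers that this ignores.

```python
# Bytes needed to store a block of 32 cache values (assumed block layouts:
# F16 = 32 x 2 bytes; Q8_0 = 32 x 1 byte + 2-byte scale; Q4_0 = 16 bytes + 2-byte scale).
BLOCK_BYTES = {"f16": 64, "q8_0": 34, "q4_0": 18}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, cache_type="f16"):
    """Approximate K/V cache size in bytes (hypothetical helper, not an Ollama API)."""
    # One key vector and one value vector per layer, per KV head, per token.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elements * BLOCK_BYTES[cache_type] // 32


# Illustrative dimensions only: 32 layers, 8 KV heads, head size 128, 32k context.
dims = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32_768)
f16 = kv_cache_bytes(**dims, cache_type="f16")
for cache_type in ("f16", "q8_0", "q4_0"):
    size = kv_cache_bytes(**dims, cache_type=cache_type)
    print(f"{cache_type}: {size / 1024**3:.3f} GiB ({size / f16:.0%} of F16)")
```

Under these assumptions Q8_0 comes out at about 53% of the F16 size and Q4_0 at about 28%, in line with the rough halving and one-third reduction described above.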