Large language models (LLMs) are inherently large and computationally demanding, which often makes them challenging to deploy and use.
Disclaimer
I am not an ML or data scientist. I am simply an engineer with an interest in AI. This project is a result of my personal interest in understanding the impact of quantisation on LLMs. The visualisations are based on my understanding of the subject and may not be 100% accurate or complete. I encourage you to verify the information presented here with other sources.
Quantisation
Quantisation, a technique to reduce model size and memory footprint, is often confusing for newcomers, and understanding the trade-offs involved in the various quantisation types can be complex.
Quantisation refers to the process of converting model weights from higher to lower precision data types (e.g. floating point to integer).
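As a rough sketch of what that conversion can look like, here is a minimal "absmax" quantisation of a handful of weights to 8-bit integers and back again. The weights are made up, the function names are my own, and this is only one simple scheme chosen for illustration; it is not how GGUF or any particular library implements quantisation.

```python
def quantise_absmax_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantised = [round(w / scale) for w in weights]
    return quantised, scale


def dequantise(quantised, scale):
    """Recover approximate floats from the stored integers and the scale."""
    return [q * scale for q in quantised]


weights = [0.12, -0.5, 0.33, 1.27, -1.0]  # made-up example weights
quantised, scale = quantise_absmax_int8(weights)
restored = dequantise(quantised, scale)

# The rounding error per weight is at most half of one quantisation step.
errors = [abs(w - r) for w, r in zip(weights, restored)]

print("quantised:", quantised)
print("scale:", scale)
print("largest error:", max(errors))
```

Each weight is now stored as a small integer plus one shared scale, which is where the size saving comes from. The cost is the rounding error printed at the end.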
As a thought experiment, and for my own learning, I’ve created a (somewhat) interactive dashboard to help myself and others understand the impact of quantisation on LLMs.
It aims to demystify LLM quantisation by providing visual representations of key metrics and trade-offs.
Colour Spectrum Analogy
Imagine the model data to be the colour spectrum (pictured as 16 bits here).
If we quantise the data to 8 bits we are removing (thus compressing) parts of the data based on a set of rules. We can still see a wide range of “colours” but we lose some of the detail.
Note: This is a crude analogy. Modern quantisation techniques have smarts that selectively quantise parts of the model to varying degrees to reduce the loss.
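To put the colour analogy into code, here is a small sketch that reduces 16-bit "shades" to 8 bits by simply dropping the low 8 bits. The sample values and function names are my own, and plain truncation is just the crudest possible rule; as noted above, real quantisation schemes are more selective than this.

```python
def to_8_bit(value_16):
    """Keep the top 8 bits of a 16-bit value, discarding the low 8 bits."""
    return value_16 >> 8


def back_to_16_bit(value_8):
    """Stretch an 8-bit value back onto the 16-bit range.

    The discarded low bits are gone, so the fine detail is not recovered.
    """
    return value_8 << 8


shades_16 = [0, 255, 256, 300, 40000, 65535]  # made-up 16-bit shades
shades_8 = [to_8_bit(v) for v in shades_16]
restored = [back_to_16_bit(v) for v in shades_8]

print("16-bit shades:", shades_16)
print("8-bit shades: ", shades_8)
print("restored:     ", restored)
print("distinct shades before:", len(set(shades_16)))
print("distinct shades after: ", len(set(shades_8)))
```

Shades that were close together in 16 bits collapse into the same 8-bit value, so the overall range survives but neighbouring shades can no longer be told apart.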
Dashboard
The data is mainly focused on GGUF quantisation; however, the visualisations can be used to understand other quantisation and model formats as well. I plan to add more quantisation techniques and models in the future.
If you find errors, please do let me know! I want to correct my understanding and improve the visualisations over time. The dashboard is open source and available at https://github.com/sammcj/quant/, and I welcome contributions and feedback.
Note: This chart requires JavaScript and may not render properly on mobile devices with small screens.