Performance

Patching NVIDIA's driver and vLLM to enable P2P on consumer GPUs

NVIDIA artificially restricts peer-to-peer (P2P) GPU communication to their enterprise cards. Turns out this is a software limitation, not a hardware one. I patched my drivers to remove it, hacked vLLM to take advantage of it, and got a 15-50% throughput improvement running Qwen 3.5 35b on dual RTX 3090s.

Fixing AMD CPU Scaling on Fedora

Recently, after replacing my home server I noticed that the CPU (Ryzen 7600) was only scaling between 3000MHz and 3800MHz, which is the base and the first level boost clock of the CPU. I was expecting it to scale down to as low as 400Mhz when idle, and up to 5.17Ghz on boost. ...