Quantization Bit Manipulation Visuals
Interactive guide to internalize the math behind Q5_0, Q4_K, and INT8 kernels
Goal: Make Bit Manipulation Visceral
These visuals explain why the code does qh >> (j + 12) instead of qh >> (j + 16).
Step through each diagram to understand exactly how bits are packed and extracted in quantized formats.
Quantization Overview
- Where quantization happens in a transformer layer
- Bits per weight: FP32 (32) → Q8_0 (8.5) → Q5_0 (5.5) → Q4_K (4.5)
- Kernel selection: Which kernel to use for each weight type
- INT8 vs FP32: Why INT8 kernels can run ~4x faster (a quarter of the memory traffic per value, plus 4x more values per SIMD register)
Keyboard shortcuts: 1-4 switch diagrams | +/- zoom | 0 reset | F fullscreen