Most AI models are trained in 16-bit or 32-bit floating-point precision. But for inference, you rarely need that much detail. Quantization is the art of representing those numbers with fewer bits to save space.

Imagine rounding 3.14159 to just 3. You lose some accuracy, but it's much easier to store. Quantization does this for the billions of weights in a neural network, often cutting memory usage by 50–75% (e.g., going from 16-bit to 8-bit or 4-bit weights) with negligible performance loss.
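To make this concrete, here is a minimal sketch of symmetric 8-bit quantization in NumPy. The scale factor and rounding scheme are illustrative assumptions; production libraries typically use per-channel scales, calibration data, and careful handling of outliers.

```python
import numpy as np

# A toy "weight tensor" standing in for one layer of a network.
rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)

# Symmetric scheme: map the largest-magnitude weight to the int8 range [-127, 127].
scale = np.abs(weights).max() / 127.0

# Quantize: divide by the scale and round to the nearest integer.
q = np.round(weights / scale).astype(np.int8)

# Dequantize: multiply back to recover an approximation of the originals.
recovered = q.astype(np.float32) * scale

# int8 storage is 4x smaller than float32: a 75% memory reduction.
print(weights.nbytes, q.nbytes)  # 64 bytes vs 16 bytes

# The worst-case rounding error per weight is half a quantization step.
max_err = np.abs(weights - recovered).max()
```

Because each weight moves by at most half a step (`scale / 2`), the error stays small relative to the weights themselves, which is why accuracy often barely drops.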