If you need help with anything quantization- or ML-related (e.g. debugging code), feel free to book a 30-minute consultation session! calendly.com/oscar-savolainen
I'm also available for long-term freelance work, e.g. training/productionizing models, teaching AI concepts, etc.
Video Summary:
In this video, we go over the theory of how to statically quantize a PyTorch model in Eager mode. A minimal code sketch of the full workflow follows the timestamps below.
Timestamps:
00:00 Intro
03:05 Required Architecture Changes (QuantStubs / DeQuantStubs / FloatFunctionals)
08:54 Fusing modules
12:18 Assignment of QConfigs (the recipe for quantization for each module)
15:26 Preparing the model for quantization (i.e. making the model fake-quantizable)
20:25 Converting the model to a "true" quantized int8 model.
23:06 Conclusion
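
To make the steps above concrete, here is a minimal Eager-mode static quantization sketch. It is not the code from the video: the TinyNet model, its layer shapes, and the eight random calibration batches are made-up illustrations of the workflow.

import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, fuse_modules, get_default_qconfig, prepare, convert,
)
from torch.ao.nn.quantized import FloatFunctional


class TinyNet(nn.Module):
    """Hypothetical model showing the required architecture changes."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()           # float -> quantized entry point
        self.conv1 = nn.Conv2d(8, 8, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(8, 8, 3, padding=1)
        self.skip_add = FloatFunctional()  # replaces "+" so the add gets its own observer
        self.dequant = DeQuantStub()       # quantized -> float exit point

    def forward(self, x):
        x = self.quant(x)
        y = self.relu1(self.bn1(self.conv1(x)))
        y = self.conv2(y)
        y = self.skip_add.add(y, x)        # plain "y + x" is unsupported on quantized tensors
        return self.dequant(y)


model = TinyNet().eval()  # post-training static quantization runs on an eval-mode model

# 1) Fuse conv + bn + relu into a single module
model = fuse_modules(model, [["conv1", "bn1", "relu1"]])

# 2) Assign a QConfig (the observer/quantization recipe) to the whole model
model.qconfig = get_default_qconfig("fbgemm")  # x86 server backend

# 3) Prepare: insert observers, making the model fake-quantizable
#    (for quantization-aware training you would use prepare_qat on a train-mode model)
prepare(model, inplace=True)

# 4) Calibrate with representative data (random tensors here, purely for illustration)
with torch.no_grad():
    for _ in range(8):
        model(torch.randn(1, 8, 32, 32))

# 5) Convert to a "true" int8 model
convert(model, inplace=True)
print(model(torch.randn(1, 8, 32, 32)).shape)

Note that the imports assume a recent torch.ao namespace; older PyTorch releases expose the same functions under torch.quantization.
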
For more background on what it means to quantize a tensor, see: Understanding int8 neural network quantiza...
Links (PyTorch documentation):
Quant/DeQuant stub definition: pytorch.org/docs/stable/_modules/torch/ao/quantiza…
FloatFunctionals definition: pytorch.org/docs/stable/generated/torch.ao.nn.quan…
QConfig: pytorch.org/docs/stable/generated/torch.ao.quantiz…
`prepare_qat`: pytorch.org/docs/stable/generated/torch.ao.quantiz…
Converting the model: pytorch.org/docs/stable/generated/torch.ao.quantiz…