Source: developer.qualcomm

Overview

Non-quantized DLC files use 32 bit floating point representations of network parameters.
Quantized DLC files use fixed point representations of network parameters, generally 8 bit weights and 8 or 32bit biases. The fixed point representation is the same used in Tensorflow quantized models.

Quantization Algorithm

Quantization converts floating point data to Tensorflow-style 8-bit fixed point format
The following requirements are satisfied:
- Full range of input values is covered.
- Minimum range of 0.01 is enforced.
- Floating point zero is exactly representable.
Quantization algorithm inputs:
- Set of floating point values to be quantized.
Quantization algorithm outputs:
- Set of 8-bit fixed point values.
- Encoding parameters:
  - encoding-min - minimum floating point value representable (by fixed point value 0)
  - encoding-max - maximum floating point value representable (by fixed point value 255)
Algorithm

Compute the true range (min, max) of input data.
Compute the encoding-min and encoding-max.
Quantize the input floating point values.
Output:

fixed point values
encoding-min and encoding-max parameters

Details

Compute the true range of the input floating point data.
- finds the smallest and largest values in the input data
- represents the true range of the input data
Compute the encoding-min and encoding-max.
- These parameters are used in the quantization step.
- These parameters define the range and floating point values that will be representable by the fixed point format.
  - encoding-min: specifies the smallest floating point value that will be represented by the fixed point value of 0
  - encoding-max: specifies the largest floating point value that will be represented by the fixed point value of 255
  - floating point values at every step size, where step size = (encoding-max - encoding-min) / 255, will be representable
1. encoding-min and encoding-max are first set to the true min and true max computed in the previous step
2. First requirement: encoding range must be at least a minimum of 0.01
  - encoding-max is adjusted to max(true max, true min + 0.01)
3. Second requirement: floating point value of 0 must be exactly representable
  - encoding-min or encoding-max may be further adjusted
Handling 0.
1. Case 1: Inputs are strictly positive
  - the encoding-min is set to 0.0
  - zero floating point value is exactly representable by smallest fixed point value 0
  - e.g. input range = [5.0, 10.0]
    - encoding-min = 0.0, encoding-max = 10.0
2. Case 2: Inputs are strictly negative
  - encoding-max is set to 0.0
  - zero floating point value is exactly representable by the largest fixed point value 255
  - e.g. input range = [-20.0, -6.0]
    - encoding-min = -20.0, encoding-max = 0.0
3. Case 3: Inputs are both negative and positive
  - encoding-min and encoding-max are slightly shifted to make the floating point zero exactly representable
  - e.g. input range = [-5.1, 5.1]
    - encoding-min and encoding-max are first set to -5.1 and 5.1, respectively
    - encoding range is 10.2 and the step size is 10.2/255 = 0.04
    - zero value is currently not representable. The closest values representable are -0.02 and +0.02 by fixed point values 127 and 128, respectively
    - encoding-min and encoding-max are shifted by -0.02. The new encoding-min is -5.12 and the new encoding-max is 5.08
    - floating point zero is now exactly representable by the fixed point value of 128
Quantize the input floating point values.
- encoding-min and encoding-max parameter determined in the previous step are used to quantize all the input floating values to their fixed point representation
- Quantization formula is:
  - quantized value = round(255 * (floating point value - encoding.min) / (encoding.max - encoding.min))
- quantized value is also clamped to be within 0 and 255
Outputs

the fixed point values
encoding-min and encoding-max parameters

Quantization Example

Inputs:
- input values = [-1.8, -1.0, 0, 0.5]
encoding-min is set to -1.8 and encoding-max to 0.5
encoding range is 2.3, which is larger than the required 0.01
encoding-min is adjusted to −1.803922 and encoding-max to 0.496078 to make zero exactly representable
step size (delta or scale) is 0.009020
Outputs:
- quantized values are [0, 89, 200, 255]

Dequantization Example

Inputs:
- quantized values = [0, 89, 200, 255]
- encoding-min = −1.803922, encoding-max = 0.496078
step size is 0.009020
Outputs:
- dequantized values = [−1.8039, −1.0011, 0.0000, 0.4961]

Tech It Yourself

SNPE Quantization Algorithm

Overview

Quantization Algorithm

Details

Quantization Example

Dequantization Example

Post a Comment

0 Comments

Latest Posts

Popular Posts

Collection of points in Computer Vision

Note 2: Linear regression - Python demos

Principal Component Analysis PCA using Singular Value Decomposition SVD

new data augmentation methods

Visualize the heatmap - GradCAM - Keras

Popular Posts

Collection of points in Computer Vision

Note 2: Linear regression - Python demos

Principal Component Analysis PCA using Singular Value Decomposition SVD

new data augmentation methods

Visualize the heatmap - GradCAM - Keras

Robot Operating System - ROS tutorial

UML Class Diagram Relationships, Aggregation, Composition

What is a batch-norm in machine learning?

Tags

SNPE Quantization Algorithm

Overview

Quantization Algorithm

Details

Quantization Example

Dequantization Example

Post a Comment

0 Comments

Follow us

Latest Posts

Popular Posts

Popular Posts

Tags