LSUV (layer-sequential unit-variance) is a simple method of weight initialization for deep net learning. The method consists of two steps:
- First, pre-initialize the weights of each convolution or inner-product layer with orthonormal matrices.
- Second, proceed from the first to the final layer, normalizing the variance of the output of each layer to be equal to one.
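Step 1 can be illustrated with a minimal NumPy sketch. It assumes the common (out_channels, in_channels, kh, kw) kernel layout for convolutions and (out_features, in_features) for inner-product layers. The helper name `orthonormal_init` is hypothetical and does not come from the paper or from LSUV-keras. The kernel is flattened to a 2-D matrix, and a Gaussian matrix is orthonormalized with a QR decomposition. This is one standard way to do it; an SVD-based variant works as well.

```python
import numpy as np


def orthonormal_init(shape, rng=None):
    """Hypothetical helper: weights whose 2-D flattening has orthonormal rows or columns.

    shape: (out_features, in_features) for an inner-product layer, or
           (out_channels, in_channels, kh, kw) for a convolution (assumed layout).
    """
    rng = np.random.default_rng() if rng is None else rng
    rows = shape[0]
    cols = int(np.prod(shape[1:]))
    # Put the larger dimension first so the reduced QR gives orthonormal columns.
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)  # q has shape (max, min) with orthonormal columns
    # Flip column signs so the result is uniformly distributed over orthonormal matrices.
    q = q * np.sign(np.diag(r))
    if rows < cols:
        q = q.T  # shape (rows, cols) with orthonormal rows
    return q.reshape(shape)


# Usage: a 3x3 convolution with 32 input and 64 output channels.
w = orthonormal_init((64, 32, 3, 3), rng=np.random.default_rng(0))
flat = w.reshape(64, -1)  # 64 x 288 matrix with orthonormal rows
print(np.allclose(flat @ flat.T, np.eye(64)))
```

When the layer has more outputs than flattened inputs, the columns are orthonormal instead of the rows, since a tall matrix cannot have orthonormal rows.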
Experiments with different activation functions (maxout, the ReLU family, tanh) show that the proposed initialization enables learning of very deep nets.
Pseudo-code of LSUV
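The LSUV procedure can be sketched in NumPy for a small fully connected ReLU network. This is an illustrative sketch, not the authors' code. The function `lsuv_init`, the MLP setup, and the parameters `tol_var` (allowed deviation of the output variance from one) and `t_max` (maximum number of rescaling iterations) are assumptions, and their default values are only illustrative. Each layer is pre-initialized with an orthonormal matrix. Its weights are then rescaled by 1/sqrt(Var) of the layer's output on a mini-batch until that variance is within `tol_var` of one or `t_max` iterations have passed. The procedure then moves on to the next layer.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def orthonormal(rows, cols, rng):
    # Step 1: orthonormal rows or columns via QR of a Gaussian matrix.
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    q = q * np.sign(np.diag(r))
    return q if rows >= cols else q.T


def lsuv_init(layer_sizes, x_batch, tol_var=0.1, t_max=10, seed=0):
    """Hypothetical LSUV-style initializer for a ReLU MLP.

    layer_sizes: [in_dim, hidden_1, ..., out_dim]
    x_batch: (n, in_dim) mini-batch used to estimate output variances.
    Returns weight matrices of shape (out, in) and zero biases.
    """
    rng = np.random.default_rng(seed)
    weights, biases = [], []
    h = x_batch  # input to the current layer
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = orthonormal(fan_out, fan_in, rng)  # step 1: orthonormal pre-init
        b = np.zeros(fan_out)
        # Step 2: rescale until the variance of this layer's output is close to one.
        for _ in range(t_max):
            var = (h @ w.T + b).var()
            if abs(var - 1.0) < tol_var:
                break
            w = w / np.sqrt(var)
        weights.append(w)
        biases.append(b)
        h = relu(h @ w.T + b)  # proceed to the next layer
    return weights, biases


# Usage: initialize a 4-layer MLP on a random mini-batch and check the layer outputs.
x = np.random.default_rng(1).standard_normal((256, 100))
ws, bs = lsuv_init([100, 200, 200, 200, 10], x)
h = x
for w, b in zip(ws, bs):
    z = h @ w.T + b
    print(round(float(z.var()), 3))  # each value should be close to 1
    h = relu(z)
```

Here the variance is measured on the output of the inner-product layer before the non-linearity. With zero biases that output scales linearly with the weights, so a single rescale already gives unit variance. The loop is kept to mirror the iterative structure of the method.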
Note:
- In most cases, batch normalization placed after the non-linearity performs better.
- An LSUV-initialized network performs as well as a batch-normalized one.
- The paper does not claim that batch normalization can always be replaced by proper initialization, especially on large datasets such as ImageNet.
LSUV-keras: https://github.com/ducha-aiki/LSUV-keras