Various Normalization Methods

1 Understanding BatchNorm, InstanceNorm and LayerNorm

[1] Batch Normalization, Instance Normalization, Layer Normalization: Structural Nuances
• The Transformer encoder uses Layer Normalization
• There is also Group Normalization; see the article 《全面解读Group Normalization》 ("A Comprehensive Interpretation of Group Normalization")
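To make the structural difference concrete, here is a minimal NumPy sketch (not taken from [1]) that computes each normalization on an NCHW image tensor purely by choosing which axes the mean and variance are taken over. The function names and the `eps` value are illustrative assumptions, the learnable scale/shift (affine) parameters are omitted, and BatchNorm is shown in its training-time form (at inference it uses the running statistics discussed in section 2). In a Transformer, LayerNorm is instead applied over the feature dimension of each token.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Zero-mean, unit-variance over `axes` (affine scale/shift omitted)."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x):      # one mean/var per channel, shared over N, H, W
    return normalize(x, axes=(0, 2, 3))

def instance_norm(x):   # one mean/var per (sample, channel), over H, W
    return normalize(x, axes=(2, 3))

def layer_norm(x):      # one mean/var per sample, over C, H, W
    return normalize(x, axes=(1, 2, 3))

def group_norm(x, groups):  # one mean/var per (sample, group of channels)
    n, c, h, w = x.shape
    assert c % groups == 0, "channels must be divisible by groups"
    grouped = x.reshape(n, groups, c // groups, h, w)
    return normalize(grouped, axes=(2, 3, 4)).reshape(n, c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 6, 5, 5))  # N=4, C=6, H=W=5
```

Group Normalization sits between the two extremes: with a single group it reduces to LayerNorm, and with one channel per group it reduces to InstanceNorm.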

2 BatchNorm

2.1 The momentum parameter acts as an importance (weighting) factor when computing the running mean and running variance

[2] https://stats.stackexchange.com/questions/219808/how-and-why-does-batch-normalization-use-moving-averages-to-track-the-accuracy-o
[3] Batch Normalization Explained

running_mean = momentum * running_mean + (1 - momentum) * new_mean
running_var = momentum * running_var + (1 - momentum) * new_var

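Momentum is the weight given to the accumulated running statistics, so 1 - momentum is the importance given to the last seen mini-batch (the "lag"). If momentum is set to 0, the running mean and variance come only from the last seen mini-batch; this may be biased and is usually not what we want at test time. Conversely, if momentum is set to 1, the running mean and variance are never updated and keep their initial values. Essentially, momentum controls how much each new mini-batch contributes to the running averages.
Ideally, momentum should be set close to 1 (> 0.9) so that the running mean and variance change slowly and the noise of any single mini-batch is averaged out. Note that this is the convention of the formulas above (also the one Keras uses). PyTorch's BatchNorm layers define momentum the other way round, running = (1 - momentum) * running + momentum * new, with a default of 0.1, so a value of 0.9 here corresponds to 0.1 in PyTorch.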
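A minimal plain-Python sketch of the update rule above, using a single scalar statistic (for example a batch mean) instead of a per-channel tensor. `ema_update` and `track` are hypothetical helper names, and the running value is assumed to start at 0.0, the usual initial value for a running mean.

```python
def ema_update(running, new, momentum):
    """One running-statistic update; `momentum` weights the OLD running value."""
    return momentum * running + (1 - momentum) * new

def track(batch_stats, momentum, init=0.0):
    """Feed a sequence of per-mini-batch statistics through the moving average."""
    running = init
    for stat in batch_stats:
        running = ema_update(running, stat, momentum)
    return running

batch_means = [1.0, 2.0, 3.0, 4.0]
print(track(batch_means, momentum=0.0))  # 4.0: only the last mini-batch is kept
print(track(batch_means, momentum=1.0))  # 0.0: never updated, stays at init
print(track(batch_means, momentum=0.9))  # ~0.9049: moves slowly toward the batch means
```

With momentum=1.0 the running value never moves away from its initial value, because each new mini-batch gets zero weight.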

2.2 Handling batch normalization with torch.utils.checkpoint

[4] Trading compute for memory in PyTorch models using Checkpointing

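A batch normalization layer maintains running mean and variance statistics computed from the current mini-batch, and every time a forward pass runs in training mode, these statistics are updated according to the momentum value. With checkpointing, the forward pass of a checkpointed segment runs twice in the same iteration (once normally and once when it is recomputed during the backward pass), so the running mean and variance are updated twice. To compensate, use new_momentum = sqrt(momentum) as the momentum value: two updates with sqrt(momentum) on the same mini-batch statistics are equivalent to one update with momentum. This holds for the convention of 2.1, where momentum weights the old running value; with PyTorch's convention (momentum weights the new batch statistic), the equivalent is new_momentum = 1 - sqrt(1 - momentum).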
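A quick numeric check of the sqrt(momentum) rule, as a plain-Python sketch. It assumes both forward passes of the checkpointed segment see exactly the same mini-batch statistics; the helper names and the numbers are hypothetical. Algebraically, two updates with m' = sqrt(m) give m'^2 * r + (1 - m')(1 + m') * s = m * r + (1 - m) * s.

```python
import math

def ema_update(running, new, momentum):
    """Convention of section 2.1: `momentum` weights the OLD running value."""
    return momentum * running + (1 - momentum) * new

def torch_style_update(running, new, momentum):
    """PyTorch BatchNorm convention: `momentum` weights the NEW batch statistic."""
    return (1 - momentum) * running + momentum * new

running, batch_stat = 0.5, 2.0  # illustrative running value and batch statistic

# Section 2.1 convention: one update with m == two updates with sqrt(m).
m = 0.9
once = ema_update(running, batch_stat, m)
m2 = math.sqrt(m)
twice = ema_update(ema_update(running, batch_stat, m2), batch_stat, m2)

# PyTorch convention: one update with p == two updates with 1 - sqrt(1 - p).
p = 0.1
once_torch = torch_style_update(running, batch_stat, p)
p2 = 1 - math.sqrt(1 - p)
twice_torch = torch_style_update(torch_style_update(running, batch_stat, p2), batch_stat, p2)

print(once, twice)              # equal up to floating-point rounding
print(once_torch, twice_torch)  # equal up to floating-point rounding
```

The equivalence is exact only when the recomputed forward pass produces the same batch statistics as the first one; if they differ, the result is close but not identical.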

3 AdaIN (Adaptive Instance Normalization)

AdaIN is a normalization commonly used in style transfer.

AdaIN receives a content input x and a style input y, and simply aligns the channel-wise mean and variance of x to match those of y (the statistics are computed per sample and per channel over the spatial dimensions). Unlike BN, IN or CIN, AdaIN has no learnable affine parameters.

\[\operatorname{AdaIN}(x, y)=\sigma(y)\left(\frac{x-\mu(x)}{\sigma(x)}\right)+\mu(y) \]
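A minimal NumPy sketch of the AdaIN formula for NCHW arrays. It assumes, as in the AdaIN paper, that μ and σ are computed per sample and per channel over the spatial dimensions; the small `eps` added to the variance is an assumption for numerical stability and is not part of the formula above. Content and style may differ in spatial size but must share N and C.

```python
import numpy as np

def adain(x, y, eps=1e-5):
    """sigma(y) * (x - mu(x)) / sigma(x) + mu(y), with statistics taken per
    sample and per channel over the spatial axes (H, W). No learnable params."""
    mu_x = x.mean(axis=(2, 3), keepdims=True)
    sigma_x = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)
    mu_y = y.mean(axis=(2, 3), keepdims=True)
    sigma_y = np.sqrt(y.var(axis=(2, 3), keepdims=True) + eps)
    return sigma_y * (x - mu_x) / sigma_x + mu_y

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(2, 3, 8, 8))   # N=2, C=3, 8x8 feature map
style = rng.normal(5.0, 3.0, size=(2, 3, 16, 16))   # same N and C, larger map
out = adain(content, style)
# `out` keeps the spatial layout of `content` but carries the channel-wise
# mean and (up to eps) standard deviation of `style`.
```

Because the style statistics are computed at run time rather than learned, the same layer can transfer an arbitrary new style without retraining.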

A conclusion IBN-Net draws about Instance Normalization and Batch Normalization

IN learns features that are invariant to appearance changes, such as colors, styles, and virtuality/reality, while BN is essential for preserving content-related information.

IBN-Net is used fairly often in ReID (person re-identification) models.
