How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift)
https://arxiv.org/abs/1805.11604
Fun paper suggesting that the explanation we were taught in deep learning grade school for why BatchNorm works (it reduces "internal covariate shift") was all a lie. Instead, the authors argue that BatchNorm smooths the loss landscape, which makes gradients more predictable, but this probably isn't the whole story either.
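For anyone who wants the operation itself in front of them, here is a rough sketch of a training-time BatchNorm forward pass in plain Python. It is not code from the paper; the names `batch_norm`, `gamma`, `beta`, and `eps` are illustrative, and using a single scalar `gamma` and `beta` is a simplifying assumption (in practice they are learned per feature).

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) over the mini-batch, then scale and shift.

    Assumption: gamma and beta are scalars here for brevity; real layers
    learn one value of each per feature.
    """
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Biased (divide-by-n) variance over the mini-batch.
        var = sum((x - mean) ** 2 for x in col) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma * (col[i] - mean) * inv_std + beta
    return out

# Two features on very different scales end up on the same scale.
activations = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_norm(activations)
```

After the call, each column of `normalized` has mean close to 0 and variance close to 1, whatever scale the inputs were on. The sketch only shows the normalization step, not the smoothing effect the authors measure.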
Interesting, but practically useless.