Adaptive Weight Decay for Deep Neural Networks
Regularization in the optimization of deep neural networks is often critical to avoid undesirable over-fitting, leading to better generalization of the model. One of the most popular regularization algorithms is to impose an L2 penalty on the model parameters, resulting in a decay of the parameters, called weight decay, where the decay rate is generally fixed as a hyperparameter.
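As a minimal sketch of the standard (non-adaptive) weight decay described above: the gradient of an L2 penalty (wd/2)·||w||² adds wd·w to the loss gradient, so one SGD step with the penalty is algebraically the same as shrinking the weights by a fixed factor before the gradient update. The function names and hyperparameter values here are illustrative, not from the paper.

```python
import numpy as np

def sgd_step_l2(w, grad, lr=0.1, wd=1e-2):
    # SGD step on loss + (wd/2)*||w||^2: the penalty contributes wd*w
    # to the gradient of the regularized objective.
    return w - lr * (grad + wd * w)

def sgd_step_decay(w, grad, lr=0.1, wd=1e-2):
    # The same update written as explicit multiplicative weight decay.
    return w * (1 - lr * wd) - lr * grad

w = np.array([1.0, -2.0, 3.0])
g = np.array([0.5, 0.5, 0.5])
# Both formulations produce identical updates for plain SGD.
assert np.allclose(sgd_step_l2(w, g), sgd_step_decay(w, g))
```

Note that this equivalence holds only for plain SGD; with adaptive optimizers such as Adam, the L2-penalty and decoupled-decay formulations diverge, which is part of why the decay rate's role is subtle.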