An Improved ND-Adam Algorithm for Neural Networks to Reduce the Learning Error Rate
Keywords:
Deep Neural Networks, Gradient Descent, Stochastic Gradient Descent, Optimizer, Global Minima, Local Minima, Adam, ND-Adam, CIFAR-10, CIFAR-100, Confusion Matrix
Abstract
RMSprop and Adam are adaptive optimization methods that have been shown to train faster than stochastic gradient descent (SGD) in several scenarios. However, more recent studies report that they often generalize worse than SGD, particularly when training deep neural networks (DNNs). This paper identifies the aspects of Adam that can make its generalization worse than SGD's, and argues that a modified version of Adam is needed to close this generalization gap. We propose ND-Adam (Normalized Direction-preserving Adam), which controls the step size of each weight-vector update and preserves the gradient direction with higher precision, thereby improving generalization performance. Following the same reasoning, we further improve generalization on classification tasks through the softmax logits. By bridging the gap between Adam and SGD, this work also helps explain why certain optimization approaches generalize better than other existing algorithms.
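To make the idea concrete, the following is a minimal NumPy sketch of one ND-Adam-style update for a single hidden-layer weight vector, written from the description above: the gradient component along the weight vector is removed so the update only changes its direction, a single scalar second moment is kept per weight vector, and the vector is re-normalized after the step. The function name nd_adam_step and all hyperparameter defaults are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nd_adam_step(w, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative ND-Adam-style update for one weight vector `w` kept at unit L2 norm.

    `g` is the stochastic gradient of the loss w.r.t. `w`; `m` (vector) and `v`
    (scalar) are running first and second moments; `t` is the 1-based step count.
    Sketch only: details and constants may differ from the authors' method.
    """
    # Remove the gradient component along w, so the update preserves only
    # the direction-changing part of the gradient.
    g_perp = g - np.dot(g, w) * w

    # Adam-style moment estimates; v is a single scalar per weight vector.
    m = beta1 * m + (1.0 - beta1) * g_perp
    v = beta2 * v + (1.0 - beta2) * np.dot(g_perp, g_perp)

    # Bias correction, as in standard Adam.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Take the step, then project back onto the unit sphere so the step size
    # of the whole weight vector is controlled explicitly.
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    w = w / np.linalg.norm(w)
    return w, m, v

# Toy usage with a random 8-dimensional weight vector and gradients.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
w /= np.linalg.norm(w)
m, v = np.zeros(8), 0.0
for t in range(1, 4):
    g = rng.normal(size=8)  # stand-in for a minibatch gradient
    w, m, v = nd_adam_step(w, g, m, v, t)
```

Keeping the second moment as one scalar per weight vector, rather than one per coordinate, is what lets the update rescale the step size without distorting the gradient direction, which is the property the abstract attributes to ND-Adam.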
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.