An Improved ND-Adam Algorithm for Neural Networks to Reduce the Learning Error Rate
Keywords:
Deep Neural Networks, Gradient Descent, Stochastic Gradient Descent, Optimizer, Global Minima, Local Minima, Adam, ND-Adam, CIFAR-10, CIFAR-100, Confusion Matrix
Abstract
RMSprop and Adam are adaptive optimization methods that have been shown to train faster than stochastic gradient descent (SGD) in several scenarios. However, more recent studies report that they often generalize worse than SGD, particularly when training deep neural networks (DNNs). This paper identifies the aspects of Adam that can make its generalization worse than SGD's, and argues that a modified version of Adam is needed to close this generalization gap. We propose ND-Adam (Normalized Direction-preserving Adam), which controls the step size of each weight-vector update and preserves the gradient direction with higher precision, thereby improving generalization performance. Following the same reasoning, we further improve generalization on classification tasks through the softmax logits. By bridging the gap between Adam and SGD, this work also helps explain why certain optimization approaches generalize better than other existing algorithms.
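To make the idea concrete, the following is a minimal NumPy sketch of one ND-Adam-style update for a single hidden-layer weight vector, written from the description above: the gradient component along the weight vector is removed so the update only changes its direction, a single scalar second moment is kept per weight vector, and the vector is re-normalized after the step. The function name nd_adam_step and all hyperparameter defaults are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nd_adam_step(w, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative ND-Adam-style update for one weight vector `w` kept at unit L2 norm.

    `g` is the stochastic gradient of the loss w.r.t. `w`; `m` (vector) and `v`
    (scalar) are running first and second moments; `t` is the 1-based step count.
    Sketch only: details and constants may differ from the authors' method.
    """
    # Remove the gradient component along w, so the update preserves only
    # the direction-changing part of the gradient.
    g_perp = g - np.dot(g, w) * w

    # Adam-style moment estimates; v is a single scalar per weight vector.
    m = beta1 * m + (1.0 - beta1) * g_perp
    v = beta2 * v + (1.0 - beta2) * np.dot(g_perp, g_perp)

    # Bias correction, as in standard Adam.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Take the step, then project back onto the unit sphere so the step size
    # of the whole weight vector is controlled explicitly.
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    w = w / np.linalg.norm(w)
    return w, m, v

# Toy usage with a random 8-dimensional weight vector and gradients.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
w /= np.linalg.norm(w)
m, v = np.zeros(8), 0.0
for t in range(1, 4):
    g = rng.normal(size=8)  # stand-in for a minibatch gradient
    w, m, v = nd_adam_step(w, g, m, v, t)
```

Keeping the second moment as one scalar per weight vector, rather than one per coordinate, is what lets the update rescale the step size without distorting the gradient direction, which is the property the abstract attributes to ND-Adam.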
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.