Deep Learning for Image Classification: Methods, Challenges, and Future Directions

Authors

  • Sanjay Kumar Gorai, Post Graduate, Department of Computer Science, Kolhan University, West Singhbhum, Chaibasa-833201, Jharkhand, India
  • Anurag Sarangi, Post Graduate, Department of Computer Science, Kolhan University, West Singhbhum, Chaibasa-833201, Jharkhand, India
  • Shekhar Pradhan, Post Graduate, Department of Computer Science, Kolhan University, West Singhbhum, Chaibasa-833201, Jharkhand, India

DOI:

https://doi.org/10.32628/CSEIT2511110

Keywords:

Deep Learning, Image Classification, Convolutional Neural Networks (CNNs), Transfer Learning, Vision Transformers (ViTs), Data Augmentation, Interpretability, Few-Shot Learning, Self-Supervised Learning, Multimodal Learning, Federated Learning

Abstract

Deep learning has fundamentally changed image classification, driving unprecedented accuracy and enabling applications that range from healthcare and autonomous vehicles to security. In medical imaging, for example, deep learning models diagnose diseases such as diabetic retinopathy and detect tumours with high accuracy. In autonomous vehicles, object classification algorithms enable real-time navigation and obstacle avoidance. More recently, breakthroughs such as Vision Transformers (ViTs) and self-supervised learning models like SimCLR have pushed the field further. In this paper, we review the main methods on which deep learning-based image classification rests, including convolutional neural networks (CNNs), transfer learning, and attention mechanisms. We then discuss the field's principal challenges, such as the need for large labelled datasets, high computational requirements, and limited interpretability, and outline solutions to overcome them. We conclude with promising future directions, including few-shot learning, unsupervised learning, and the combination of multimodal data, and describe how they can further advance the field and open up new applications.
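
To make the transfer-learning workflow summarized in the abstract concrete, the following is a minimal sketch (not taken from the paper itself), assuming PyTorch and torchvision: an ImageNet-pretrained ResNet-18 backbone is frozen and only a newly attached classification head is trained for a hypothetical 10-class target dataset; the batch of random images stands in for real data.

    import torch
    import torch.nn as nn
    from torchvision import models

    num_classes = 10  # assumed size of the downstream label set (hypothetical)

    # Load a ResNet-18 backbone with ImageNet weights (torchvision >= 0.13 API).
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

    # Freeze the pretrained feature extractor so its weights are not updated.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the final fully connected layer to match the new task.
    model.fc = nn.Linear(model.fc.in_features, num_classes)

    # Optimize only the new head during fine-tuning.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    # One illustrative training step on a dummy batch of 224x224 RGB images.
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, num_classes, (8,))
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

In practice, the head would be trained for multiple epochs over a labelled target dataset, and the backbone can optionally be unfrozen later for full fine-tuning.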

References

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks." Advances in Neural Information Processing Systems. 2012, (1–10).

He, K., Zhang, X., Ren, S., & Sun, J. (2016). "Deep Residual Learning for Image Recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, (770–778).

Dosovitskiy, A., Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, & Neil Houlsby. (2021). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." International Conference on Learning Representations. 2021, (1–12).

Simonyan, K., & Zisserman, A. (2015). "Very Deep Convolutional Networks for Large-Scale Image Recognition." International Conference on Learning Representations. 2015, (1–8).

Radford, A., Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, & Ilya Sutskever. (2021). "Learning Transferable Visual Models From Natural Language Supervision." International Conference on Machine Learning. 2021, (1–15).

Goodfellow, I., Jonathon Shlens, & Christian Szegedy. (2014). "Explaining and Harnessing Adversarial Examples." arXiv preprint arXiv:1412.6572. 2014, (1–10).

Kingma, D. P., & Welling, M. (2013). "Auto-Encoding Variational Bayes." arXiv preprint arXiv:1312.6114. 2013, (1–8).

Tan, M., & Le, Q. (2019). "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." Proceedings of the International Conference on Machine Learning. 2019, (6105–6114).

Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). "A Simple Framework for Contrastive Learning of Visual Representations." arXiv preprint arXiv:2002.05709. 2020, (1–12).

Esteva, A., Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, & Sebastian Thrun. (2017). "Dermatologist-level classification of skin cancer with deep neural networks." Nature. 2017, (115–118).

Zhu, X. X., Devis Tuia, Lichao Mou, Gui-Song Xia, Liangpei Zhang, & Feng Xu. (2017). "Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources." IEEE Geoscience and Remote Sensing Magazine. 2017, (8–36).

Szegedy, C., Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, & Andrew Rabinovich. (2015). "Going Deeper with Convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, (1–9).

Howard, A. G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, & Hartwig Adam. (2017). "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications." arXiv preprint arXiv:1704.04861. 2017, (1–10).

OpenAI. (2021). "CLIP: Connecting Text and Images." arXiv preprint arXiv:2103.00020. 2021, (1–8).

Pan, S. J., & Yang, Q. (2010). "A Survey on Transfer Learning." IEEE Transactions on Knowledge and Data Engineering. 2010, (1345–1359).

Yosinski, J., Jeff Clune, Yoshua Bengio, & Hod Lipson. (2014). "How Transferable Are Features in Deep Neural Networks?" Advances in Neural Information Processing Systems. 2014, (3320–3328).

Bello, I., Barret Zoph, Ashish Vaswani, Jonathon Shlens, & Quoc V. Le. (2021). "Attention Augmented Convolutional Networks." IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021, (1–10).

Shorten, C., & Khoshgoftaar, T. M. (2019). "A survey on image data augmentation for deep learning." Journal of Big Data. 2019, (60–88).

Ioffe, S., & Szegedy, C. (2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." International Conference on Machine Learning. 2015, (448–456).

Yun, S., Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, & Youngjoon Yoo. (2019). "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features." Proceedings of the IEEE International Conference on Computer Vision. 2019, (1–10).

Zhang, H., Moustapha Cisse, Yann N. Dauphin, & David Lopez-Paz. (2018). "mixup: Beyond Empirical Risk Minimization." International Conference on Learning Representations. 2018, (1–12).

Opitz, D., & Maclin, R. (1999). "Popular Ensemble Methods: An Empirical Study." Journal of Artificial Intelligence Research. 1999, (169–198).

Russakovsky, O., Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, & Li Fei-Fei. (2015). "ImageNet Large Scale Visual Recognition Challenge." International Journal of Computer Vision. 2015, (211–252).

Zhu, X. J., & Goldberg, A. B. (2009). "Introduction to Semi-Supervised Learning." Synthesis Lectures on Artificial Intelligence and Machine Learning. 2009, (1–130).

Settles, B. (2012). "Active learning." Synthesis Lectures on Artificial Intelligence and Machine Learning. 2012, (1–114).

Han, S., Jeff Pool, John Tran, & William J. Dally. (2015). "Learning both Weights and Connections for Efficient Neural Networks." Advances in Neural Information Processing Systems. 2015, (1135–1143).

Selvaraju, R. R., Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, & Dhruv Batra. (2017). "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization." Proceedings of the IEEE International Conference on Computer Vision. 2017, (618–626).

Miller, T. (2019). "Explanation in Artificial Intelligence: Insights from the Social Sciences." Artificial Intelligence. 2019, (1–38).

Goodfellow, I., Jonathon Shlens, & Christian Szegedy. (2015). "Explaining and Harnessing Adversarial Examples." International Conference on Learning Representations. 2015, (1–8).

Mehrabi, N., Fred Morstatter, Nripsuta Saxena, Kristina Lerman, & Aram Galstyan. (2021). "A Survey on Bias and Fairness in Machine Learning." ACM Computing Surveys. 2021, (1–38).

Kamiran, F., & Calders, T. (2009). "Classifying without Discriminating." International Conference on Computer, Control and Communication. 2009, (1–6).

Wang, Y., Quanming Yao, James Kwok, & Lionel M. Ni. (2020). "Generalizing from a Few Examples: A Survey on Few-Shot Learning." ACM Computing Surveys. 2020, (1–34).

Snell, J., Kevin Swersky, & Richard S. Zemel. (2017). "Prototypical Networks for Few-shot Learning." Advances in Neural Information Processing Systems. 2017, (4080–4090).

Grill, J. B., Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, & Michal Valko. (2020). "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning." Advances in Neural Information Processing Systems. 2020, (1–12).

He, K., Haoqi Fan, Yuxin Wu, Saining Xie, & Ross Girshick. (2020). "Momentum Contrast for Unsupervised Visual Representation Learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, (9729–9738).

Kairouz, P., H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G.L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi, Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, & Sen Zhao. (2019). "Advances and Open Problems in Federated Learning." arXiv preprint arXiv:1912.04977. 2019, (1–21).

Li, T., Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, & Virginia Smith. (2020). "Federated Optimization in Heterogeneous Networks." Proceedings of Machine Learning and Systems. 2020, (429–450).

Biamonte, J., Peter Wittek, Nicola Pancotti, & Patrick Rebentrost. (2017). "Quantum Machine Learning." Nature. 2017, (195–202).

Schuld, M., & Petruccione, F. (2018). "Supervised Learning with Quantum Computers." Springer. 2018, (1–300).

Jangid, J. (2020). "Efficient Training Data Caching for Deep Learning in Edge Computing Networks." International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), 6(5). 2020, (337–362). https://doi.org/10.32628/CSEIT20631113

Published

13-01-2025

Issue

Section

Research Articles