A Review of Current Perspective and Propensity in Reinforcement Learning (RL) in an Orderly Manner
DOI: https://doi.org/10.32628/CSEIT2390147

Keywords: Machine Learning, Reinforcement Learning, Fictitious Play, Multi-Agent SARSA Learning, Friend-or-Foe Q-Learning (FFQ), Nash-Q Learning

Abstract
Reinforcement learning (RL) is an area of machine learning. The three primary types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, a model is trained on a labeled dataset; in unsupervised learning, the model is trained on unlabeled data. RL, by contrast, is driven not by labels but by evaluative feedback: by interacting with the environment and choosing the best action in each situation so as to maximize the reward, the agent learns to solve sequential decision-making problems. The RL agent decides on its own how to carry out tasks and, since there is no training data, it learns by accumulating experience. In this way, RL helps agents interact efficiently with their surroundings when making successive decisions. In this paper, the state of the art in RL is thoroughly reviewed through the literature. Applications of reinforcement learning can be found in a wide range of domains, including smart grids, robotics, computer vision, healthcare, gaming, transportation, finance, and engineering.
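As a concrete illustration of the agent-environment loop the abstract describes, the sketch below implements tabular Q-learning, the single-agent ancestor of the Nash-Q and Friend-or-Foe Q-learning variants listed in the keywords. The five-state corridor environment, the state count, and all hyperparameters are illustrative assumptions for this sketch, not part of the reviewed work.

```python
# Minimal sketch (assumptions, not the paper's method): tabular Q-learning
# on a hypothetical 5-state corridor. The agent starts at state 0 and is
# rewarded only on reaching state 4; it learns purely from experience.
import random

N_STATES = 5          # states 0..4; state 4 is the rewarding terminal state
ACTIONS = [-1, +1]    # move left or right along the corridor
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-table Q[state][action_index], initialized to zero (no training data).
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Environment dynamics: move, clip to the corridor, reward at the goal."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection; ties are broken at random.
        if random.random() < EPSILON or Q[state][0] == Q[state][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move Q toward reward + discounted best future value.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][a] += ALPHA * (target - Q[state][a])
        state = next_state

print(Q)  # the learned values prefer "+1" (right) in every non-terminal state
```

Running the sketch prints a Q-table in which the action moving toward the goal dominates in every non-terminal state, showing how evaluative feedback alone, without labels, shapes the agent's policy.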
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.