A Review of Current Perspective and Propensity in Reinforcement Learning (RL) in an Orderly Manner
DOI: https://doi.org/10.32628/CSEIT2390147

Keywords: Machine Learning, Reinforcement Learning, Fictitious Play, Multi-Agent SARSA Learning, Friend-or-Foe Q-Learning (FFQ), Nash-Q Learning

Abstract
Reinforcement learning (RL) is an area of machine learning. The three primary types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, a model is trained on a labeled dataset; in unsupervised learning, the model is trained on unlabeled data. RL, by contrast, is driven not by labels but by evaluative feedback: by interacting with the environment and choosing the best action in each situation so as to maximize the reward, the agent learns to solve sequential decision-making problems. The RL agent decides on its own how to carry out tasks and, since there is no training data, it learns by accumulating experience. In this way, RL helps agents interact efficiently with their surroundings when making successive decisions. In this paper, the state of the art in RL is thoroughly reviewed through the literature. Applications of reinforcement learning can be found in a wide range of domains, including smart grids, robotics, computer vision, healthcare, gaming, transportation, finance, and engineering.
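As a concrete illustration of the agent-environment loop the abstract describes, the sketch below implements tabular Q-learning, the single-agent ancestor of the Nash-Q and Friend-or-Foe Q-learning variants listed in the keywords. The five-state corridor environment, the state count, and all hyperparameters are illustrative assumptions for this sketch, not part of the reviewed work.

```python
# Minimal sketch (assumptions, not the paper's method): tabular Q-learning
# on a hypothetical 5-state corridor. The agent starts at state 0 and is
# rewarded only on reaching state 4; it learns purely from experience.
import random

N_STATES = 5          # states 0..4; state 4 is the rewarding terminal state
ACTIONS = [-1, +1]    # move left or right along the corridor
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-table Q[state][action_index], initialized to zero (no training data).
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Environment dynamics: move, clip to the corridor, reward at the goal."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection; ties are broken at random.
        if random.random() < EPSILON or Q[state][0] == Q[state][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move Q toward reward + discounted best future value.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][a] += ALPHA * (target - Q[state][a])
        state = next_state

print(Q)  # the learned values prefer "+1" (right) in every non-terminal state
```

Running the sketch prints a Q-table in which the action moving toward the goal dominates in every non-terminal state, showing how evaluative feedback alone, without labels, shapes the agent's policy.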
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.