Indoor Target-Driven Visual Robot Navigation Using Deep Reinforcement Learning (DRL) Approaches
Keywords:
Deep Reinforcement Learning, AI2-THOR, Advantage Actor-Critic

Abstract
Deep reinforcement learning (RL) has been applied successfully in many game-like settings, but it remains difficult to apply to visual navigation in realistic 3D environments. We present a learning architecture that guides an agent to an image-specified goal. To improve the efficiency of visual navigation, we incorporate auxiliary tasks into the batched advantage actor-critic (A2C) algorithm. We propose three auxiliary tasks: predicting the depth map, segmenting the observation image, and segmenting the target image. With these tasks, a significant portion of the network can be pre-trained with supervised learning, reducing the total number of training iterations. Gradually increasing the complexity of the environment over time can further improve training performance. We describe an effective neural network architecture that generalizes across many goals and environments. On the AI2-THOR environment simulator, our approach outperforms the best goal-oriented visual navigation algorithms in the literature, and it operates in continuous state spaces.
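The sketch below illustrates, in minimal PyTorch, the kind of architecture the abstract describes: a shared encoder applied to both the observation and the target image, an A2C policy/value head, and three auxiliary heads for depth, observation segmentation, and target segmentation. The class name, layer sizes, action count, segmentation class count, and fusion scheme are illustrative assumptions, not the paper's exact design.

```python
# A minimal sketch, assuming a shared ConvNet encoder and coarse auxiliary heads.
import torch
import torch.nn as nn


class NavigationA2C(nn.Module):
    def __init__(self, num_actions: int = 4, num_classes: int = 20):
        super().__init__()
        # Shared convolutional encoder applied to both observation and target image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Auxiliary heads (coarse, feature-resolution outputs for brevity;
        # a full decoder would upsample to image resolution).
        self.depth_head = nn.Conv2d(64, 1, kernel_size=1)           # depth map
        self.seg_head = nn.Conv2d(64, num_classes, kernel_size=1)   # segmentation logits
        # Policy/value trunk over the fused observation/target features.
        self.fuse = nn.Sequential(nn.Flatten(), nn.LazyLinear(512), nn.ReLU())
        self.policy = nn.Linear(512, num_actions)  # actor: action logits
        self.value = nn.Linear(512, 1)             # critic: state value

    def forward(self, obs, goal):
        f_obs, f_goal = self.encoder(obs), self.encoder(goal)
        fused = self.fuse(torch.cat([f_obs, f_goal], dim=1))
        return {
            "action_logits": self.policy(fused),
            "value": self.value(fused),
            "depth": self.depth_head(f_obs),              # auxiliary task 1
            "obs_segmentation": self.seg_head(f_obs),     # auxiliary task 2
            "goal_segmentation": self.seg_head(f_goal),   # auxiliary task 3
        }


if __name__ == "__main__":
    net = NavigationA2C()
    obs = torch.randn(2, 3, 84, 84)   # batched RGB observations
    goal = torch.randn(2, 3, 84, 84)  # batched target images
    out = net(obs, goal)
    print({k: tuple(v.shape) for k, v in out.items()})
```

Because a simulator such as AI2-THOR can provide ground-truth depth and segmentation, the shared encoder in a design like this can first be trained on the auxiliary targets with ordinary supervised losses and then fine-tuned jointly with the A2C policy and value losses, which is the pre-training effect the abstract refers to.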
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.