Reinforcement Learning from AI Feedback: A Review

Authors

  • Prof. (Dr) Satya Singh, Department of Computer Science and Applications, M.G. Kashi Vidyapith, Varanasi, Uttar Pradesh, India
  • Ratnesh Kumar Sharma, Department of Computer Science and Applications, M.G. Kashi Vidyapith, Varanasi, Uttar Pradesh, India

DOI:

https://doi.org/10.32628/CSEIT24104135

Keywords:

Reinforcement Learning, AI Generated Feedback, Machine Learning, Reward Signals, Interactive Learning

Abstract

Reinforcement Learning from AI Feedback (RLAIF) is a significant advance over Reinforcement Learning from Human Feedback (RLHF), particularly for large language models such as GPT-4. By replacing human annotators with AI-generated feedback, RLAIF scales to much larger volumes of preference data and makes training faster and more efficient. It also optimizes the model's alignment with desired outcomes, although it does not directly improve the understanding of human preferences. RLAIF relies on a Preference Model (PM) guided by constitutional principles, which helps ensure that AI responses are ethical, safe, and of high quality. The constitution defines the rules for AI decision-making so that the system adheres to ethical and social standards, a requirement that grows in importance as AI continues to evolve. RLAIF thus moves toward an automated, principled feedback system focused on responsible AI governance and ethical guidelines.
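
To complement the abstract, the sketch below illustrates the core idea in heavily simplified Python: an AI preference labeler, guided by a short constitution, compares pairs of candidate responses and produces the preference data that RLHF would normally obtain from human annotators. All function names, the constitution text, and the placeholder scoring are illustrative assumptions, not the implementation of the reviewed paper or of any cited work.

# Minimal RLAIF sketch: AI-generated preference labels replace human annotation.
# Every name and heuristic here is a placeholder for illustration only.

CONSTITUTION = [
    "Prefer the response that is more helpful and factually accurate.",
    "Prefer the response that avoids harmful or unethical content.",
]

def ai_preference_label(prompt, response_a, response_b, constitution=CONSTITUTION):
    """Return the (simulated) probability that response A is preferred.

    A real labeler would send `labeler_prompt` to an off-the-shelf LLM and
    read back a preference distribution; here we stub it with a trivial
    length heuristic so the example stays self-contained and runnable.
    """
    labeler_prompt = "\n".join(constitution) + (
        f"\n\nPrompt: {prompt}\nResponse A: {response_a}\nResponse B: {response_b}"
    )
    _ = labeler_prompt  # would be passed to the AI labeler in practice
    score_a, score_b = len(response_a), len(response_b)
    return score_a / (score_a + score_b + 1e-9)

def collect_preference_dataset(prompts, policy_sample):
    """Build (prompt, chosen, rejected) triples from AI feedback.

    These triples would then train a reward/preference model used as the
    reward signal during RL fine-tuning of the policy.
    """
    dataset = []
    for prompt in prompts:
        a, b = policy_sample(prompt), policy_sample(prompt)
        p_a = ai_preference_label(prompt, a, b)
        chosen, rejected = (a, b) if p_a >= 0.5 else (b, a)
        dataset.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return dataset

if __name__ == "__main__":
    import random

    def demo_policy(prompt):
        # Toy stand-in for sampling from the policy being fine-tuned.
        return prompt + " " + "answer " * random.randint(1, 5)

    data = collect_preference_dataset(["Explain RLAIF briefly."], demo_policy)
    print(data[0]["chosen"])

The key design point the sketch highlights is that only the labeling step changes relative to RLHF: the downstream reward modeling and RL optimization stay the same, which is why AI feedback can be swapped in to scale up preference data collection.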

Published

28-04-2024

Section

Research Articles

How to Cite

[1] Prof. (Dr) Satya Singh and Ratnesh Kumar Sharma, “Reinforcement Learning from AI Feedback: A Review”, Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol., vol. 10, no. 4, pp. 306–311, Apr. 2024, doi: 10.32628/CSEIT24104135.
