Architectures and Optimization Strategies for Real-Time Machine Learning Recommendation Systems: A Systematic Review of Scalability Challenges

Authors

  • Mohit Bharti, Arizona State University, USA

DOI:

https://doi.org/10.32628/CSEIT25111258

Keywords:

Real-time Recommendation Systems, Machine Learning Infrastructure, Distributed Computing Architecture, Model Serving Optimization, Performance Engineering

Abstract

This article analyzes the challenges of deploying real-time machine learning recommendation systems at scale, together with the solutions available to address them. It examines the critical trade-offs between model complexity, inference latency, and system scalability that shape modern recommendation architectures, and investigates three primary dimensions: infrastructure optimization, model serving strategies, and resource utilization patterns. It proposes a novel framework for balancing these competing requirements through a combination of distributed computing architectures, hybrid model deployment approaches, and intelligent caching mechanisms. The findings demonstrate that a multi-tiered serving architecture with dynamic resource allocation significantly improves system performance while maintaining recommendation quality. The article also explores emerging optimization techniques, including model quantization, feature store architectures, and adaptive serving strategies. Its contribution is a systematic approach to designing and implementing real-time recommendation systems that handle high-concurrency workloads while delivering personalized suggestions within strict latency constraints. The results offer insights for practitioners and researchers working on large-scale recommendation systems, particularly in environments where real-time performance is crucial.
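As an illustration only, the multi-tiered serving idea mentioned in the abstract can be sketched as a cache tier in front of two model tiers, with the tier chosen by the remaining latency budget. The abstract does not specify the article's actual design, so every name here (`TTLCache`, `TieredRecommender`, `full_model_cost_ms`, the callables standing in for models) is a hypothetical assumption, and the fixed cost estimate stands in for whatever dynamic resource allocation the article describes.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Model = Callable[[str], List[str]]  # hypothetical: user id -> ranked item ids


class TTLCache:
    """Small in-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def put(self, key: str, value: List[str]) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


class TieredRecommender:
    """Serve from cache first, then the full model if the budget allows,
    otherwise fall back to a cheaper model (assumed tiers, not the article's)."""

    def __init__(self, cache: TTLCache, full_model: Model,
                 light_model: Model, full_model_cost_ms: float) -> None:
        self.cache = cache
        self.full_model = full_model
        self.light_model = light_model
        # Assumed fixed estimate of the full model's latency, in milliseconds.
        self.full_model_cost_ms = full_model_cost_ms

    def recommend(self, user_id: str, budget_ms: float) -> Tuple[List[str], str]:
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached, "cache"
        if budget_ms >= self.full_model_cost_ms:
            items = self.full_model(user_id)
            self.cache.put(user_id, items)  # only full-model results are cached
            return items, "full"
        return self.light_model(user_id), "light"
```

Calling `recommend(user_id, budget_ms)` returns the item list together with the name of the tier that served it, which makes it easy to log how often each tier is hit. Caching only full-model output is one possible design choice: it keeps fallback results from displacing higher-quality recommendations.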

Published

19-01-2025

Section

Research Articles

How to Cite

Architectures and Optimization Strategies for Real-Time Machine Learning Recommendation Systems: A Systematic Review of Scalability Challenges. (2025). International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 11(1), 797-808. https://doi.org/10.32628/CSEIT25111258