Training AI Models: Preparing and Managing AI Algorithms for AIOps

Authors

  • Satyanarayana Murthy Polisetty, Jawaharlal Nehru Technological University, Kakinada, India

DOI:

https://doi.org/10.32628/CSEIT2390175

Keywords:

AIOps, Dynamic Model Adjustment, IBM Cloud Pak for AIOps, Kubernetes, Continuous Learning

Abstract

Artificial intelligence plays a critical role in AIOps by improving decision-making and reducing human intervention in IT operations. This paper examines how AI models are prepared and managed within the IBM Cloud Pak for AIOps framework, focusing on how the algorithms are trained and deployed to address specific challenges such as incident detection, anomaly prediction, and service availability optimization. It covers key methodologies for selecting appropriate models, preparing data, and maintaining the algorithms over time. A major section describes the types of algorithms available, such as natural language log anomaly detection and metric anomaly detection, and how they are fine-tuned for real-time data analysis. The paper also proposes novel methodologies for managing AI models, including adjusting algorithms dynamically according to operational needs and scaling models across large, distributed environments. In the proposed approach, models learn continuously from incoming data so that they remain relevant as IT environments change. By integrating real-time monitoring tools into the model management pipeline, the paper proposes that IBM Cloud Pak for AIOps can dynamically allocate resources and adjust algorithm complexity as required, improving operational efficiency.
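The idea of a model that keeps learning from incoming data can be illustrated, for the metric anomaly detection case, with a streaming detector whose baseline is updated on every sample. The sketch below is an illustration under stated assumptions, not the algorithm used in IBM Cloud Pak for AIOps: the class name `StreamingMetricDetector`, the exponentially weighted mean/variance baseline, the z-score rule, and the parameters `alpha`, `threshold`, `warmup`, and `min_std` are all hypothetical choices.

```python
import math
from dataclasses import dataclass, field


@dataclass
class StreamingMetricDetector:
    """Hypothetical online anomaly detector for a single metric time series.

    Assumption: the baseline is an exponentially weighted mean and variance
    that is updated with every observation, so the model keeps learning from
    incoming data instead of being trained once and frozen.
    """

    alpha: float = 0.1       # weight given to the newest observation
    threshold: float = 3.0   # z-score above which a point is flagged
    warmup: int = 5          # observations absorbed before any flagging
    min_std: float = 1e-6    # floor so a perfectly flat baseline can still flag spikes
    mean: float = field(default=0.0, init=False)
    var: float = field(default=0.0, init=False)
    count: int = field(default=0, init=False)

    def update(self, x: float) -> bool:
        """Score x against the current baseline, then fold x into it."""
        self.count += 1
        if self.count == 1:
            # First observation seeds the baseline; nothing to compare against yet.
            self.mean = x
            return False

        std = max(math.sqrt(self.var), self.min_std)
        z = abs(x - self.mean) / std
        is_anomaly = self.count > self.warmup and z > self.threshold

        # Incremental exponentially weighted update of mean and variance.
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return is_anomaly


if __name__ == "__main__":
    detector = StreamingMetricDetector()
    for value in [10, 11, 10, 11, 10, 11, 10, 11, 10, 11, 50]:
        print(value, detector.update(value))
```

In this sketch, flagged points are still folded into the baseline so the detector adapts to genuine level shifts; a stricter variant could skip the update for flagged points. Tuning `alpha` trades adaptation speed against sensitivity, which is one simple knob for the kind of dynamic algorithm adjustment described above.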

Published

2023-10-30

Issue

Volume 9, Issue 5, September-October 2023

Section

Research Articles

How to Cite

[1] Satyanarayana Murthy Polisetty, "Training AI Models: Preparing and Managing AI Algorithms for AIOps," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 9, Issue 5, pp. 427-441, September-October 2023. Available at doi: https://doi.org/10.32628/CSEIT2390175