Enhancing Medicare Fraud Detection through ML: Addressing Class Imbalance with SMOTE-ENN
Keywords:
SMOTE-ENN, XGBoost, AdaBoost, LGBM, Decision Tree, Logistic Regression, Random Forest classifier
Abstract
Medicare fraud causes considerable financial losses and undermines the integrity of healthcare systems. Conventional fraud detection approaches often fall short because fraudsters' tactics are complex and constantly changing. This project aims to improve Medicare fraud detection with machine learning, with particular attention to class imbalance: fraudulent claims are far fewer than legitimate ones. We develop a classification system that distinguishes fraudulent from non-fraudulent Medicare claims using six algorithms: XGBoost, AdaBoost, LightGBM, Decision Tree, Logistic Regression, and Random Forest. To address the class imbalance, we apply SMOTE-ENN, which combines the Synthetic Minority Over-sampling Technique (SMOTE) with Edited Nearest Neighbours (ENN) cleaning to balance the dataset. Our experiments show that SMOTE-ENN significantly increases the detection rate of fraudulent claims. Evaluating the models on both the imbalanced and the balanced datasets, we observe notable improvements in accuracy, precision, recall, and F1-score. Overall, our findings suggest that combining SMOTE-ENN with ensemble learning techniques is an effective method for detecting Medicare fraud.
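The resampling step named in the abstract can be illustrated with a minimal from-scratch sketch. It is not the authors' implementation, and the abstract does not say which library or settings they used. The function names (`knn_indices`, `smote`, `enn`, `smote_enn`), the neighbour counts, the majority-vote rule in the cleaning step, and the synthetic two-feature data are all assumptions made for illustration. In practice one would more likely use the `SMOTEENN` class from the imbalanced-learn package, whose defaults may differ from this sketch.

```python
import numpy as np


def knn_indices(X, k):
    """For each row of X, return the indices of its k nearest other rows."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE step: build n_new synthetic minority points by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    nn = knn_indices(X_min, k)
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, nn.shape[1], size=n_new)]
    gap = rng.random((n_new, 1))  # position along the connecting segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def enn(X, y, k=3):
    """ENN step (majority-vote variant, an assumption): drop every sample
    whose label disagrees with most of its k nearest neighbours."""
    nn = knn_indices(X, k)
    frac_pos = y[nn].mean(axis=1)  # share of neighbours labelled 1
    keep = (frac_pos > 0.5).astype(y.dtype) == y
    return X[keep], y[keep]


def smote_enn(X, y, k_smote=5, k_enn=3, rng=None):
    """Oversample class 1 up to the size of class 0, then clean with ENN."""
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    X_syn = smote(X_min, n_new, k_smote, rng)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return enn(X_bal, y_bal, k_enn)


# Hypothetical toy data: 300 "legitimate" and 15 "fraudulent" claims.
data_rng = np.random.default_rng(0)
X_legit = data_rng.normal(0.0, 1.0, size=(300, 2))
X_fraud = data_rng.normal(2.5, 1.0, size=(15, 2))
X = np.vstack([X_legit, X_fraud])
y = np.concatenate([np.zeros(300, dtype=int), np.ones(15, dtype=int)])

X_res, y_res = smote_enn(X, y, rng=0)
print("class counts before:", np.bincount(y))
print("class counts after: ", np.bincount(y_res))
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class ratio. Because the ENN step removes samples from both classes near the class boundary, the resulting class counts are close to balanced but usually not exactly equal.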
License
Copyright (c) 2025 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.