Predicting Hospital Stay Length Using Explainable Machine Learning

Authors

  • Jaheerbashashaik, M.C.A Student, Department of M.C.A, KMMIPS, Tirupati (Dt.), Andhra Pradesh, India
  • Ananthnath GVS, Assistant Professor, Department of M.C.A, KMMIPS, Tirupati (Dt.), Andhra Pradesh, India

Keywords:

Hospital Stay Length, Machine Learning, Logistic Regression, MLP, Random Forest, Gradient Boosting, XGBoost, Explainability, SHAP, Healthcare Analytics

Abstract

Predicting how long patients will stay in the hospital is essential for managing healthcare resources effectively and improving patient care. This study investigates how explainable machine learning techniques can be used to estimate hospital stay durations, drawing on a Kaggle dataset that includes a range of patient- and hospital-related features. The main aim is to build accurate predictive models while also shedding light on the factors that influence how long patients stay. We evaluate several machine learning algorithms, including Logistic Regression, Multi-Layer Perceptron (MLP), Random Forest, Gradient Boosting, and XGBoost. To assess how well each model performs, we use standard metrics such as accuracy, precision, recall, and F1-score. In addition, we employ explainability tools, namely SHapley Additive exPlanations (SHAP), to interpret the predictions and pinpoint the most important factors influencing length of stay.
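To make the pipeline concrete, the sketch below shows one way such an experiment could be wired up in Python with scikit-learn, XGBoost, and the shap library. This is a minimal illustration under stated assumptions, not the study's actual code: the file name hospital_stay.csv and the target column Stay are hypothetical placeholders, and preprocessing is reduced to simple one-hot encoding.

import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Hypothetical file and column names; substitute the actual Kaggle dataset.
df = pd.read_csv("hospital_stay.csv")
X = pd.get_dummies(df.drop(columns=["Stay"]))   # one-hot encode categorical features
y = df["Stay"].astype("category").cat.codes     # integer-code the stay-length classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(max_iter=500),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "Gradient Boosting": GradientBoostingClassifier(),
    "XGBoost": XGBClassifier(eval_metric="mlogloss"),
}

# Fit each model and report accuracy plus weighted precision, recall, and F1.
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_test, pred, average="weighted", zero_division=0)
    print(f"{name}: acc={accuracy_score(y_test, pred):.3f} "
          f"precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")

# SHAP attributions for the tree-based XGBoost model: TreeExplainer
# decomposes each prediction into per-feature contributions, ranking
# the factors that drive the predicted length of stay.
explainer = shap.TreeExplainer(models["XGBoost"])
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

The SHAP summary plot orders features by their mean absolute contribution across the test set, which is the kind of global explanation the abstract describes for identifying the main drivers of stay length.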

Published

04-05-2025

Issue

Section

Research Articles