Bias and Its Consequences: A Study of Machine Learning Performance

Authors

  • Anirudh Kokate, BTech CSE, D.Y. Patil International University, Pune, Maharashtra, India
  • Madhu Priya, BTech CSE, D.Y. Patil International University, Pune, Maharashtra, India

DOI:

https://doi.org/10.32628/CSEIT241051088

Keywords:

Machine Learning, Fairness in AI, Class Imbalance, Bias Impact Analysis

Abstract

This paper examines how bias in training data affects the results of machine learning models. It uses the Adult Income dataset from OpenML for income classification. Bias is induced by underrepresenting people who earn <= $50K in the training data, and the behavior of different models is then observed under this skewed distribution. Key metrics, namely accuracy and specificity (True Negative Rate), were analyzed for both unbiased and biased training scenarios. The results show that the Naive Bayes and Random Forest models were resistant to the induced bias, whereas others, including SVM and Logistic Regression, suffered major performance drops. The study sheds light on the robustness of different classifiers when exposed to biased data and underscores the need for bias mitigation strategies in real-world applications. More broadly, it shows how bias in training data can significantly affect predictive performance, fairness, and model selection in income classification tasks.
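
The bias-induction setup and the two metrics described above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' code: the label encoding (1 for >50K as the positive class, 0 for <=50K as the negative class), the helper names `induce_bias`, `accuracy` and `specificity`, the `keep_fraction` parameter and the fixed random seed are all assumptions, since the abstract does not state the exact undersampling ratio or procedure. Loading the OpenML data and training the classifiers (for example with scikit-learn) is omitted to keep the example self-contained.

```python
import random
from typing import List, Sequence, Tuple

# Assumed label encoding for this sketch: 1 = ">50K" (positive class),
# 0 = "<=50K" (negative class). Under this encoding, specificity measures
# how well the under-represented "<=50K" group is still recognised.
POSITIVE, NEGATIVE = 1, 0


def induce_bias(
    rows: Sequence[Tuple[list, int]],
    keep_fraction: float,
    target_label: int = NEGATIVE,
    seed: int = 0,
) -> List[Tuple[list, int]]:
    """Hypothetical helper: keep only `keep_fraction` of the rows labelled
    `target_label` (randomly chosen); all other rows are kept unchanged."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed so the biased split is reproducible
    target = [r for r in rows if r[1] == target_label]
    others = [r for r in rows if r[1] != target_label]
    kept = rng.sample(target, round(len(target) * keep_fraction))
    biased = others + kept
    rng.shuffle(biased)
    return biased


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True Negative Rate: TN / (TN + FP)."""
    tn = sum(t == NEGATIVE and p == NEGATIVE for t, p in zip(y_true, y_pred))
    fp = sum(t == NEGATIVE and p == POSITIVE for t, p in zip(y_true, y_pred))
    if tn + fp == 0:
        raise ValueError("y_true contains no negative examples")
    return tn / (tn + fp)
```

For example, with `y_true = [0, 0, 0, 0, 1, 1]` and `y_pred = [0, 0, 0, 1, 1, 0]`, `accuracy` returns 4/6 and `specificity` returns 3/4. Reporting specificity alongside accuracy matters here because a model trained on the biased split may misclassify many <=50K cases while its overall accuracy still looks acceptable.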

Published

08-11-2024

Section

Research Articles