Evaluation of the Effectiveness of Feature Selection Methods Combined with Regression Algorithms to Predict Particulate Matter (PM10) in Gandhinagar, Gujarat, India

Authors

  • Zalak L. Thakker, Bhagwan Mahavir Centre for Advance Research, Bhagwan Mahavir University, Surat, Gujarat, India
  • Dr. Sanjay H. Buch, Bhagwan Mahavir College of Computer Application, Bhagwan Mahavir University, Surat, Gujarat, India

DOI:

https://doi.org/10.32628/CSEIT2390641

Keywords:

Feature Selection, Particulate Matter, Machine learning, Regression algorithm, Air Quality, Artificial Neural Network, Decision Tree

Abstract

Feature selection is an important data pre-processing technique used to improve the performance of machine learning models, to build faster and more cost-effective algorithms, and to make model predictions easier to interpret. The main objective of this research is to identify the features that most influence the prediction of particulate matter (PM10). The study uses 24-hour average pollutant concentration data from 36 air quality monitoring stations, provided by Gandhinagar Smart City Development Limited (GSCDL), Gandhinagar, Gujarat. Important features were identified using five feature selection techniques: correlation, forward selection, backward elimination, Exhaustive Feature Selection (EFS), and feature importance derived from a Random Forest Regressor. Using the selected features, six regression algorithms (Multiple Linear Regression, Random Forest, Decision Tree, K-Nearest Neighbour, XGBoost, and Support Vector Regressor) were trained to predict PM10. The models were then compared on Root Mean Square Error (RMSE) and the coefficient of determination (R²) to identify a well-performing model. The proposed model can serve as an early warning system, providing air quality information that local authorities can use to develop air-quality improvement initiatives.
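As a rough illustration of the workflow the abstract describes, the sketch below applies the simplest of the five techniques (a correlation filter) and then scores a one-feature linear model with RMSE and R². It is not the authors' code. The data is synthetic, and the feature names (`pm25`, `unrelated`), the 0.5 threshold, and all helper functions are assumptions introduced only for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_by_correlation(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


def fit_line(x, y):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (hypothetical; not the GSCDL measurements).
random.seed(0)
n_samples = 200
pm25 = [random.uniform(20, 120) for _ in range(n_samples)]
unrelated = [random.uniform(0, 1) for _ in range(n_samples)]
pm10 = [1.5 * v + 10 + random.gauss(0, 5) for v in pm25]

features = {"pm25": pm25, "unrelated": unrelated}
selected, scores = select_by_correlation(features, pm10, threshold=0.5)

# Fit on the single selected feature and evaluate on the same data for brevity.
slope, intercept = fit_line(features[selected[0]], pm10)
predictions = [slope * v + intercept for v in features[selected[0]]]
model_rmse = rmse(pm10, predictions)
model_r2 = r2_score(pm10, predictions)

print("selected features:", selected)
print("RMSE:", round(model_rmse, 2), "R2:", round(model_r2, 3))
```

A real comparison would hold out a test set and repeat the evaluation for each combination of feature selection technique and regression algorithm. The wrapper methods named in the abstract (forward selection, backward elimination, EFS) work differently: they search over feature subsets by refitting a model rather than ranking features individually.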

Published

14-03-2024

Section

Research Articles
