Evaluation of the Effectiveness of Feature Selection Methods Combined with Regression Algorithms to Predict Particulate Matter (PM10) in Gandhinagar, Gujarat, India
DOI:
https://doi.org/10.32628/CSEIT2390641
Keywords:
Feature Selection, Particulate Matter, Machine learning, Regression algorithm, Air Quality, Artificial Neural Network, Decision Tree
Abstract
Feature selection is an important data pre-processing technique used to improve the performance of machine learning models, to build faster and more cost-effective algorithms, and to make model predictions easier to interpret. The main objective of this research is to identify the features that most influence the prediction of particulate matter (PM10). The study uses 24-hour average pollutant concentration data from 36 air quality monitoring stations, provided by Gandhinagar Smart City Development Limited (GSCDL), Gandhinagar, Gujarat. Important features were identified using five feature selection techniques: correlation, forward selection, backward elimination, Exhaustive Feature Selection (EFS), and feature importance derived from a Random Forest Regressor. Using the selected features, six regression algorithms (Multiple Linear Regression, Random Forest, Decision Tree, K-nearest Neighbour, XGBoost, and Support Vector Regressor) were trained to predict PM10. The models were then compared on Root Mean Square Error (RMSE) and coefficient of determination (R2) to identify the best-performing model. The proposed model can serve as an early warning system, providing air quality information that local authorities can use to develop air-quality improvement initiatives.
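As an illustration only, the correlation-based selection step and the two evaluation metrics named in the abstract can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: the function names, the 0.5 correlation threshold, the column names, and the toy values are all assumptions made for the example and are not taken from the GSCDL dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def select_by_correlation(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold.

    The threshold value is an assumption for this sketch.
    """
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]

def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values, invented for illustration (not measured data).
pm10 = [80, 100, 120, 140]
features = {
    "pm25": [40, 50, 60, 70],   # perfectly correlated with pm10 in this toy set
    "noise": [1, -1, -1, 1],    # uncorrelated with pm10 in this toy set
}
predicted = [90, 100, 110, 140]  # hypothetical model output

selected = select_by_correlation(features, pm10)
print(selected, rmse(pm10, predicted), r2_score(pm10, predicted))
```

A lower RMSE and an R2 closer to 1 indicate a better fit, which is the basis on which the abstract says the models were compared. The wrapper methods (forward selection, backward elimination, EFS) would replace `select_by_correlation` with a search over feature subsets scored by a trained model.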
License
Copyright (c) 2024 Zalak L. Thakker, Dr. Sanjay H. Buch (Author)
This work is licensed under a Creative Commons Attribution 4.0 International License.