Evaluation of the Effectiveness of Feature Selection Methods Combined with Regression Algorithms to Predict Particulate Matter (PM10) in Gandhinagar, Gujarat, India


  • Zalak L. Thakker, Bhagwan Mahavir Centre for Advance Research, Bhagwan Mahavir University, Surat, Gujarat, India
  • Dr. Sanjay H. Buch, Bhagwan Mahavir College of Computer Application, Bhagwan Mahavir University, Surat, Gujarat, India




Feature Selection, Particulate Matter, Machine Learning, Regression Algorithm, Air Quality, Artificial Neural Network, Decision Tree


Feature selection is an important data pre-processing technique used to improve the performance of machine learning models, to build faster and more cost-effective algorithms, and to make model predictions easier to interpret. The main objective of this research is to identify the features that influence the prediction of particulate matter (PM10). The study uses 24-hour average pollutant concentration data from 36 air quality monitoring stations, provided by Gandhinagar Smart City Development Limited (GSCDL), Gandhinagar, Gujarat. Important features were identified using five feature selection techniques: correlation, forward selection, backward elimination, Exhaustive Feature Selection (EFS), and feature importance derived from a Random Forest Regressor. Using the selected features, six regression algorithms (Multiple Linear Regression, Random Forest, Decision Tree, K-Nearest Neighbour, XGBoost, and Support Vector Regressor) were trained to predict PM10. The models were then compared on Root Mean Square Error (RMSE) and coefficient of determination (R²) to identify the best-performing model. The proposed model can be used as an early warning system, providing air quality information that local authorities can use to develop air-quality improvement initiatives.
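The workflow described in the abstract (select features, fit a regressor, score it with RMSE and R²) can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the correlation-based filter and a multiple linear regression, it runs on synthetic data rather than the GSCDL measurements, and the feature names (pm25, no2, humidity, sensor_noise), the coefficients used to generate the target, and all helper function names are assumptions made for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_by_correlation(features, target, k):
    """Filter method: keep the k features with the largest |r| against the target."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    return ranked[:k]

def fit_linear(rows, y):
    """Ordinary least squares via the normal equations (Gauss-Jordan elimination)."""
    X = [[1.0] + list(r) for r in rows]          # prepend an intercept column
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]  # [intercept, coef_1, ...]

def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption): PM10 depends on two of four candidates.
rng = random.Random(0)
n = 300
features = {
    "pm25": [rng.uniform(20, 120) for _ in range(n)],
    "no2": [rng.uniform(5, 60) for _ in range(n)],
    "humidity": [rng.uniform(20, 90) for _ in range(n)],
    "sensor_noise": [rng.gauss(0, 1) for _ in range(n)],
}
pm10 = [1.6 * a + 1.5 * b + rng.gauss(0, 5)
        for a, b in zip(features["pm25"], features["no2"])]

# Select on the training portion only, then evaluate on held-out rows.
split = 225
train_feats = {name: vals[:split] for name, vals in features.items()}
selected = select_by_correlation(train_feats, pm10[:split], k=2)

rows = list(zip(*(features[name] for name in selected)))
coef = fit_linear(rows[:split], pm10[:split])
test_pred = predict(coef, rows[split:])
test_rmse = rmse(pm10[split:], test_pred)
test_r2 = r2(pm10[split:], test_pred)

print("selected features:", selected)
print("RMSE:", round(test_rmse, 2), "R2:", round(test_r2, 3))
```

Ranking features on the training rows only keeps the held-out rows out of the selection step, so the RMSE and R² reported at the end are not biased by it. The wrapper methods named in the abstract (forward selection, backward elimination, exhaustive search) would replace select_by_correlation with a search that refits the model for each candidate subset.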










Research Articles