Preprocessing Farmer Query Data Using Classic Method and Building Classifier Model

Authors

  • Yudhvir Singh  Department of Computer Science & Engineering, Maharaja Ranjit Singh Punjab Technical University Bathinda, Punjab, India
  • Naresh Kumar Garg  Department of Computer Science & Engineering, Maharaja Ranjit Singh Punjab Technical University Bathinda, Punjab, India

Keywords:

Machine Learning, Preprocessing, Text Data, Classification, Logistic Classifier

Abstract

Preprocessing is a critical preliminary step in text mining and related fields. Real-world data contains errors of varying magnitude, often with multiple interrelated issues. Data preprocessing transforms such data into a form that is readable, accepted by tools, and free from ambiguity and duplication. In this work, we deal with a farmer query data set, which is text data structured in tabular form. Before any meaningful information can be retrieved from text data, preprocessing techniques are applied to the target data set to reduce its size, which in turn increases its effectiveness. The objective of our work is to analyze the issues that arise in preprocessing operations such as tokenization, formatting, and stop word removal for our text data. After preprocessing, we use a logistic classifier to build a binary classification model of the farmer data set. The logistic classifier gives good accuracy, which supports the role of machine learning in farmer query classification.
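
The pipeline described in the abstract (formatting, tokenization, stop word removal, then a binary logistic classifier) can be illustrated with a minimal sketch. This is not the authors' code. The stop word list, the four example queries, their labels (1 = weather query, 0 = other), and all function names are assumptions made up for illustration, and the classifier is a plain stochastic-gradient logistic regression written with the Python standard library only.

```python
import math
import re

# Hypothetical stop word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "for", "to", "on",
              "my", "what", "how", "and"}


def preprocess(text):
    """Formatting (lowercasing), tokenization, and stop word removal."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocab(docs):
    """Map each distinct token to a column index."""
    vocab = sorted({t for doc in docs for t in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(tokens, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for t in tokens:
        if t in vocab:
            vec[vocab[t]] += 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(x, w, b):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Invented example queries and labels (1 = weather query, 0 = other).
queries = [
    "What is the weather forecast for the week",
    "Rain forecast in Bathinda",
    "How to control pests in cotton crop",
    "Fertilizer dose for wheat crop",
]
labels = [1, 1, 0, 0]

docs = [preprocess(q) for q in queries]
vocab = build_vocab(docs)
X = [vectorize(d, vocab) for d in docs]
w, b = train_logistic(X, labels)
train_predictions = [predict(x, w, b) for x in X]
```

A new query is classified by passing it through the same steps, for example `predict(vectorize(preprocess("weather forecast for tomorrow"), vocab), w, b)`. On a real data set, a library implementation and a held-out test split would be the more usual choice; the hand-written version here only makes each step visible.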

Published

2018-04-30

Issue

Volume 3, Issue 3 (March-April 2018)

Section

Research Articles

How to Cite

[1]
Yudhvir Singh, Naresh Kumar Garg, "Preprocessing Farmer Query Data Using Classic Method and Building Classifier Model," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 3, pp. 1195-1199, March-April 2018.