Comparison of Random Forest and Support Vector Machine for Indonesian Tweet Complaint Classification

Authors

  • Desi Ramayanti  Faculty of Computer Science, Universitas Mercu Buana, Jakarta Barat, Indonesia

DOI:

https://doi.org/10.32628/CSEIT195628

Keywords:

Complaint Classification, Indonesian Text, Support Vector Machine, Random Forest

Abstract

In digital business, managers commonly need to process text so that it can support decision-making. The number of text documents containing ideas and opinions keeps growing, and reading them one by one is impractical. When such data are processed and presented correctly using machine learning, they can quickly give a general overview of a particular case, organization, or object. Numerous studies have been carried out in this area; nevertheless, most of them concentrate on English text classification. Each language calls for different text classification techniques or methods, depending on the characteristics of its grammar, and classification results may differ between languages even when the same algorithm is used. Given the importance of text classification, two algorithms that can be applied are the support vector machine (SVM) and random forest (RF). Against this background, this research aims to evaluate the performance of the SVM and RF algorithms in classifying Indonesian text. The SVM classifier with 10-fold cross-validation achieved the best accuracy, 0.9648, with a computation time of 40.118 seconds. The RF classifier with the parameter values 'bootstrap': False, 'min_samples_leaf': 1, 'n_estimators': 10, 'min_samples_split': 3, 'criterion': 'entropy', 'max_features': 3, 'max_depth': None achieved an accuracy of 0.9561 with a computation time of 109.399 seconds.
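
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's code: the abstract does not name a library, a feature representation, or an SVM kernel, so the use of scikit-learn, TF-IDF features, a linear kernel, the `evaluate` helper, and the tiny made-up corpus are all assumptions. Only the RF parameter values and the 10-fold setting are taken from the abstract. Accuracy and timing on this toy data will not resemble the reported figures.

```python
import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy, made-up tweets for illustration only; this is NOT the paper's dataset.
topics = ["internet", "listrik", "air", "paket", "aplikasi"]
complaints = [f"{t} mati lagi sangat kecewa" for t in topics] + [
    f"layanan {t} lambat dan buruk" for t in topics
]
others = [f"terima kasih {t} lancar hari ini" for t in topics] + [
    f"promo {t} baru sudah tersedia" for t in topics
]
texts = complaints + others
labels = [1] * len(complaints) + [0] * len(others)  # 1 = complaint


def evaluate(classifier, texts, labels, folds=10):
    """Return (mean accuracy, elapsed seconds) for k-fold cross-validation."""
    # TF-IDF features are an assumption; the abstract does not specify them.
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    start = time.perf_counter()
    scores = cross_val_score(pipeline, texts, labels, cv=folds, scoring="accuracy")
    return scores.mean(), time.perf_counter() - start


# The linear kernel is an assumption; the abstract does not state the kernel.
svm = SVC(kernel="linear")

# RF parameter values as listed in the abstract; random_state is added here
# only to make the sketch reproducible.
rf = RandomForestClassifier(
    bootstrap=False,
    min_samples_leaf=1,
    n_estimators=10,
    min_samples_split=3,
    criterion="entropy",
    max_features=3,
    max_depth=None,
    random_state=0,
)

results = {
    "SVM": evaluate(svm, texts, labels),
    "RF": evaluate(rf, texts, labels),
}
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.4f}, time={seconds:.3f} s")
```

Wrapping the vectorizer and the classifier in one pipeline means the TF-IDF vocabulary is refitted on each training fold, so no information from the held-out fold leaks into the features.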

Published

2019-12-30

Section

Research Articles

How to Cite

[1]
Desi Ramayanti, "Comparison of Random Forest and Support Vector Machine for Indonesian Tweet Complaint Classification," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 5, Issue 6, pp. 202-207, November-December 2019. Available at doi: https://doi.org/10.32628/CSEIT195628