Optimal Feature Extraction Technique for Sentiment Analysis of Product Reviews for Product Development
DOI: https://doi.org/10.32628/CSEIT251112171
Keywords: Product, Review, Features, Opinion, POS, N-Gram, Opinion Mining
Abstract
Consumer review sites, social media and micro-blogs carry a wealth of information on the perspectives, experiences and feedback that consumers have about products. When the volume of product reviews is high, it can be challenging for product developers to sift through them and make decisions based on consumers’ sentiments. Sentiment Analysis, a branch of Artificial Intelligence, provides data that helps businesses understand customers’ desires and track how brands and goods are perceived. In Sentiment Analysis, feature extraction converts raw text input into a format compatible with machine learning. A strong feature set is necessary to achieve high prediction and classification accuracy, and identifying an optimal combination of features is critical for improving overall classification performance. In this research, we tackle this problem by identifying an optimal feature extraction technique for product review Sentiment Analysis using a feature-level analysis. N-gram, Part of Speech (POS) and lexicon-based techniques using Stanford CoreNLP, TextBlob and SentiWordNet are examined in different combinations. Multinomial Naïve Bayes (MNB), unsupervised Lexicon, and MNB + Lexicon ensemble classifiers were modeled to classify the reviews into positive, neutral and negative classes, thereby identifying the optimal feature combination. We explored the optimal feature extraction technique on real product review datasets for two products: a car make and model, the “Nissan Sentra”, and a mobile phone, the “Samsung Galaxy A12”. The optimal feature extraction technique for the MNB and MNB + Lexicon ensemble classifiers was a combination of N-Gram, Part of Speech and TextBlob features, while the optimal technique for the unsupervised Lexicon classifier was a combination of N-Gram, Part of Speech and VADER.
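To make the idea of combining feature types concrete, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization, a tiny hypothetical lexicon (`TOY_LEXICON`) standing in for a real resource such as TextBlob or SentiWordNet, and a hand-written multinomial Naïve Bayes with Laplace smoothing. POS features are omitted because they would require an external tagger. All names and the toy reviews are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (an assumption); real lexicons are far larger.
TOY_LEXICON = {"great": 1.0, "love": 1.0, "good": 0.5,
               "poor": -1.0, "bad": -1.0, "terrible": -1.0}


def ngram_features(text, n_max=2):
    """Return all 1..n_max-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def lexicon_feature(text):
    """Turn the summed lexicon score into a coarse polarity token."""
    score = sum(TOY_LEXICON.get(t, 0.0) for t in text.lower().split())
    if score > 0:
        return "LEX_POS"
    if score < 0:
        return "LEX_NEG"
    return "LEX_NEU"


def extract(text):
    """Combined feature set: N-grams plus one lexicon-derived token."""
    return ngram_features(text) + [lexicon_feature(text)]


class TinyMultinomialNB:
    """Multinomial Naive Bayes over feature counts, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        # Features never seen in training are ignored.
        feats = [f for f in extract(text) if f in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # class prior
            denom = (sum(self.feat_counts[label].values())
                     + self.alpha * len(self.vocab))
            for f in feats:
                logp += math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Invented toy reviews, for illustration only.
texts = ["great phone love the battery", "good camera great screen",
         "terrible battery bad screen", "poor camera bad phone",
         "the phone has a screen", "it has a camera and battery"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

clf = TinyMultinomialNB().fit(texts, labels)
print(clf.predict("love the great screen"))
print(clf.predict("bad battery terrible camera"))
```

Feeding the lexicon polarity in as one more count feature is only one simple way to combine the two signals; an ensemble that votes between a separate lexicon classifier and the Naïve Bayes model, as the abstract describes, would be structured differently.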
License
Copyright (c) 2025 International Journal of Scientific Research in Computer Science, Engineering and Information Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.