Analysis of Matric Product Matching Between Cosine Similarity with Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec in PT. Pricebook Digital Indonesia

Authors

  • Harni Kusniyati  Faculty of Computer Science, Mercu Buana University, Jakarta Barat, Indonesia
  • Arie Aditya Nugraha  Faculty of Computer Science, Mercu Buana University, Jakarta Barat, Indonesia

DOI:

https://doi.org/10.32628/CSEIT195672

Keywords:

Product Matching, Cosine Similarity, TF-IDF, Word2Vec

Abstract

Consumers today can purchase products from thousands of e-commerce sites. However, the completeness of product specifications and the taxonomies used to organize products differ from one online shop to another. To improve the consumer experience, Pricebook integrates products from various platforms through its website so that consumers can find the cheapest price. In this paper, we approach product matching by representing product text with TF-IDF (term frequency-inverse document frequency) and with Word2Vec, and by comparing those representations using cosine similarity. TF-IDF assigns a weight to a word (term) according to its relationship to a document. A semantic vector, or word embedding, is one way to represent the structure of a sentence; with Word2Vec, sentences are converted into vector form. Cosine similarity calculates the similarity between two objects expressed as two vectors, using the keywords of a document as the measure, which leads to better product matching and categorization performance. In addition, we compare the results of the TF-IDF representation with those of Word2Vec on a number of data items.
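The TF-IDF weighting and cosine similarity described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the product titles are hypothetical, whitespace tokenization is assumed, and the smoothed IDF formula is one common variant chosen here for convenience.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed IDF (an assumption):
            # idf = log((1 + N) / (1 + df)) + 1
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical product titles, as two shops might list them.
titles = [
    "samsung galaxy s10 128gb black",
    "samsung galaxy s10 black 128 gb",
    "apple iphone 11 64gb white",
]
docs = [title.lower().split() for title in titles]
vecs = tfidf_vectors(docs)

sim_same = cosine_similarity(vecs[0], vecs[1])  # same product, different wording
sim_diff = cosine_similarity(vecs[0], vecs[2])  # different product, no shared terms
print(f"same product: {sim_same:.3f}, different product: {sim_diff:.3f}")
```

The same `cosine_similarity` idea applies to any pair of vectors, so a dense sentence vector derived from Word2Vec could be compared in the same way once it is stored in a comparable form.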

Published

2019-12-30

Section

Research Articles

How to Cite

[1]
Harni Kusniyati, Arie Aditya Nugraha, "Analysis of Matric Product Matching Between Cosine Similarity with Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec in PT. Pricebook Digital Indonesia", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 6, Issue 1, pp. 105-112, January-February 2020. Available at doi: https://doi.org/10.32628/CSEIT195672