Review on Exploring Similarity between Two Questions Using Machine Learning

Authors

  • Ms. Vishwaja M. Tambakhe  Department of Computer Science and Engineering, Government College of Engineering, Amravati, Maharashtra, India
  • Dr. Kishor P. Wagh  Department of Information and Technology, Government College of Engineering, Amravati, Maharashtra, India

DOI:

https://doi.org/10.32628/CSEIT217360

Keywords:

Duplication of questions, Natural Language Processing and Machine Learning.

Abstract

Question duplication is a major problem for platforms that allow users to ask questions. Question-and-answer sites such as Quora, Reddit, and Stack Overflow all face it: the same question is posted repeatedly in different wordings, so its answers become fragmented across the various versions. This is a poor experience for users, and it hurts both those who write answers and those who seek them. The consequences are less effective search, inconsistent solutions, scattered knowledge, and insufficient responses for the questioners. The aim of this work is to detect duplicate questions in order to reduce redundancy in the data. The proposed work uses a simple neural architecture for natural language inference. The approach uses attention to decompose the problem into sub-problems that can be solved separately, which makes it trivially parallelizable. This is a new pattern for the solution; it may not solve the problem completely, but it may help increase the efficiency of the model in predicting duplicates among question pairs. Employing Natural Language Processing together with Machine Learning in this way could reduce question duplication and help improve performance as well.

Published

2021-06-30

Section

Research Articles

How to Cite

[1]
Ms. Vishwaja M. Tambakhe, Dr. Kishor P. Wagh, "Review on Exploring Similarity between Two Questions Using Machine Learning", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 7, Issue 3, pp. 287-293, May-June 2021. Available at doi: https://doi.org/10.32628/CSEIT217360