A Survey of Numerous Text Similarity Approach
DOI:
https://doi.org/10.32628/CSEIT2390133
Keywords:
Natural Language Processing, Euclidean distance, Cosine similarity, Jaccard distance, Word embeddings, Language models, Universal Sentence Encoders
Abstract
Text similarity is one of the most common NLP use cases, and it appears in nearly every domain. Typical applications include finding related articles, news items, or genres, improving search engine results, and grouping related issues on a given topic. It also serves as a building block for many other text analytics tasks. Methods for measuring text similarity have existed for some time, but older approaches have notable drawbacks, including loss of dependency information, difficulty retaining context over long conversations, and exploding gradients. Recent deep learning-based models attend to both adjacent and distant words, which strengthens their ability to learn. This paper surveys text similarity techniques that can be applied to these everyday use cases.
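To make the classical measures named in the keywords concrete, the following is a minimal sketch of cosine similarity, Jaccard distance, and Euclidean distance over simple bag-of-words counts. It is an illustration only, not the authors' implementation: the whitespace tokenizer, the function names, and the example sentences are all assumptions introduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine of the angle between the two term-count vectors (1.0 = same direction)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty input
    return dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """1 minus (shared vocabulary size / combined vocabulary size)."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    if not union:
        return 0.0  # convention chosen here: two empty texts are identical
    return 1.0 - len(sa & sb) / len(union)


def euclidean_distance(a, b):
    """Straight-line distance between the two term-count vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    return math.sqrt(sum((va[t] - vb[t]) ** 2 for t in va.keys() | vb.keys()))


s1 = "the cat sat on the mat"
s2 = "the cat lay on the rug"
print(cosine_similarity(s1, s2))
print(jaccard_distance(s1, s2))
print(euclidean_distance(s1, s2))
```

For these two sentences the cosine similarity is about 0.75, the Jaccard distance is 4/7, and the Euclidean distance is 2. All three measures work only on surface tokens, so they cannot tell that "mat" and "rug" are related. Embedding-based and attention-based models are meant to address that gap.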
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.