A Survey of Numerous Text Similarity Approach

Joyinee Dasgupta; Priyanka Kumari Mishra; Selvakuberan Karuppasamy; Arpana Dipak Mahajan

doi:10.32628/CSEIT2390133

Authors

Joyinee Dasgupta, Advanced Technology Centers, India
Priyanka Kumari Mishra, Advanced Technology Centers, India
Selvakuberan Karuppasamy, Advanced Technology Centers, India
Arpana Dipak Mahajan, Advanced Technology Centers, India

DOI:

https://doi.org/10.32628/CSEIT2390133

Keywords:

Natural Language Processing, Euclidean distance, Cosine similarity, Jaccard distance, Word embeddings, Language Models, Universal Sentence Encoders

Abstract

Text similarity is one of the most common NLP tasks, with use cases in every domain. Typical applications include finding related articles, news items, or genres, improving search engine results, and grouping related issues on a given topic. It also serves as a building block for many other text analytics use cases. Methods for measuring text similarity have existed for some time, but the older methods have notable drawbacks, including loss of dependency information, difficulty retaining context across long conversations, and exploding gradients. Recent deep learning-based models attend to both adjacent and distant words, which strengthens their ability to learn. This white paper surveys text similarity techniques that can be applied to these use cases in everyday practice.

