A Survey on Short Text Conceptualization and Clustering

Authors

  • P. Dileep Kumar Reddy  Lecturer, Department of Computer Science, JNTUACEA, Anantapuramu, Andhra Pradesh, India

Keywords:

Knowledge Mining; Short Text Understanding; Conceptualization; Semantic Computing.

Abstract

Social media and other online applications have grown rapidly over the past few years. These computer-mediated communications have generated large amounts of short text. A short text is a text with limited contextual information. There is considerable interest in analyzing and conceptualizing short texts, for example to understand user intent from search queries or to mine social media messages. Consequently, understanding short texts is crucial to many online applications. However, it is not easy to handle the enormous volume of short texts, since they are more ambiguous and noisier than ordinary text and do not follow the syntax of natural language. This points to the need for an efficient text understanding technique. Short text understanding is an important but challenging task for machine intelligence. It can benefit various online applications, such as search engines, automatic question answering, online advertising, and recommendation systems. In these kinds of applications, the basic first step is to transform an input text into a machine-interpretable model, that is, to "understand" the short text. To achieve this goal, various approaches have been proposed that leverage external knowledge sources to complement the inadequate contextual information accompanying short texts. This survey reviews current progress in short text understanding, with a focus on vector-based approaches, which aim to derive a vector encoding for a short text.

Published

2017-08-31

Section

Research Articles

How to Cite

[1]
P. Dileep Kumar Reddy, "A Survey on Short Text Conceptualization and Clustering," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 4, pp. 859-862, July-August 2017.