A Survey on Short Text Conceptualization and Clustering

Author: P. Dileep Kumar Reddy

The use of social media and online applications has grown rapidly over the past few years. These computer-mediated communications have produced a large volume of short texts. A short text is a text with limited contextual information. There is considerable interest in analyzing and conceptualizing short texts, for example to understand user intent from search queries or to mine social media messages. Consequently, the task of understanding short texts is crucial to many online applications. However, it is not easy to handle an enormous volume of short texts, since they are more ambiguous and noisier than ordinary documents and often do not follow the syntax of natural language. This points to the need for an efficient text understanding technique. Short text understanding is an important but challenging task for machine intelligence. It can benefit various online applications, such as search engines, automatic question answering, online advertising, and recommendation systems. In such applications, the essential first step is to transform an input text into a machine-interpretable model, that is, to "understand" the short text. To achieve this goal, various approaches have been proposed that leverage external knowledge sources to compensate for the inadequate contextual information accompanying short texts. This survey reviews current progress in short text understanding, with a focus on vector-based approaches, which aim to derive a vectorial encoding for a short text.
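As a rough illustration of what a vectorial encoding can look like, the sketch below maps each short text to a sparse term-frequency vector and compares texts by cosine similarity. This is a generic bag-of-words baseline chosen for illustration, not a method taken from the survey; the function names `encode` and `cosine` and the example queries are hypothetical.

```python
import math
from collections import Counter


def encode(text):
    """Map a short text to a sparse term-frequency vector (token -> count).

    Assumption: whitespace tokenization and lowercasing are enough for the demo.
    """
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_u * norm_v)


# Hypothetical search queries used only for illustration.
query = encode("apple ipad price")
related = encode("ipad price comparison")
unrelated = encode("apple pie recipe")

print(cosine(query, related))
print(cosine(query, unrelated))
```

With these inputs the related pair scores about 0.667 and the unrelated pair about 0.333. The unrelated pair still scores above zero only because both texts contain the ambiguous token "apple", which is the kind of ambiguity that motivates complementing short texts with external knowledge.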

Authors and Affiliations

P. Dileep Kumar Reddy
Lecturer, Department of Computer Science, JNTUACEA, Anantapuramu, Andhra Pradesh, India

Keywords: Knowledge Mining; Short Text Understanding; Conceptualization; Semantic Computing

Publication Details

Published in : Volume 2 | Issue 4 | July-August 2017
Date of Publication : 2017-08-31
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 859-862
Manuscript Number : CSEIT1724208
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

P. Dileep Kumar Reddy, "A Survey on Short Text Conceptualization and Clustering", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 4, pp. 859-862, July-August 2017
URL : http://ijsrcseit.com/CSEIT1724208
