Tweet Segmentation and Classification for Rumor Identification using KNN Approach

Authors

  • S. Vinitha, MCA Student, Department of Computer Applications, Anna University, BIT Campus, Tiruchirappalli, Tamil Nadu, India
  • Mrs S. Nalini, Assistant Professor, Department of Computer Applications, Anna University, BIT Campus, Tiruchirappalli, Tamil Nadu, India

Keywords:

KNN Algorithm, Twitter, Tweet, Tweet segmentation, Named Entity Recognition

Abstract

Twitter is a platform for sharing and communicating recent information, and it produces a huge volume of data every day. However, many Natural Language Processing and Information Retrieval applications suffer severely from the noisy and short nature of tweets. We propose a framework for segmenting tweets in batch mode, called HybridSeg. By splitting tweets into meaningful segments, contextual information is preserved and can be easily extracted by downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score reflects the probability that a segment is a phrase in English (global context) and within the batch of tweets (local context). We propose and evaluate two models that derive local context by considering term dependency within a batch of tweets. Experiments on two tweet data sets show that using both global and local contexts significantly improves tweet segmentation quality compared with using global context alone. Through evaluation and comparison, we also demonstrate that higher accuracy is achieved in Named Entity Recognition by using part-of-speech (POS) tagging.
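The segmentation step described in the abstract can be sketched as a dynamic program over word positions: for every prefix of the tweet, keep the best-scoring split and extend it by one more candidate segment. This is a minimal illustration under stated assumptions, not the authors' implementation. The stickiness function is a hypothetical toy. It mixes a made-up global phrase table (`GLOBAL_PHRASE_PROB`) with a local score equal to the fraction of tweets in the batch that contain the segment. The weight `alpha`, the multiplication by segment length, and the cap `max_len` on segment length are illustrative choices, not values from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[str, ...]

# HYPOTHETICAL global-context table: probability that a word sequence is an
# English phrase. A real system would estimate this from a large external
# corpus; these numbers are invented purely for illustration.
GLOBAL_PHRASE_PROB: Dict[Segment, float] = {
    ("new", "york"): 0.9,
}
UNIGRAM_DEFAULT = 0.5  # assumed global score for any single word


def global_score(seg: Segment) -> float:
    """Global context: look the segment up in the hypothetical phrase table."""
    default = UNIGRAM_DEFAULT if len(seg) == 1 else 0.0
    return GLOBAL_PHRASE_PROB.get(seg, default)


def local_score(seg: Segment, batch: Sequence[List[str]]) -> float:
    """Local context: fraction of tweets in the batch containing seg contiguously."""
    n = len(seg)
    hits = sum(
        any(tuple(toks[k:k + n]) == seg for k in range(len(toks) - n + 1))
        for toks in batch
    )
    return hits / len(batch) if batch else 0.0


def stickiness(seg: Segment, batch: Sequence[List[str]], alpha: float = 0.5) -> float:
    """Toy stickiness (assumption): length-weighted mix of global and local scores."""
    mixed = alpha * global_score(seg) + (1 - alpha) * local_score(seg, batch)
    return len(seg) * mixed


def segment_tweet(words: List[str], batch: Sequence[List[str]], max_len: int = 3) -> List[str]:
    """Dynamic programming: choose the split that maximizes total stickiness."""
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i] = best score for words[:i]
    best[0] = 0.0
    back = [0] * (n + 1)  # back[i] = start index of the last segment in words[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j] + stickiness(tuple(words[j:i]), batch)
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow the back-pointers from the end to recover the segments.
    segments, i = [], n
    while i > 0:
        j = back[i]
        segments.append(" ".join(words[j:i]))
        i = j
    return segments[::-1]


if __name__ == "__main__":
    tweets = ["i love new york pizza", "new york is busy", "flights to new york"]
    batch = [t.split() for t in tweets]
    print(segment_tweet(batch[0], batch))  # ['i', 'love', 'new york', 'pizza']
```

In this toy batch, `new york` appears in every tweet and is in the phrase table, so it is kept as one segment, while the remaining words stay as single-word segments. Multiplying by segment length is one simple way to make a multi-word segment comparable with the sum of its single-word parts; the paper's actual stickiness function may normalize differently.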


Published

2018-06-30

Section

Research Articles

How to Cite

S. Vinitha, S. Nalini, "Tweet Segmentation and Classification for Rumor Identification using KNN Approach," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 5, pp. 681-687, May-June 2018.