Effect of Dynamic Stoplist on Keyword Prediction in RAKE

Authors (4): Avinash Bhat, Chirag Satish, Nihal D’Souza, Nikhil Kashyap

We define keywords as a sequence of words that provides a condensed representation of the document in question. Keywords are vital in numerous applications, from web search engines to abstractive text summarization. Rapid Automatic Keyword Extraction (RAKE) [1] is an unsupervised, domain-independent and language-independent method for extracting keywords from documents. RAKE is based on the simple observation that keywords seldom contain stop words such as "and", "of" and "the". RAKE uses a list of stop words to split the document text into candidate keywords. This list of stop words, or stoplist, is static. In this paper, we make the stoplist dynamic: words that do not currently belong to the stoplist but are identified as potential stop words for the given document are added to it. Consequently, every document has a unique stoplist. We compare the performance of our implementation with the standard RAKE implementation on Wikipedia articles.
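
The candidate-splitting step and the idea of a per-document stoplist can be illustrated with a minimal sketch. This is not the implementation evaluated in the paper: the tiny stoplist, the function names `candidate_keywords` and `dynamic_stoplist`, and in particular the frequency-share criterion for "potential stop words" are assumptions made here for illustration, since the abstract does not state how potential stop words are identified.

```python
import re
from collections import Counter

# Tiny illustrative stoplist (assumption); a real RAKE stoplist is much longer.
STATIC_STOPLIST = {"a", "an", "and", "the", "of", "is", "in",
                   "for", "to", "from", "are", "that"}

WORD = re.compile(r"[a-z0-9'-]+")


def candidate_keywords(text, stoplist):
    """Split text into candidate keywords at punctuation and stop words."""
    candidates = []
    # Punctuation ends a phrase, so split into fragments first.
    for fragment in re.split(r"[.,;:!?()\[\]\"\n]+", text.lower()):
        current = []
        for word in WORD.findall(fragment):
            if word in stoplist:
                # A stop word closes the candidate collected so far.
                if current:
                    candidates.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(" ".join(current))
    return candidates


def dynamic_stoplist(text, base_stoplist, min_share=0.05):
    """Return a per-document stoplist.

    ASSUMPTION: a word counts as a potential stop word when it makes up
    at least `min_share` of the document's tokens. This criterion is a
    stand-in; the paper's actual criterion is not given in the abstract.
    """
    words = WORD.findall(text.lower())
    if not words:
        return set(base_stoplist)
    counts = Counter(words)
    extra = {w for w, c in counts.items() if c / len(words) >= min_share}
    return set(base_stoplist) | extra


if __name__ == "__main__":
    doc = "the article discusses article structure and article length"
    print(candidate_keywords(doc, STATIC_STOPLIST))
    per_doc = dynamic_stoplist(doc, STATIC_STOPLIST, min_share=0.3)
    print(candidate_keywords(doc, per_doc))
```

With the static stoplist, the frequent word "article" stays inside the candidates; once the per-document stoplist absorbs it, the same text splits into shorter candidates. Whether that helps or hurts keyword prediction is the question the paper examines.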

Authors and Affiliations

Avinash Bhat
Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, Karnataka, India
Chirag Satish
Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, Karnataka, India
Nihal D’Souza
Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, Karnataka, India
Nikhil Kashyap
Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, Karnataka, India

Keywords

RAKE, Keyword extraction, Stopwords, Dynamic, Wikipedia

References

  1. Rose, S., Engel, D., Cramer, N. and Cowley, W., 2010. Automatic keyword extraction from individual documents. Text Mining: Applications and Theory, pp. 1-20. https://www.researchgate.net/profile/Stuart_Rose/publication/227988510_Automatic_Keyword_Extraction_from_Individual_Documents/links/59edf51fa6fdccbbefd5434a/Automatic-Keyword-Extraction-from-Individual-Documents.pdf
  2. Manalu, S.R., 2017. Stop words in review summarization using TextRank. http://ieeexplore.ieee.org/document/8096371/
  3. Popova, S., Kovriguina, L., Mouromtsev, D. and Khodyrev, I., 2013. Stop-words in keyphrase extraction problem. http://ieeexplore.ieee.org/document/6737953/
  4. Wilbur, W.J. and Sirotkin, K., 1992. The automatic identification of stop words. Journal of information science, 18(1), pp. 45-55.
  5. Silva, C. and Ribeiro, B., 2003, July. The importance of stop word removal on recall values in text categorization. In Neural Networks, 2003. Proceedings of the International Joint Conference on (Vol. 3, pp. 1661-1666). IEEE. http://www.sciencedirect.com/science/article/pii/S0167739X10002554
  6. Yao, Z. and Ze-wen, C., 2011, March. Research on the construction and filter method of stop-word list in text preprocessing. In Intelligent Computation Technology and Automation (ICICTA), 2011 International Conference on (Vol. 1, pp. 217-221). IEEE.
  7. Lo, R.T.-W. et al. Automatically building a Stopword list for an information retrieval system. http://terrierteam.dcs.gla.ac.uk/publications/rtlo_DIRpaper.pdf
  8. Saif, H., Fernandez, M., He, Y. and Alani, H. On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter.
  9. NLTK. http://www.nltk.org/_modules/nltk/tag.html

Publication Details

Published in : Volume 4 | Issue 6 | May-June 2018
Date of Publication : 2018-05-08
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 259-264
Manuscript Number : CSEIT184650
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Avinash Bhat, Chirag Satish, Nihal D’Souza, Nikhil Kashyap, "Effect of Dynamic Stoplist on Keyword Prediction in RAKE", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 4, Issue 6, pp.259-264, May-June-2018.
Journal URL : http://ijsrcseit.com/CSEIT184650
