Effect of Dynamic Stoplist on Keyword Prediction in RAKE
Keywords:
RAKE, Keyword extraction, Stopwords, Dynamic, Wikipedia
Abstract
Keywords, which we define as sequences of words, provide a condensed representation of the document in question. They are vital in numerous applications, from web search engines to abstractive text summarization. Rapid Automatic Keyword Extraction (RAKE) [1] is an unsupervised, domain-independent and language-independent method for extracting keywords from documents. RAKE is based on the simple observation that keywords seldom contain stop words such as "and", "of" and "the". RAKE uses a list of stop words to split the document text into candidate keywords. In standard RAKE, this list of stop words, or stoplist, is static. In this paper, we make the stoplist dynamic: words that do not currently belong to the stoplist but are identified as potential stop words for the given document are added to it. Consequently, every document has its own stoplist. We compare the performance of our implementation to the standard RAKE implementation on Wikipedia articles.
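The splitting step described above, and the effect of extending the stoplist per document, can be illustrated with a minimal Python sketch. This is not the implementation evaluated in the paper: the abstract does not state how potential stop words are identified, so the frequency-share rule in `dynamic_stoplist`, its `min_share` parameter, the tiny `STATIC_STOPLIST`, and all function names are assumptions made purely for illustration. RAKE's scoring of candidates is omitted.

```python
import re
from collections import Counter

# Tiny illustrative static stoplist (an assumption; real stoplists are far longer).
STATIC_STOPLIST = {"a", "an", "and", "the", "of", "is", "in", "to", "for", "on"}


def tokenize_sentences(text):
    """Split text at punctuation, then split each piece into lowercase tokens."""
    pieces = re.split(r"[.!?;:,\n]+", text)
    return [re.findall(r"[a-z0-9]+", p.lower()) for p in pieces if p.strip()]


def candidate_keywords(text, stoplist):
    """Split each sentence at stop words; every maximal run of
    non-stop words becomes one candidate keyword."""
    candidates = []
    for tokens in tokenize_sentences(text):
        current = []
        for tok in tokens:
            if tok in stoplist:
                if current:
                    candidates.append(" ".join(current))
                    current = []
            else:
                current.append(tok)
        if current:
            candidates.append(" ".join(current))
    return candidates


def dynamic_stoplist(text, base_stoplist, min_share=0.2):
    """Return a per-document stoplist.

    HYPOTHETICAL criterion (not taken from the paper): a word is treated
    as a potential stop word when it accounts for at least `min_share`
    of all tokens in the document.
    """
    tokens = [t for sentence in tokenize_sentences(text) for t in sentence]
    counts = Counter(tokens)
    total = len(tokens)
    extra = {w for w, c in counts.items() if total and c / total >= min_share}
    return set(base_stoplist) | extra
```

With the static stoplist, a word such as "also" stays inside the candidates; once the per-document stoplist picks it up, the same text splits into shorter candidates. A crude frequency rule like this one can also absorb genuine content words, which is why the selection criterion is the part that matters in practice.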
References
- Rose, S., Engel, D., Cramer, N. and Cowley, W., 2010. Automatic keyword extraction from individual documents. Text Mining: Applications and Theory, pp. 1-20. https://www.researchgate.net/profile/Stuart_Rose/publication/227988510_Automatic_Keyword_Extraction_from_Individual_Documents/links/59edf51fa6fdccbbefd5434a/Automatic-Keyword-Extraction-from-Individual-Documents.pdf
- Stop words in review summarization using TextRank by Sonya Rapinta Manalu, 2017 http://ieeexplore.ieee.org/document/8096371/
- Stop-words in keyphrase extraction problem by S. Popova, L. Kovriguina, D. Mouromtsev, I. Khodyrev, 2013 http://ieeexplore.ieee.org/document/6737953/
- Wilbur, W.J. and Sirotkin, K., 1992. The automatic identification of stop words. Journal of information science, 18(1), pp.45-55
- Silva, C. and Ribeiro, B., 2003, July. The importance of stop word removal on recall values in text categorization. In Neural Networks, 2003. Proceedings of the International Joint Conference on (Vol. 3, pp. 1661-1666). IEEE http://www.sciencedirect.com/science/article/pii/S0167739X10002554
- Yao, Z. and Ze-wen, C., 2011, March. Research on the construction and filter method of stop-word list in text preprocessing. In Intelligent Computation Technology and Automation (ICICTA), 2011 International Conference on (Vol. 1, pp. 217-221). IEEE
- “Automatically building a Stopword list for an information retrieval system” by Rachel Tsz-Wai Lo et al. http://terrierteam.dcs.gla.ac.uk/publications/rtlo_DIRpaper.pdf
- “On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter” by Hassan Saif, Miriam Fernandez, Yulan He, Harith Alani
- NLTK http://www.nltk.org/_modules/nltk/tag.html
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.