Manuscript Number : CSEIT184650
Effect of Dynamic Stoplist on Keyword Prediction in RAKE
Authors(4) : Avinash Bhat, Chirag Satish, Nihal D’Souza, Nikhil Kashyap
Abstract : We define keywords as sequences of words that provide a condensed representation of the document in question. Keywords are vital in numerous applications, from web search engines to abstractive text summarization. Rapid Automatic Keyword Extraction (RAKE) [1] is an unsupervised, domain-independent and language-independent method for extracting keywords from documents. RAKE is based on the simple observation that keywords seldom contain stop words, such as "and", "of" and "the". RAKE uses a list of stop words, or stoplist, to split the document text into candidate keywords. In standard RAKE this stoplist is static. In this paper, we make the stoplist dynamic: words that do not currently belong to the stoplist but are identified as potential stop words for the given document are added to it. Consequently, every document has a unique stoplist. We compare the performance of our implementation to the standard RAKE implementation on Wikipedia articles.
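The splitting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny stoplist, the function names, and in particular the `is_potential_stop_word` predicate are assumptions made for illustration, since the abstract does not state how potential stop words are identified. RAKE's subsequent scoring of candidates is omitted.

```python
import re

# Tiny illustrative stoplist (an assumption, not the list used in the paper).
BASE_STOPLIST = {"a", "an", "and", "as", "for", "in", "into", "is", "of", "the", "to"}

def candidate_keywords(text, stoplist):
    """Split text into candidate keywords at stop words and punctuation."""
    candidates = []
    # Punctuation ends a phrase, so handle each punctuation-delimited fragment on its own.
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stoplist:
                # A stop word closes the candidate collected so far.
                if current:
                    candidates.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(" ".join(current))
    return candidates

def document_stoplist(text, base_stoplist, is_potential_stop_word):
    """Build a per-document stoplist.

    `is_potential_stop_word` is a hypothetical, caller-supplied predicate;
    the actual criterion used in the paper is not given in the abstract.
    """
    words = set(re.findall(r"[\w'-]+", text.lower()))
    return set(base_stoplist) | {w for w in words if is_potential_stop_word(w)}

text = "RAKE uses a list of stop words to split the document text into candidate keywords."

static_result = candidate_keywords(text, BASE_STOPLIST)

# Toy predicate for illustration only: treat two generic verbs as stop words.
dynamic = document_stoplist(text, BASE_STOPLIST, lambda w: w in {"uses", "split"})
dynamic_result = candidate_keywords(text, dynamic)

print(static_result)
print(dynamic_result)
```

With the static stoplist, the candidates include "rake uses" and "split". Once those two verbs are added to the document's stoplist, "rake uses" shrinks to "rake" and "split" disappears, which shows how a per-document stoplist changes the candidate set.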
Keywords : RAKE, Keyword extraction, Stopwords, Dynamic, Wikipedia
Publication Details
Published in : Volume 4 | Issue 6 | May-June 2018
Avinash Bhat
Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, Karnataka, India
Chirag Satish
Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, Karnataka, India
Nihal D’Souza
Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, Karnataka, India
Nikhil Kashyap
Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, Karnataka, India
Date of Publication : 2018-05-08
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 259-264
Publisher : Technoscience Academy