Keyphrase Extraction from Scientific Articles

Authors

  • Navitha Abhinaya S, Department of Machine Learning, BMS College of Engineering, Bangalore, India
  • Neha H, Department of Machine Learning, BMS College of Engineering, Bangalore, India
  • Papireddigari Renusree, Department of Machine Learning, BMS College of Engineering, Bangalore, India
  • Sowmya Lakshmi B. S, Assistant Professor, Department of Machine Learning, BMS College of Engineering, Bangalore, India

DOI:

https://doi.org/10.32628/CSEIT24103210

Keywords:

Keyphrase Extraction, Natural Language Processing, TF-IDF, Scientific Articles, Text Preprocessing

Abstract

Keyphrase extraction is a core task in natural language processing (NLP): identifying the terms and phrases that best characterize a text. This paper presents a methodology for extracting keyphrases from scientific articles that combines text preprocessing with the term frequency-inverse document frequency (TF-IDF) algorithm. The text is first tokenized and stripped of stopwords and punctuation; a TF-IDF vectorizer is then applied to identify and score keyphrases. The results indicate that the method is effective at highlighting significant terms in scientific texts.
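The pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stopword list, the function names (`preprocess`, `tfidf_scores`, `top_terms`), and the smoothed IDF formula are all hypothetical choices made for the example, and only single-word terms are scored (multi-word keyphrases would additionally need n-gram candidates).

```python
import math
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; real pipelines use larger lists).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def tfidf_scores(docs):
    """Return one {term: score} dict per document.

    TF is the term count divided by document length. IDF is the smoothed form
    ln((1 + N) / (1 + df)) + 1, chosen here as an assumption for the sketch.
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        scores.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return scores


def top_terms(docs, k=3):
    """Return the k highest-scoring terms per document (ties broken alphabetically)."""
    return [
        [term for term, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for s in tfidf_scores(docs)
    ]


if __name__ == "__main__":
    corpus = [
        "Keyphrase extraction identifies important phrases in scientific articles.",
        "Neural networks are trained on large image datasets.",
    ]
    for doc, terms in zip(corpus, top_terms(corpus)):
        print(terms, "<-", doc)
```

Terms that occur often in one document but in few others receive the highest scores, which is the property the abstract relies on to surface significant terms.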

Published

20-06-2024

Section

Research Articles
