Using Wikipedia's Big Data for creation of Knowledge Bases

Authors

  • Mohamed Minhaj  Associate Professor, SDM Institute for Management Development (SDMIMD), Mysore, India

DOI:

https://doi.org/10.32628/CSEIT217546

Keywords:

Wikipedia, WWW, IT applications

Abstract

Wikipedia is among the most prominent and comprehensive sources of information available on the WWW. However, its unstructured form impedes direct interpretation by machines. Knowledge Base (KB) creation is a line of research that enables machines to interpret Wikipedia's concealed knowledge. In light of the efficacy of KBs for the storage and efficient retrieval of the semantic information required to power several IT applications, such as Question-Answering Systems, many large-scale knowledge bases have been developed. These KBs have employed different approaches for data curation and storage, and the retrieval mechanisms they facilitate also differ. Further, they differ in the depth and breadth of their knowledge. This paper endeavours to explicate the process of KB creation using Wikipedia and to compare the prominent KBs developed using Wikipedia's big data.
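To illustrate the kind of transformation the abstract describes, the sketch below (a hypothetical example, not the paper's actual method) converts infobox-style key/value pairs from a Wikipedia article into subject-predicate-object triples, the basic unit of knowledge stored by KBs such as DBpedia, YAGO, and Wikidata. The field names and values are assumed for illustration.

```python
def infobox_to_triples(subject, infobox):
    """Convert a dict of infobox fields into (subject, predicate, object) triples."""
    triples = []
    for field, value in infobox.items():
        # Normalise the field name into a predicate-like identifier.
        predicate = field.strip().lower().replace(" ", "_")
        # Multi-valued fields (e.g. "Founders") yield one triple per value.
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, predicate, v))
    return triples


# Illustrative infobox data for the "Wikipedia" article.
wikipedia_infobox = {
    "Founded": "January 15, 2001",
    "Founders": ["Jimmy Wales", "Larry Sanger"],
    "Type of site": "Online encyclopedia",
}

print(infobox_to_triples("Wikipedia", wikipedia_infobox))
```

Real KB pipelines add entity resolution, ontology mapping, and provenance tracking on top of this basic extraction step, which is where the surveyed KBs diverge most.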

References

  1. T. Berners-Lee, J. Hendler, and O. Lassila, 'The semantic web', Sci. Am., vol. 284, no. 5, pp. 34–43, 2001.
  2. A. Ratner and C. Ré, 'Knowledge Base Construction in the Machine-learning Era: Three critical design points: Joint-learning, weak supervision, and new representations', Queue, vol. 16, no. 3, pp. 79–90, Jun. 2018, doi: 10.1145/3236386.3243045.
  3. A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. Hruschka, and T. Mitchell, 'Toward an architecture for never-ending language learning', in Proceedings of the AAAI Conference on Artificial Intelligence, 2010, vol. 24, no. 1.
  4. K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, 'Freebase: a collaboratively created graph database for structuring human knowledge', in Proceedings of the 2008 ACM SIGMOD international conference on Management of data, Vancouver, Canada, Jun. 2008, pp. 1247–1250. doi: 10.1145/1376616.1376746.
  5. D. B. Lenat, 'CYC: a large-scale investment in knowledge infrastructure', Commun. ACM, vol. 38, no. 11, pp. 33–38, Nov. 1995, doi: 10.1145/219717.219745.
  6. S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives, 'Dbpedia: A nucleus for a web of open data', in The semantic web, Springer, 2007, pp. 722–735.
  7. F. M. Suchanek, G. Kasneci, and G. Weikum, 'Yago: a core of semantic knowledge', in Proceedings of the 16th international conference on World Wide Web, 2007, pp. 697–706.
  8. D. Vrandečić and M. Krötzsch, 'Wikidata: a free collaborative knowledgebase', Commun. ACM, vol. 57, no. 10, pp. 78–85, 2014.
  9. A. Bordes and E. Gabrilovich, 'Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial', in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, New York, USA, Aug. 2014, p. 1967. doi: 10.1145/2623330.2630803.
  10. S. G. Pillai, L.-K. Soon, and S.-C. Haw, 'Comparing DBpedia, Wikidata, and YAGO for web information retrieval', in Intelligent and Interactive Computing, Springer, 2019, pp. 525–535.
  11. H. Paulheim, 'Knowledge graph refinement: A survey of approaches and evaluation methods', Semantic Web, vol. 8, no. 3, pp. 489–508, 2017.
  12. D. Ringler and H. Paulheim, 'One knowledge graph to rule them all? Analyzing the differences between DBpedia, YAGO, Wikidata & co.', in Joint German/Austrian Conference on Artificial Intelligence (Künstliche Intelligenz), 2017, pp. 366–372.
  13. C. T. Leondes, Ed., Expert Systems: The Technology of Knowledge Management and Decision Making for the 21st Century, 1st ed., San Diego: Academic Press, 2001, pp. 1–22.
  14. P. Jackson, Introduction to Expert Systems, 3rd ed., USA: Addison-Wesley Longman Publishing Co., Inc., 1998, p. 2.
  15. J. Cowie and W. Lehnert, 'Information Extraction', Commun. ACM, vol. 39, no. 1, pp. 80–91, Jan. 1996, doi: 10.1145/234173.234209.
  16. M. Allahyari et al., 'A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques', ArXiv170702919 Cs, Jul. 2017, Accessed: Sep. 26, 2018. [Online]. Available: http://arxiv.org/abs/1707.02919
  17. R. Leaman and G. Gonzalez, 'BANNER: an executable survey of advances in biomedical named entity recognition', in Biocomputing 2008, World Scientific, 2008, pp. 652–663.
  18. 'The Stanford Natural Language Processing Group'. https://nlp.stanford.edu/projects/coref.shtml (accessed Jun. 20, 2020).
  19. M. A. Hearst, 'Automatic acquisition of hyponyms from large text corpora', presented at the 15th International Conference on Computational Linguistics, 1992.
  20. M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni, 'Open information extraction from the web.', in Proceedings of the International Joint Conference on Artificial Intelligence, 2007, pp. 2670–2676.
  21. T. R. Gruber, 'Toward principles for the design of ontologies used for knowledge sharing?', Int. J. Hum.-Comput. Stud., vol. 43, no. 5–6, pp. 907–928, 1995.

Published

2021-12-30

Issue

Section

Research Articles

How to Cite

[1]
Mohamed Minhaj, "Using Wikipedia's Big Data for creation of Knowledge Bases", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 7, Issue 6, pp. 11-18, November-December 2021. Available at doi: https://doi.org/10.32628/CSEIT217546