A Novel Sequence Graph Representation for Searching and Retrieving Sequences of Long Text in the Domain of Information Retrieval

Authors

  • Soumya George  Research Scholar, Department of Computer Applications, Cochin University of Science and Technology, Kochi, Kerala, India
  • M. Sudheep Elayidom  Associate Professor, Division of Computer Engineering, Cochin University of Science and Technology, Kochi, Kerala, India
  • T. Santhanakrishnan  Scientist, Govt. of India, Ministry of Defence, Naval Physical and Oceanographic Laboratory, Kochi, Kerala, India

Keywords:

Search engine, Stop words, Graph database, Word Sequence Graph Model

Abstract

Long-tail queries are becoming the norm for users who want to search for exactly what they intend, and keyword-based SEO tactics alone cannot serve them. Managing such lengthy search queries requires a full-text, sequence-based approach to indexing documents. This paper presents a novel and efficient graph-based document representation, the Word Sequence Graph model, which exploits the features of a graph database to improve search and retrieval of text sequences of any length, including stop words. It is a one-for-all model in which document information and content information reside in the same place. The methodology is relevant to many real-world applications that involve searching large collections of documents. The examples are demonstrated using Bible texts.
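The abstract does not specify the graph schema, so the following is only a minimal in-memory sketch of the general idea, not the authors' implementation. It assumes that each distinct word is a node, that consecutive words are linked by an edge annotated with the document identifier and position, and that stop words are kept. The class name `WordSequenceGraph`, its methods, and the document identifiers are hypothetical, and plain Python dictionaries stand in for a graph database.

```python
import re
from collections import defaultdict


class WordSequenceGraph:
    """Illustrative word-sequence graph (hypothetical, not the paper's schema)."""

    def __init__(self):
        # word -> set of (doc_id, position) where the word occurs
        self.nodes = defaultdict(set)
        # (word, next_word) -> set of (doc_id, position of word)
        self.edges = defaultdict(set)

    def add_document(self, doc_id, text):
        # Keep every word, including stop words, in its original order.
        words = re.findall(r"[a-z']+", text.lower())
        for pos, word in enumerate(words):
            self.nodes[word].add((doc_id, pos))
        for pos in range(len(words) - 1):
            self.edges[(words[pos], words[pos + 1])].add((doc_id, pos))

    def find_sequence(self, query):
        """Return sorted (doc_id, start_position) pairs where the query occurs."""
        words = re.findall(r"[a-z']+", query.lower())
        if not words:
            return []
        # Start from every occurrence of the first word, then keep only the
        # starts from which each consecutive word pair is linked by an edge.
        candidates = set(self.nodes.get(words[0], ()))
        for offset in range(len(words) - 1):
            hits = self.edges.get((words[offset], words[offset + 1]), set())
            candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in hits}
        return sorted(candidates)


graph = WordSequenceGraph()
graph.add_document("genesis_1_1", "In the beginning God created the heaven and the earth.")
graph.add_document("john_1_1", "In the beginning was the Word.")

print(graph.find_sequence("in the beginning"))  # [('genesis_1_1', 0), ('john_1_1', 0)]
print(graph.find_sequence("the earth"))         # [('genesis_1_1', 8)]
```

Because document identifiers and positions are stored on the same nodes and edges as the words, a phrase lookup returns both the matching text location and the document it belongs to, which is one way to read the "one-for-all" property described above.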

Published

2017-10-31

Section

Research Articles

How to Cite

[1]
Soumya George, M. Sudheep Elayidom, T. Santhanakrishnan, "A Novel Sequence Graph Representation for Searching and Retrieving Sequences of Long Text in the Domain of Information Retrieval", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 5, pp. 108-113, September-October 2017.