A Novel Sequence Graph Representation for Searching and Retrieving Sequences of Long Text in the Domain of Information Retrieval

Authors: Soumya George, M. Sudheep Elayidom, T. Santhanakrishnan

Long-tail queries are becoming the norm as users search for exactly what they intend, and relying on keyword-based SEO tactics alone is no longer enough. Handling such lengthy search queries calls for a full-text, sequence-based approach to document indexing. This paper presents an efficient and novel graph-based document representation, the Word Sequence Graph model, which supports search and retrieval of text sequences of any length, including stop words, by exploiting the features of a graph database. It is a one-for-all model in which document information and content reside in the same place. The methodology is relevant to many real-world applications that involve searching large collections of documents. The examples are demonstrated using Bible texts.
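The idea of indexing a document as a sequence of words, stop words included, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an in-memory dictionary in place of a graph database, and the class name `WordSequenceGraph`, its methods, and the document identifiers are hypothetical names chosen for illustration. Each word is a node, and each edge links a word to the word that follows it, labelled with the document and position where that succession occurs.

```python
from collections import defaultdict


class WordSequenceGraph:
    """Toy in-memory stand-in for a graph database (assumption of this sketch)."""

    def __init__(self):
        # nodes[word] -> set of (doc_id, position) where the word occurs
        self.nodes = defaultdict(set)
        # edges[(word, next_word)] -> set of (doc_id, position of `word`)
        self.edges = defaultdict(set)

    def add_document(self, doc_id, text):
        # Stop words are deliberately kept so that exact sequences survive.
        words = text.lower().split()
        for pos, word in enumerate(words):
            self.nodes[word].add((doc_id, pos))
            if pos + 1 < len(words):
                self.edges[(word, words[pos + 1])].add((doc_id, pos))

    def find_sequence(self, query):
        """Return (doc_id, start_position) for every exact match of the query."""
        words = query.lower().split()
        if not words:
            return set()
        # Start from every occurrence of the first query word.
        candidates = set(self.nodes.get(words[0], set()))
        # Keep only the starts from which each successive edge can be followed.
        for offset in range(len(words) - 1):
            edge = self.edges.get((words[offset], words[offset + 1]), set())
            candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in edge}
            if not candidates:
                break
        return candidates


graph = WordSequenceGraph()
graph.add_document("genesis-1-1", "in the beginning god created the heaven and the earth")
graph.add_document("john-1-1", "in the beginning was the word")
matches = graph.find_sequence("in the beginning")
```

Here `matches` holds one entry per document, because both verses begin with the queried sequence. In a real graph database the same walk would presumably be expressed as a path query over word nodes, so document and content information would live in one structure, as the abstract describes.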

Authors and Affiliations

Soumya George
Research Scholar, Department of Computer Applications, Cochin University of Science and Technology, Kochi, Kerala, India
M. Sudheep Elayidom
Associate Professor, Division of Computer Engineering, Cochin University of Science and Technology, Kochi, Kerala, India
T. Santhanakrishnan
Scientist, Govt. of India, Ministry of Defence, Naval Physical and Oceanographic Laboratory, Kochi, Kerala, India

Keywords: Search engine, Stop words, Graph database, Word Sequence Graph Model

Publication Details

Published in : Volume 2 | Issue 5 | September-October 2017
Date of Publication : 2017-10-31
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 108-113
Manuscript Number : CSEIT1724199
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Soumya George, M. Sudheep Elayidom, T. Santhanakrishnan, "A Novel Sequence Graph Representation for Searching and Retrieving Sequences of Long Text in the Domain of Information Retrieval", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 5, pp.108-113, September-October-2017.
Journal URL : http://ijsrcseit.com/CSEIT1724199
