A Novel Sequence Graph Representation for Searching and Retrieving Sequences of Long Text in the Domain of Information Retrieval

Soumya George; M. Sudheep Elayidom; T. Santhanakrishnan

doi:10.32628/CSEIT1724199

Authors

Soumya George Research Scholar, Department of Computer Applications, Cochin University of Science and Technology, Kochi, Kerala, India
M. Sudheep Elayidom Associate Professor, Division of Computer Engineering, Cochin University of Science and Technology, Kochi, Kerala, India
T. Santhanakrishnan Scientist, Govt. of India, Ministry of Defence, Naval Physical and Oceanographic Laboratory, Kochi, Kerala, India

Keywords:

Search engine, Stop words, Graph database, Word Sequence Graph Model

Abstract

Long-tail queries and keywords are becoming the norm as users search for exactly what they intend, and relying on keyword-based SEO tactics alone is no longer effective. Managing such lengthy search queries requires a full-text, sequence-based approach to indexing documents. This paper presents a highly efficient and novel graph-based document representation, the Word Sequence Graph model, which exploits the unique features of a graph database to enhance the search and retrieval of text sequences of any length, including stop words. It is a one-for-all model in which document information and content information reside in the same place. The methodology is highly relevant to many real-world applications that involve searching huge collections of documents. Examples are demonstrated using Bible texts.
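The abstract does not spell out the graph schema, so the following is only a minimal Python sketch under stated assumptions. It assumes each distinct word (stop words included) is a node and each directed edge between consecutive words records the document and position where that word pair occurs. A phrase query then matches only where consecutive edges line up at consecutive positions in the same document. The class `WordSequenceGraph`, its methods, and the `PUNCT` constant are hypothetical; an in-memory dictionary stands in for the graph database, and this is not necessarily the authors' implementation.

```python
from collections import defaultdict

PUNCT = ".,;:!?\"'()"


class WordSequenceGraph:
    """Hypothetical in-memory word sequence graph (not the paper's schema).

    Nodes are distinct words, with stop words kept. A directed edge
    w1 -> w2 stores every (doc_id, position) where word w1 at `position`
    is immediately followed by w2 in document `doc_id`.
    """

    def __init__(self):
        # edges[w1][w2] -> set of (doc_id, position of w1)
        self.edges = defaultdict(lambda: defaultdict(set))
        # nodes[w] -> set of (doc_id, position); used for one-word queries
        self.nodes = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Lowercase and trim punctuation; stop words are deliberately NOT removed.
        tokens = (t.strip(PUNCT).lower() for t in text.split())
        return [t for t in tokens if t]

    def add_document(self, doc_id, text):
        words = self.tokenize(text)
        for i, word in enumerate(words):
            self.nodes[word].add((doc_id, i))
            if i + 1 < len(words):
                self.edges[word][words[i + 1]].add((doc_id, i))

    def search(self, phrase):
        """Return sorted (doc_id, start_position) pairs where `phrase` occurs."""
        words = self.tokenize(phrase)
        if not words:
            return []
        if len(words) == 1:
            return sorted(self.nodes.get(words[0], set()))
        # Start from every occurrence of the first edge in the query.
        candidates = set(self.edges.get(words[0], {}).get(words[1], set()))
        # Edge k must occur k positions after the start, in the same document.
        for k in range(1, len(words) - 1):
            step = self.edges.get(words[k], {}).get(words[k + 1], set())
            candidates = {(d, p) for (d, p) in candidates if (d, p + k) in step}
        return sorted(candidates)


if __name__ == "__main__":
    g = WordSequenceGraph()
    g.add_document("Gen 1:1", "In the beginning God created the heaven and the earth.")
    g.add_document("John 1:1", "In the beginning was the Word, and the Word was with God, and the Word was God.")
    print(g.search("in the beginning"))  # [('Gen 1:1', 0), ('John 1:1', 0)]
    print(g.search("the word was"))      # [('John 1:1', 7), ('John 1:1', 13)]
```

Because stop words stay in the graph, a query made entirely of stop words, such as "and the", is still matched, which an index that discards stop words could not answer. Storing document identifiers and positions on the edges keeps document and content information in one structure, in the spirit of the one-for-all model described above; in a graph database the same pairs could be held as relationship properties.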
