Improving Information Retrieval Performance

Abhay Dwivedi; Ankit Maurya; Sandhya Rawat

doi:10.32628/CSEIT228515

Authors

Abhay Dwivedi, Department of BCA, Shri L.B.S. Degree College, Gonda, Uttar Pradesh, India
Ankit Maurya, Department of Mathematics, Shri L.B.S. Degree College, Gonda, Uttar Pradesh, India
Sandhya Rawat, I.E.T Dr. R.M.L.A. University Ayodhya, Uttar Pradesh, India

Keywords:

Information Retrieval (IR), meta-search engine, rank aggregation

Abstract

Locating interesting information is one of the most important tasks in Information Retrieval (IR). An IR system accepts a query from a user and responds with a set of documents. Generally, the system returns both relevant and non-relevant material, and a document organization approach is applied to assist the user in finding the relevant information in the retrieved set. The two most widely used document organization approaches are the ranked list and clustering of the retrieved documents. Both techniques have their strengths and weaknesses. This paper addresses the problem of offering a scalable, adaptive, efficient, full-fledged information retrieval method. We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. We develop a set of techniques for the rank aggregation problem and compare their performance to that of well-known methods. A primary goal of our work is to design rank aggregation techniques that make search more robust in the context of the Web.
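
The abstract does not name the specific rank aggregation techniques developed in the paper. As an illustration of the general problem only, the sketch below implements the Borda count, a well-known baseline for merging ranked lists from several sources such as the engines behind a meta-search engine. The function name `borda_aggregate`, the scoring convention (a document at 0-based position p in a list earns n - p points, where n is the number of distinct documents across all lists, and a document absent from a list earns nothing from it), and the alphabetical tie-break are all assumptions made for this example, not the authors' method.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge ranked lists (best first) into one list using a simple Borda count.

    Assumed convention: with n distinct documents overall, a document at
    0-based position p in a list earns n - p points from that list, and a
    document missing from a list earns 0 points from it.
    """
    # Collect every document that appears in at least one input ranking.
    universe = set()
    for ranking in rankings:
        universe.update(ranking)
    n = len(universe)

    # Sum the positional points each document earns across all rankings.
    scores = defaultdict(int)
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            scores[doc] += n - position

    # Highest total first; ties are broken by document id for determinism.
    return sorted(universe, key=lambda doc: (-scores[doc], doc))


if __name__ == "__main__":
    # Hypothetical result lists from three search engines.
    engine_a = ["d1", "d2", "d3"]
    engine_b = ["d2", "d1", "d4"]
    engine_c = ["d2", "d3", "d1"]
    print(borda_aggregate([engine_a, engine_b, engine_c]))
```

With the three hypothetical lists above, the totals are d2 = 11, d1 = 9, d3 = 5 and d4 = 2, so the merged order is d2, d1, d3, d4. Positional methods like this one are cheap to compute but ignore the scores each source assigns, which is one reason alternative aggregation techniques are studied.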

