Realistic and Efficient Selection Scheme for Huge Scale De-Duplication

Authors : Megha Rani Raigond, Vijaylaxmi

The data de-duplication problem has attracted substantial attention from the research community, which seeks effective and economical solutions. Duplicate data refers to the same data being stored more than once in a database. The input an operator provides to tune de-duplication methods is generally a collection of manually labelled pairs. The implementation domain is NS2 (Network Simulator 2). In the existing system, packets are sent from source to destination without checking all the nodes along the way, and no node ID is assigned to each node, so duplicate packets reach the destination. In the proposed system, each node is given a node ID for security purposes while packets are sent from source to destination. The algorithm checks all the nodes to determine which nodes do not contain duplicate packets. Finally, the algorithm finds the shortest path over which to send the de-duplicated packets from source to destination. We conclude that de-duplicated packets reach the destination by the shortest path.
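The abstract does not include the authors' NS2 code, so the following is only a minimal Python sketch of the described idea under stated assumptions: the network is a weighted graph keyed by node ID, each packet is identified by a signature (here, hypothetically, the SHA-256 digest of its payload, suggested by the "Signature-Based De-Duplication" keyword), each node on the route drops any packet whose signature it has already forwarded, and the shortest path is computed with Dijkstra's algorithm (the abstract does not name a specific shortest-path method). The function names `shortest_path`, `signature` and `send_deduplicated` are illustrative, not taken from the paper.

```python
import hashlib
import heapq


def shortest_path(graph, source, dest):
    """Dijkstra's algorithm; graph maps node_id -> {neighbor_id: link_cost}."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dest:
            break
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dest not in done:
        return None  # destination unreachable
    # Walk the predecessor links back from the destination.
    path = [dest]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def signature(payload: bytes) -> str:
    """Hypothetical packet signature: SHA-256 digest of the payload."""
    return hashlib.sha256(payload).hexdigest()


def send_deduplicated(graph, source, dest, payloads):
    """Forward each payload along the shortest path, dropping duplicates.

    Every node on the path (identified by its node ID) keeps the set of
    signatures it has already forwarded; a packet whose signature a node
    has seen before is dropped at that node.
    """
    path = shortest_path(graph, source, dest)
    if path is None:
        return None, []
    seen = {node_id: set() for node_id in path}
    delivered = []
    for payload in payloads:
        sig = signature(payload)
        for node_id in path:
            if sig in seen[node_id]:
                break  # duplicate: dropped at this node
            seen[node_id].add(sig)
        else:
            delivered.append(payload)  # passed every node, reached dest
    return path, delivered


if __name__ == "__main__":
    # Toy topology: node IDs 1..4 with link costs.
    graph = {1: {2: 1, 3: 4}, 2: {3: 1, 4: 5}, 3: {4: 1}, 4: {}}
    path, delivered = send_deduplicated(graph, 1, 4, [b"a", b"b", b"a"])
    print(path)       # [1, 2, 3, 4]
    print(delivered)  # [b'a', b'b']
```

In this toy run the second copy of `b"a"` is dropped at the first node, and only the two distinct packets travel the cost-3 route 1, 2, 3, 4. A real simulation would also model duplicates injected mid-route, which is why every node keeps its own signature set rather than relying on the source alone.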

Authors and Affiliations

Megha Rani Raigond
Department of Master of Computer Application (MCA), VTU PG Centre Kalaburagi, Karnataka, India

Keywords : De-Duplication, Signature-Based De-Duplication


Publication Details

Published in : Volume 2 | Issue 4 | July-August 2017
Date of Publication : 2017-08-31
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 340-343
Manuscript Number : CSEIT172487
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Megha Rani Raigond, Vijaylaxmi, "Realistic and Efficient Selection Scheme for Huge Scale De-Duplication", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 4, pp.340-343, July-August-2017.
