Realistic and Efficient Selection Scheme for Huge Scale De-Duplication

Authors

  • Megha Rani Raigond  Department of Master of Computer Application (MCA), VTU PG Centre Kalaburagi, Karnataka, India
  • Vijaylaxmi  Department of Master of Computer Application (MCA), VTU PG Centre Kalaburagi, Karnataka, India

Keywords:

De-Duplication, Signature-Based De-Duplication

Abstract

Data de-duplication has attracted substantial attention from the research community, which seeks effective and economical solutions. Duplicate data is the same data stored more than once in a database. The information an operator supplies to tune de-duplication methods is generally a set of manually labelled pairs. This work is set in the ns2 network simulator. In the existing system, packets are sent from source to destination without checking every node along the way, and nodes are not assigned node IDs, so duplicate packets are delivered to the destination. In the proposed system, each node is given a node ID for security purposes before packets are sent from source to destination. The algorithm checks all nodes to determine which ones do not contain duplicate packets. Finally, the algorithm finds the shortest path for sending the de-duplicated packets from source to destination. We conclude that de-duplicated packets reach the destination over the shortest path.
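
The two steps named in the abstract, dropping duplicate packets and then routing over the shortest path, can be illustrated with a minimal Python sketch. It is not the authors' ns2 implementation. The abstract does not name the signature function or the routing algorithm, so SHA-256 payload signatures and Dijkstra's algorithm are assumptions made here, and the function names `deduplicate` and `shortest_path`, the integer node IDs, and the link costs are all hypothetical.

```python
import hashlib
import heapq


def deduplicate(packets):
    """Keep the first packet seen for each payload signature (assumed: SHA-256)."""
    seen = set()
    unique = []
    for payload in packets:
        signature = hashlib.sha256(payload).hexdigest()
        if signature not in seen:
            seen.add(signature)
            unique.append(payload)
    return unique


def shortest_path(graph, source, destination):
    """Dijkstra over {node_id: {neighbour_id: cost}} (assumed routing algorithm).

    Returns the list of node IDs from source to destination,
    or None if the destination is unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == destination:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if destination not in dist:
        return None
    path = [destination]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical four-node topology keyed by node ID, with made-up link costs.
topology = {1: {2: 1, 3: 4}, 2: {3: 1, 4: 5}, 3: {4: 1}, 4: {}}
packets = deduplicate([b"alpha", b"beta", b"alpha"])
route = shortest_path(topology, 1, 4)
```

With this toy topology, `packets` holds the two distinct payloads and `route` is the node-ID sequence the de-duplicated packets would follow from node 1 to node 4.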

Published

2017-08-31

Section

Research Articles

How to Cite

[1]
Megha Rani Raigond, Vijaylaxmi, "Realistic and Efficient Selection Scheme for Huge Scale De-Duplication", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 4, pp. 340-343, July-August 2017.