Detecting of Alike Data for Information Recognition and Storing With Low Charges

Authors

  • K M Siva Krishna, K Somasekhar, Department of MCA, RCR Institutes of Management & Technology, Tirupati, AP, India

Keywords:

Data Deduplication, Delta Compression, Storage System, Index Structure, Performance Evaluation

Abstract

Cloud computing greatly facilitates data providers who want to outsource their data to the cloud without disclosing sensitive data to external parties, and who would like users with certain credentials to be able to access that data. Data reduction has become increasingly important in storage systems because of the explosive growth of digital data worldwide, which has ushered in the big data era. One of the main challenges facing large-scale data reduction is how to maximally detect and eliminate redundancy at very low overheads. In this paper, we present DARE, a low-overhead Deduplication-Aware Resemblance detection and Elimination scheme that effectively exploits existing duplicate-adjacency information for highly efficient resemblance detection in deduplication-based backup/archiving storage systems. The main idea behind DARE is a scheme called Duplicate-Adjacency based Resemblance Detection (DupAdj), which considers any two data chunks to be similar (i.e., candidates for delta compression) if their respective adjacent data chunks are duplicates in a deduplication system; DARE then further improves resemblance detection efficiency with an improved super-feature approach. Our experimental results on real-world and synthetic backup datasets show that DARE consumes only about 1/4 and 1/2, respectively, of the computation and indexing overheads required by traditional super-feature approaches, while detecting 2-10% more redundancy and achieving higher throughput. These gains come from exploiting existing duplicate-adjacency information for resemblance detection and from finding the "sweet spot" for the super-feature approach.
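The DupAdj rule can be illustrated with a short Python sketch. It assumes, for illustration only, that chunking has already been done, that SHA-1 fingerprints serve as deduplication keys, and that the previous backup is available as an ordered chunk list; the function name `dupadj_candidates` and this data layout are hypothetical and are not taken from DARE itself. A non-duplicate chunk whose predecessor (or successor) in the new stream duplicates a stored chunk is paired with that stored chunk's successor (or predecessor) as a delta-compression candidate.

```python
import hashlib
from typing import Dict, List, Optional


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint used as the deduplication key (SHA-1 by assumption)."""
    return hashlib.sha1(chunk).hexdigest()


def dupadj_candidates(stored: List[bytes], incoming: List[bytes]) -> Dict[int, int]:
    """Hypothetical sketch of duplicate-adjacency resemblance detection.

    `stored` is the previously backed-up chunk sequence in its original order;
    `incoming` is the new backup stream in order. Returns a map from each
    non-duplicate incoming chunk index to a stored chunk index that is a
    resemblance (delta-compression) candidate.
    """
    # Fingerprint index that a deduplication system already maintains.
    index: Dict[str, int] = {}
    for pos, chunk in enumerate(stored):
        index.setdefault(fingerprint(chunk), pos)  # first occurrence wins

    # For each incoming chunk, the position of its duplicate in `stored`, or None.
    dup_of: List[Optional[int]] = [index.get(fingerprint(c)) for c in incoming]

    candidates: Dict[int, int] = {}
    for i, d in enumerate(dup_of):
        if d is not None:
            continue  # exact duplicate: removed by deduplication, no delta needed
        prev_dup = dup_of[i - 1] if i > 0 else None
        next_dup = dup_of[i + 1] if i + 1 < len(incoming) else None
        if prev_dup is not None and prev_dup + 1 < len(stored):
            # Predecessors match, so the stored successor is likely similar.
            candidates[i] = prev_dup + 1
        elif next_dup is not None and next_dup >= 1:
            # Successors match, so the stored predecessor is likely similar.
            candidates[i] = next_dup - 1
    return candidates
```

For example, `dupadj_candidates([b"header", b"body v1", b"footer"], [b"header", b"body v2", b"footer"])` pairs the modified middle chunk with `b"body v1"` and returns `{1: 1}`. The check needs only lookups in the fingerprint index that deduplication already keeps, so no extra feature computation is spent on these chunks; whether a candidate actually compresses well would still have to be confirmed by delta encoding.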
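DARE combines DupAdj with a super-feature approach for resemblance detection. The Python sketch below shows the conventional super-feature scheme (each feature is the maximum of a linear transform applied to all rolling-window fingerprints of a chunk; features are grouped and hashed into super-features, and chunks sharing any super-feature are treated as similar). It is not DARE's improved variant, whose details are not given here. The window size, feature counts, random seed, and the use of CRC-32 in place of Rabin fingerprints are all illustrative assumptions.

```python
import hashlib
import random
import zlib
from typing import List

WINDOW = 48          # sliding-window size in bytes (assumed)
NUM_FEATURES = 12    # features per chunk (assumed)
FEATURES_PER_SF = 4  # features hashed into each super-feature -> 3 super-features
MASK = (1 << 32) - 1

# Fixed pseudo-random linear transforms (m, a), one per feature; seeded so that
# every chunk is processed with the same transforms.
_rng = random.Random(42)
TRANSFORMS = [(_rng.randrange(1, MASK, 2), _rng.randrange(0, MASK))
              for _ in range(NUM_FEATURES)]


def window_hashes(chunk: bytes) -> List[int]:
    """CRC-32 of every WINDOW-byte window, standing in for Rabin fingerprints."""
    if len(chunk) <= WINDOW:
        return [zlib.crc32(chunk)]
    return [zlib.crc32(chunk[i:i + WINDOW]) for i in range(len(chunk) - WINDOW + 1)]


def features(chunk: bytes) -> List[int]:
    """Feature j is the maximum of transform j over all window hashes."""
    hashes = window_hashes(chunk)
    return [max((m * h + a) & MASK for h in hashes) for m, a in TRANSFORMS]


def super_features(chunk: bytes) -> List[bytes]:
    """Hash each consecutive group of FEATURES_PER_SF features into one super-feature."""
    f = features(chunk)
    sfs = []
    for s in range(0, NUM_FEATURES, FEATURES_PER_SF):
        raw = b"".join(x.to_bytes(4, "big") for x in f[s:s + FEATURES_PER_SF])
        sfs.append(hashlib.sha1(raw).digest())
    return sfs


def resemble(a: bytes, b: bytes) -> bool:
    """Treat two chunks as similar if any super-feature matches position-wise."""
    return any(x == y for x, y in zip(super_features(a), super_features(b)))
```

Grouping features means a super-feature matches only when several features agree at once, which filters out weak similarity. In a storage system each super-feature position would typically be looked up in its own index rather than compared pairwise as `resemble` does here, and every extra super-feature adds index entries, which is the kind of computation and indexing cost the abstract compares against.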

Published

2018-04-30

Section

Research Articles

How to Cite

[1]
K M Siva Krishna, K Somasekhar, "Detecting of Alike Data for Information Recognition and Storing With Low Charges," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 4, pp. 1044-1047, March-April 2018.