Study of Various Mechanisms Used in Data Deduplication in Cloud Storage System

Authors

  • Hema S  Research Scholar, Department of Computer Applications, Govt. Arts College (Autonomous), Salem-7, Tamil Nadu, India
  • Dr. Kangaiammal A  Assistant Professor, Department of Computer Applications, Govt. Arts College (Autonomous), Salem-7, Tamil Nadu, India

Keywords:

Deduplication, Chunking, Boundary Shift Problem, Convergent Encryption, Cloud Storage Optimization

Abstract

Cloud computing is an emerging concept that provides services such as computing, communication, and storage resources on demand over the internet. Data deduplication is one of the most widely used techniques in cloud storage: it removes redundant data and thereby reduces network bandwidth and storage utilization. This paper summarizes the concepts and types of chunk-based data deduplication techniques and discusses how chunks are uniquely identified through hashing.
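
The idea of identifying chunks by hashing can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fixed-size chunking and SHA-256 fingerprints, and the names fixed_size_chunks, deduplicate, and restore are hypothetical. Each chunk is stored once under its fingerprint, and an ordered list of fingerprints (the "recipe") is kept so the original data can be rebuilt.

```python
import hashlib


def fixed_size_chunks(data: bytes, chunk_size: int = 8):
    """Yield consecutive fixed-size chunks; the last one may be shorter."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def deduplicate(data: bytes, chunk_size: int = 8):
    """Store each distinct chunk once, keyed by its hash fingerprint.

    SHA-256 and the tiny default chunk size are assumptions made for
    illustration only.
    """
    store = {}    # fingerprint -> chunk bytes (each unique chunk kept once)
    recipe = []   # ordered fingerprints needed to rebuild the input
    for chunk in fixed_size_chunks(data, chunk_size):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # skip chunks already stored
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


if __name__ == "__main__":
    data = b"ABCDEFGH" * 3 + b"12345678"
    store, recipe = deduplicate(data)
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")
    assert restore(store, recipe) == data
```

Fixed-size chunking is used here only because it is the simplest variant. Inserting a single byte near the start of the data shifts every later chunk boundary, which is the boundary shift problem named in the keywords; content-defined chunking chooses boundaries from the data itself to limit that effect.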

Published

2018-02-28

Section

Research Articles

How to Cite

[1]
Hema S, Dr. Kangaiammal A, "Study of Various Mechanisms Used in Data Deduplication in Cloud Storage System," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 1, pp. 277-282, January-February 2018.