A Deduplication-Aware Resemblance Detection and Elimination Scheme for Data Reduction with Low Overhead

Authors

  • B. V. R. Narasimha, MCA, Sri Padmavathi College of Computer Sciences and Technology, Tiruchanoor, Andhra Pradesh, India

Keywords:

Data Deduplication, Delta Compression, Storage System, Index Structure, Performance Evaluation.

Abstract

Data reduction has become increasingly important in storage systems because of the explosive growth of digital data, which has ushered in the big data era. In existing systems, a cloud provider that offers too little computing capacity dissatisfies its users with poor service quality, whereas provisioning far more capacity than needed (i.e., leaving many servers under-utilized) wastes a tremendous amount of energy, incurs huge cost, and reduces the provider's profit. It is therefore important for a cloud provider to select appropriate servers for its services so that cost is reduced as much as possible while users remain satisfied. Existing approaches, however, do not take into account whether the data is duplicated; if user data is duplicated, it takes longer to process and server time is wasted. Duplication is the main problem, and the proposed model is intended to overcome it.

In this paper we present DARE, a low-overhead Deduplication-Aware Resemblance detection and Elimination scheme that effectively exploits existing duplicate-adjacency information for highly efficient resemblance detection in deduplication-based backup/archiving storage systems. The main idea of DARE is a scheme called Duplicate-Adjacency based Resemblance Detection (DupAdj), which considers any two data chunks to be similar (i.e., candidates for delta compression) if their respective adjacent data chunks are duplicates in a deduplication system; a super-feature approach is then employed to further enhance resemblance detection for higher efficiency. Our experimental results on backup datasets show that DARE consumes only about 1/4 and 1/2, respectively, of the computation and indexing overheads required by traditional super-feature approaches, while detecting 2-10% more redundancy and achieving higher throughput, by exploiting existing duplicate-adjacency information for resemblance detection and by finding the "sweet spot" for the super-feature approach.
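The Python sketch below is illustrative only and is not the authors' implementation; all names (DupAdjDetector, super_features, the fixed-slice feature sampling) are assumptions introduced here. Under those simplifying assumptions, it shows the two ideas summarized above: DupAdj flags a chunk as a delta-compression candidate when its neighbour in the backup stream is found to be a duplicate of an already-stored chunk, and a super-feature step groups several per-chunk features into super-features so that chunks sharing any super-feature become resemblance candidates.

# Illustrative sketch only: assumed names and simplified logic, not the authors' code.
import hashlib
from typing import List

def fingerprint(data: bytes) -> str:
    # Secure fingerprint used for exact-duplicate detection.
    return hashlib.sha1(data).hexdigest()

class DupAdjDetector:
    # Duplicate-Adjacency based resemblance detection (DupAdj), sketched:
    # chunks arrive in backup-stream order; if the previous chunk was a
    # duplicate of stored chunk i, the current non-duplicate chunk is flagged
    # as a delta-compression candidate against stored chunk i+1 (the stored
    # duplicate's neighbour).
    def __init__(self) -> None:
        self.index = {}          # fingerprint -> position in the stored stream
        self.stored = []         # fingerprints of stored chunks, in order
        self._prev_match = None  # stored position matched by the previous chunk

    def process(self, chunk: bytes) -> str:
        fp = fingerprint(chunk)
        if fp in self.index:                 # exact duplicate: deduplicate it
            self._prev_match = self.index[fp]
            return "duplicate"
        verdict = "unique"
        if self._prev_match is not None and self._prev_match + 1 < len(self.stored):
            # Neighbour of a known duplicate: likely resembles the stored chunk
            # that follows the matched one, so delta-compress against it.
            verdict = "delta candidate vs stored chunk %d" % (self._prev_match + 1)
        self._prev_match = None
        self.index[fp] = len(self.stored)
        self.stored.append(fp)
        return verdict

def super_features(chunk: bytes, n_features: int = 12, sf_size: int = 4) -> List[str]:
    # Simplified super-feature computation (a stand-in for the feature sampling
    # used by traditional approaches): hash fixed slices of the chunk as
    # "features" and fold every sf_size of them into one super-feature. Two
    # chunks sharing any super-feature are treated as resemblance candidates.
    step = max(1, len(chunk) // n_features)
    feats = [hashlib.md5(chunk[i * step:(i + 1) * step]).hexdigest()
             for i in range(n_features)]
    return [hashlib.md5("".join(feats[i:i + sf_size]).encode()).hexdigest()
            for i in range(0, n_features, sf_size)]

In this sketch, DupAdjDetector.process() would be called once per incoming chunk, and super_features() would only be consulted for chunks that neither deduplicate nor sit next to a duplicate, reflecting the division of labour described in the abstract: duplicate-adjacency information handles most resemblance detection, and the super-feature index is kept small.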

Published

2018-04-30

Issue

Volume 3, Issue 4 (March-April 2018)

Section

Research Articles

How to Cite

[1]
B. V. R. Narasimha, "A Deduplication-Aware Resemblance Detection and Elimination Scheme for Data Reduction with Low Overhead," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 4, pp. 541-546, March-April 2018.