Substitution based DNA Sequences Compression-Encryption Method

Authors

  • Syed Mahamud Hossein  Department of Computer Science, Vidyasagar University, Midnapore, West Bengal, India

Keywords:

DNA, Compression, Encryption, Repeat, Palindrome, Rate, Ratio, Security

Abstract

Current research aims to minimize the execution time and compression rate of lossless compression as DNA sequence databases grow rapidly in size. Protecting DNA sequence databases from attackers is also a challenging problem. To address both problems, a lossless DNA sequence compression technique is developed that searches for exact repeats and palindromes (RP). One of the hidden characteristics of DNA sequences is the presence of approximate repeats; this work considers the repeat and palindrome (RP) feature. The algorithm can be used to reduce storage requirements and transmission cost. Compression is optimized by encoding exact repeats and palindromes by their match positions, and repeat and palindrome matches must not overlap. Compression produces two files: a compressed file and a library file. The library file acts as a signature and provides security. The groups of characters matched by the repeat and palindrome techniques also act as a private key and provide strong data security. The algorithm attains a better compression rate and ratio than existing DNA compression techniques while providing strong information security. The difference between cellular DNA and artificial sequences of the same length is also observed. The complexity of the algorithm is O(n²), where n is the number of characters. The technique achieves a compression rate of 3.2076 bits/base.
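The idea of encoding exact repeats and palindromes by match position, and of splitting the output into a compressed stream and a library, can be illustrated with a minimal Python sketch. This is not the author's implementation. It assumes that "palindrome" means the reverse complement of an earlier substring, that a match must lie wholly inside the already processed prefix (so matches never overlap their source), and that a hypothetical marker symbol `#` stands in for each substituted match. The names `compress`, `decompress`, `longest_match`, and the `min_len` parameter are illustrative only, and the search is a naive one that makes no claim about the complexity stated in the abstract.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MARKER = "#"  # hypothetical placeholder symbol, outside the DNA alphabet


def reverse_complement(s: str) -> str:
    """Assumed meaning of 'palindrome': complement each base, then reverse."""
    return s.translate(COMPLEMENT)[::-1]


def longest_match(seq: str, i: int, min_len: int):
    """Longest exact repeat (R) or palindrome (P) for the text starting at i.

    The source of the match must lie wholly inside seq[:i].
    Returns (kind, source_position, length) or None.
    """
    prefix = seq[:i]
    max_len = min(i, len(seq) - i)
    for length in range(max_len, min_len - 1, -1):
        target = seq[i:i + length]
        src = prefix.find(target)
        if src != -1:
            return ("R", src, length)
        src = prefix.find(reverse_complement(target))
        if src != -1:
            return ("P", src, length)
    return None


def compress(seq: str, min_len: int = 4):
    """Return (compressed, library): literals plus markers, and match records."""
    out, library, i = [], [], 0
    while i < len(seq):
        match = longest_match(seq, i, min_len)
        if match is None:
            out.append(seq[i])      # no usable match: keep the base as a literal
            i += 1
        else:
            out.append(MARKER)      # substitute the whole match with one marker
            library.append(match)   # the library records how to expand it
            i += match[2]
    return "".join(out), library


def decompress(compressed: str, library) -> str:
    """Rebuild the sequence; impossible without the matching library."""
    seq = []
    entries = iter(library)
    for ch in compressed:
        if ch != MARKER:
            seq.append(ch)
            continue
        kind, src, length = next(entries)
        chunk = "".join(seq[src:src + length])
        seq.extend(chunk if kind == "R" else reverse_complement(chunk))
    return "".join(seq)


if __name__ == "__main__":
    sequence = "ACGTTGCAACGT"
    compressed, library = compress(sequence)
    print(compressed, library)
    assert decompress(compressed, library) == sequence
```

In this sketch the compressed stream alone does not reveal what each marker expands to, which is the sense in which a separate library can play the role of a key. The security strength of the actual method depends on details that the abstract does not give.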

Published

2022-08-30

Section

Research Articles

How to Cite

Syed Mahamud Hossein, "Substitution based DNA Sequences Compression-Encryption Method," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 8, Issue 4, pp. 63-70, July-August 2022.