Big Data Anonymization in Cloud using k-Anonymity Algorithm using Map Reduce Framework

Authors: Anushree Raj, Rio G L D'Souza

Anonymization techniques are applied to protect the privacy of data published on the cloud. These techniques include various algorithms that generalize or suppress the data. Top-Down Specialization (TDS) for k-anonymity is among the most effective generalization algorithms for data anonymization. As the volume of data on the cloud grows, data analysis becomes increasingly tedious. The MapReduce framework can be adapted to process such large volumes of Big Data. We implement the generalization method for data anonymization on the cloud using a Map phase and a Reduce phase, in the two phases of Top-Down Specialization.
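
As an illustration of the idea summarized above, the following is a minimal sketch of how a map step and a reduce step can be used to check whether a generalized table is k-anonymous. It is not the authors' implementation and does not perform the Top-Down Specialization search itself; it only compares two fixed generalization levels. The records, the choice of quasi-identifiers (age and ZIP code), and the function names `generalize`, `map_phase`, `reduce_phase`, and `is_k_anonymous` are all assumptions made for this example, and the phases run in a single process rather than on a cluster.

```python
from collections import defaultdict

# Hypothetical records: (age, zip_code, diagnosis). Age and ZIP code are
# assumed to be the quasi-identifiers; diagnosis is the sensitive attribute.
RECORDS = [
    (23, "57501", "flu"),
    (27, "57502", "asthma"),
    (29, "57509", "flu"),
    (41, "57601", "diabetes"),
    (45, "57602", "flu"),
    (48, "57609", "asthma"),
]


def generalize(age, zip_code, age_width, zip_digits):
    """Generalize quasi-identifiers: bucket the age, mask trailing ZIP digits."""
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, masked_zip)


def map_phase(records, age_width, zip_digits):
    """Map: emit (generalized quasi-identifier, 1) for every record."""
    for age, zip_code, _sensitive in records:
        yield generalize(age, zip_code, age_width, zip_digits), 1


def reduce_phase(pairs):
    """Reduce: sum the counts for each generalized quasi-identifier group."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def is_k_anonymous(group_counts, k):
    """A table is k-anonymous if every group contains at least k records."""
    return all(count >= k for count in group_counts.values())


# Full ZIP codes kept: every group has a single record.
fine = reduce_phase(map_phase(RECORDS, age_width=10, zip_digits=5))
# Last two ZIP digits masked: records fall into two groups of three.
coarse = reduce_phase(map_phase(RECORDS, age_width=10, zip_digits=3))

print(is_k_anonymous(fine, k=2))
print(is_k_anonymous(coarse, k=2))
```

In a real MapReduce deployment the map and reduce functions would run on separate workers over partitions of the data; the group counts produced by the reduce step are what a specialization procedure would consult before deciding whether a more specific generalization level still satisfies k-anonymity.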

Authors and Affiliations

Anushree Raj
Department of M.Sc. Big Data Analytics, St Agnes Autonomous College, Mangalore, Karnataka, India
Rio G L D'Souza
Department of Computer Science and Engineering, St Joseph Engineering College, Mangalore, Karnataka, India

Keywords: Anonymization, Big Data in cloud, k-Anonymity, Map Reduce, Privacy Preserving

Publication Details

Published in : Volume 5 | Issue 1 | January-February 2019
Date of Publication : 2018-12-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 50-56
Manuscript Number : CSEIT19516
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Anushree Raj, Rio G L D'Souza, "Big Data Anonymization in Cloud using k-Anonymity Algorithm using Map Reduce Framework", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 5, Issue 1, pp.50-56, January-February-2019. Available at doi :
