Privacy Preserving High Order Expectation Maximization Algorithm for Big Data Clustering with Redundancy Removal

Authors

  • R. Sureka  ME Student, Department of CSE, Dhanalakshmi Srinivasan Engineering College, Perambalur, Tamil Nadu, India
  • P. Shanmugapriya  Assistant Professor, Department of CSE, Dhanalakshmi Srinivasan Engineering College, Perambalur, Tamil Nadu, India

Keywords:

Heterogeneous database, Big data, Cloud computing, Privacy Preserving, Clustering

Abstract

Cloud computing has become increasingly prevalent, providing end users with temporary access to scalable computational resources. At a conceptual level, cloud computing should be a good fit for technical computing users. A heterogeneous cloud integrates components from many different vendors, either at different levels (a management tool from one vendor driving a hypervisor from another) or at the same level (multiple different hypervisors, all driven by the same management tool). Today, a large volume of heterogeneous data, often referred to as big data, is being generated from big storage systems. Processing it requires novel models and technologies, especially clustering-based computing, to further advance the design and application of big data analytics. However, heterogeneous data is usually very complex: it is composed of structured and unstructured data such as pictures, text, PDF documents, and video. In other words, heterogeneous data contains multiple modalities with nonlinear relationships between them. Existing work proposed a high-order possibilistic c-means algorithm that extends the conventional possibilistic c-means algorithm from the vector space to the tensor space for clustering multimedia heterogeneous data, and employed cloud computing to improve clustering efficiency for massive heterogeneous data. To protect private data during clustering on the cloud, a privacy-preserving expectation maximization algorithm is proposed that uses an asymmetric encryption scheme to encrypt the original data. The existing BGV scheme does not support the division and exponential operations used in the membership matrix updating function of the high-order fuzzy c-means algorithm. To address this problem, the membership matrix updating function is approximated by a polynomial function under the asymmetric encryption scheme.
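The polynomial approximation mentioned at the end of the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the membership update is taken to have the form u = 1 / (1 + x), as in possibilistic c-means with fuzzifier m = 2, where x is a squared distance divided by a scale parameter, and a Taylor expansion around a chosen point x0 serves as the polynomial. The function names are hypothetical, plain floats stand in for ciphertexts, and no encryption is performed. The authors' actual approximation may differ.

```python
def membership_exact(x: float) -> float:
    """Assumed membership form u = 1 / (1 + x); it requires a division."""
    return 1.0 / (1.0 + x)


def taylor_coefficients(x0: float, degree: int) -> list[float]:
    """Coefficients c_k of the Taylor expansion of 1 / (1 + x) around x0.

    1 / (1 + x) = sum_k (-1)^k * (x - x0)^k / (1 + x0)^(k + 1),
    which converges for |x - x0| < 1 + x0.
    These constants are computed in the clear, ahead of any encrypted evaluation.
    """
    return [(-1) ** k / (1.0 + x0) ** (k + 1) for k in range(degree + 1)]


def membership_poly(x: float, x0: float, coeffs: list[float]) -> float:
    """Evaluate the approximation on x with additions and multiplications only."""
    t = x - x0
    acc = 0.0
    # Horner's rule: no division or exponentiation is applied to the data value x.
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


if __name__ == "__main__":
    x0 = 1.0  # hypothetical expansion point
    coeffs = taylor_coefficients(x0, degree=8)
    for x in (0.5, 1.0, 1.5):
        print(x, membership_exact(x), membership_poly(x, x0, coeffs))
```

Because the evaluation touches the data value only through additions and multiplications, it matches the kind of operation a homomorphic scheme such as BGV supports. The approximation is only accurate near x0, so in practice the input range would need to be bounded or rescaled.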

Published

2018-06-30

Section

Research Articles

How to Cite

[1]
R. Sureka, P. Shanmugapriya, "Privacy Preserving High Order Expectation Maximization Algorithm for Big Data Clustering with Redundancy Removal," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 5, pp. 293-300, May-June 2018.