An Efficient Missing Data Imputation Based On Co-Cluster Sparse Matrix Learning

Authors

  • F. Femila  Department of Computer Science, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
  • G. Sridevi  Department of Computer Science, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
  • D. Swathi  Department of Computer Science, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
  • K. Swetha  Department of Computer Science, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India

DOI:

https://doi.org/10.32628/CSEIT195220

Keywords:

Data Preprocessing, Missing Value, Co-Cluster Sparse Matrix, Sparse Recovery

Abstract

Missing data imputation is an important problem in practice, and it makes data processing challenging. This paper proposes a solution that differs from traditional approaches: a method based on co-cluster sparse matrix learning (CCSML). The algorithm learns without a reference class, even when the rate of continuously missing data is as high as in existing techniques. The method builds on a tensor optimization model and a labeled maximum block. The computational models for sparse recovery learning are based on low-rank matrices and co-clusters of genome-wide association study (GWAS) data matrices, and the reported performance is better than that of existing techniques.
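For intuition only, the low-rank recovery idea mentioned in the abstract can be illustrated with a generic iterative truncated-SVD imputation. This is a minimal sketch and not the CCSML algorithm: it omits co-clustering, the tensor optimization model, and the labeled maximum block. The function name `impute_low_rank` and the parameters `rank`, `n_iter`, and `tol` are hypothetical, and the sketch assumes a numeric matrix with missing entries marked as NaN and at least one observed value per column.

```python
import numpy as np

def impute_low_rank(X, rank=2, n_iter=200, tol=1e-8):
    """Fill NaN entries of X by repeatedly projecting onto rank-`rank` matrices.

    Generic low-rank imputation for illustration; not the paper's CCSML method.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: replace each missing entry with its column mean
    # (assumes every column has at least one observed value).
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        # Best rank-`rank` approximation of the current filled matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Keep observed entries fixed; update only the missing ones.
        new_filled = np.where(missing, approx, X)
        change = np.linalg.norm(new_filled - filled)
        filled = new_filled
        if change < tol:
            break

    return filled

if __name__ == "__main__":
    # Toy matrix whose rows are multiples of [1, 2, 3], with two entries removed.
    X = np.array([
        [1.0, 2.0, 3.0],
        [2.0, np.nan, 6.0],
        [3.0, 6.0, np.nan],
        [4.0, 8.0, 12.0],
    ])
    print(impute_low_rank(X, rank=1))
```

Observed entries are never modified; only the missing positions are re-estimated at each iteration. A co-cluster variant would, under the same assumptions, apply a step like this within blocks of similar rows and columns rather than to the whole matrix.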

Published

2019-04-30

Issue

Volume 5, Issue 2, March-April 2019

Section

Research Articles

How to Cite

[1]
F. Femila, G. Sridevi, D. Swathi, and K. Swetha, "An Efficient Missing Data Imputation Based On Co-Cluster Sparse Matrix Learning," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 5, Issue 2, pp. 215-222, March-April 2019. Available at doi: https://doi.org/10.32628/CSEIT195220