Data Partitioning in Frequent Itemset on Bigdata Using Hadoop

Authors: A. Sindhuja, M. Sridevi

Frequent itemset mining (FIM) is one of the primary concerns in data mining. Although the FIM problem has been studied, standard and improved solutions often fail to scale. This is generally the case when (i) the amount of data is extremely large and/or (ii) the minimum support (MinSup) threshold is very low. In this paper, we propose a highly scalable parallel frequent itemset mining (PFIM) algorithm called Parallel Absolute Top Down (PATD). The PATD algorithm makes mining very large databases (terabytes of data) simple and compact. Its mining process is carried out entirely in parallel jobs, which dramatically reduces the mining runtime, communication cost, and energy consumption overhead on a distributed computing platform. Based on an intelligent and efficient data partitioning approach called IBDP, the PATD algorithm mines each data partition separately, relying on an absolute minimum support (AMinSup) instead of a relative one. PATD has been extensively evaluated using real-world data sets. Our experimental results suggest that the PATD algorithm is considerably more efficient and scalable than alternative approaches.
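The distinction between a relative and an absolute minimum support can be illustrated with a minimal sketch. This is not the PATD or IBDP algorithm, whose details are not given in the abstract; it is a brute-force support count on a small, made-up transaction list. The function name `frequent_itemsets`, the `max_size` parameter, and the sample data are assumptions introduced only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, abs_min_sup, max_size=2):
    """Return itemsets (as sorted tuples) that occur in at least
    abs_min_sup transactions, considering itemsets up to max_size items.

    Brute-force counting for illustration only; it does not scale.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate and fix an item order
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= abs_min_sup}


# Hypothetical toy data set (not from the paper).
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]

# A relative threshold is a fraction of the database size; an absolute
# threshold is a fixed transaction count.
relative_min_sup = 0.5
abs_min_sup = math.ceil(relative_min_sup * len(transactions))

result = frequent_itemsets(transactions, abs_min_sup)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

A relative threshold has to be rescaled to the size of whatever subset of the data is being mined, whereas an absolute count can be applied unchanged to every partition. Whether and how a partitioning scheme guarantees that each partition contains all the transactions needed for such a count is the role the abstract attributes to IBDP, and it is not modeled here.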

Authors and Affiliations

A. Sindhuja
Department of CNIS, G Narayanamma Institute of Technology and Science, Hyderabad, Telangana, India
M. Sridevi
Assistant Professor, Department of CNIS, G Narayanamma Institute of Technology and Science, Hyderabad, Telangana, India

Keywords: Big Data, Data Mining, Frequent Itemset, Machine Learning, MapReduce


Publication Details

Published in : Volume 2 | Issue 6 | November-December 2017
Date of Publication : 2017-12-31
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 1062-1067
Manuscript Number : CSEIT1726256
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

A. Sindhuja, M. Sridevi, "Data Partitioning in Frequent Itemset on Bigdata Using Hadoop", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 6, pp.1062-1067, November-December-2017.
