Frequent Data Partitioning using Parallel Mining Item Sets in MapReduce

Authors: Chenna Venkata Suneel, Dr. K. Prasanna, Dr. M. Rudra Kumar

Traditional parallel algorithms are used for mining frequent itemsets. Existing parallel frequent itemset mining algorithms partition the data equally among the nodes, and they incur high communication and mining overheads. We address this problem with a data partitioning strategy based on Hadoop. The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop divides files into large blocks and distributes them across the nodes of a cluster. With this strategy, the performance of existing parallel frequent-pattern mining improves. This paper surveys parallel algorithms for frequent itemset mining. We summarize the algorithms developed for frequent itemset mining, both those that generate candidates, such as Apriori, and those that do not, such as FP-growth. These algorithms lack mechanisms for load balancing, data distribution, controlling I/O overhead, and fault tolerance. The most efficient recent method is FiDoop, which uses the frequent items ultrametric tree (FIUT) and the MapReduce programming model. FIUT has four advantages. First, it reduces I/O overhead because it scans the database only twice. Second, only the frequent items in each transaction are inserted as nodes, which gives compressed storage. Third, the FIU-tree is an improved way to partition the database, which significantly reduces the search space. Fourth, frequent itemsets are generated by checking only the leaves of the tree rather than traversing the entire tree, which reduces computing time.
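The two database scans described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' or FiDoop's implementation. It imitates a map step and a reduce step in plain Python rather than on Hadoop, and every function name, the sample transactions, and the `min_support` value are hypothetical. The first scan counts item supports. The second scan removes infrequent items from each transaction and groups the pruned transactions by length, which is assumed here to stand in for the way an FIUT-style method organizes k-itemsets before building its trees.

```python
from collections import Counter
from itertools import chain


def map_item_counts(transaction):
    """Map step (hypothetical helper): emit (item, 1) per distinct item."""
    return [(item, 1) for item in set(transaction)]


def reduce_item_counts(pairs):
    """Reduce step (hypothetical helper): sum the counts emitted per item."""
    totals = Counter()
    for item, count in pairs:
        totals[item] += count
    return totals


def two_scan_prune(transactions, min_support):
    """Return the frequent items and the pruned transactions grouped by length."""
    # Scan 1: count the support of every item across all transactions.
    pairs = chain.from_iterable(map_item_counts(t) for t in transactions)
    support = reduce_item_counts(pairs)
    frequent = {item for item, n in support.items() if n >= min_support}

    # Scan 2: keep only frequent items in each transaction, then group the
    # pruned transactions by their length k (an assumption of this sketch).
    by_length = {}
    for t in transactions:
        pruned = tuple(sorted(set(t) & frequent))
        if pruned:
            by_length.setdefault(len(pruned), Counter())[pruned] += 1
    return frequent, by_length


# Illustrative data only; not taken from the paper.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
    ["b", "c", "e"],
]
frequent, by_length = two_scan_prune(transactions, min_support=2)
```

In this toy run the items `d` and `e` fall below the support threshold, so the second scan drops them before any itemset is stored. On a real cluster the two helper functions would correspond to a mapper and a reducer, and each scan would be a separate MapReduce job.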

Authors and Affiliations

Chenna Venkata Suneel
M.Tech. (PG Scholar), Dept of CSE, Annamacharya Institute of Technology & Sciences, Rajampet, Kadapa, Andhra Pradesh, India
Dr. K. Prasanna
Associate Professor, Dept of CSE, Annamacharya Institute of Technology & Sciences, Rajampet, Kadapa, Andhra Pradesh, India
Dr. M. Rudra Kumar
Professor, Dept of CSE, Annamacharya Institute of Technology & Sciences, Rajampet, Kadapa, Andhra Pradesh, India

Data Mining, Recommender Systems, Social Network

Publication Details

Published in : Volume 2 | Issue 4 | July-August 2017
Date of Publication : 2017-08-31
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 641-644
Manuscript Number : CSEIT1724152
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Chenna Venkata Suneel, Dr. K. Prasanna, Dr. M. Rudra Kumar, "Frequent Data Partitioning using Parallel Mining Item Sets in MapReduce", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 2, Issue 4, pp. 641-644, July-August 2017.
Journal URL : http://ijsrcseit.com/CSEIT1724152
