Frequent Data Partitioning using Parallel Mining Item Sets in MapReduce

Chenna Venkata Suneel; Dr. K. Prasanna; Dr. M. Rudra Kumar

doi:10.32628/CSEIT1724152

Authors

Chenna Venkata Suneel, M.Tech. (PG Scholar), Dept of CSE, Annamacharya Institute of Technology & Sciences, Rajampet, Kadapa, Andhra Pradesh, India
Dr. K. Prasanna, Associate Professor, Dept of CSE, Annamacharya Institute of Technology & Sciences, Rajampet, Kadapa, Andhra Pradesh, India
Dr. M. Rudra Kumar Professor, Dept of CSE, Annamacharya Institute of Technology & Sciences, Rajampet, Kadapa, Andhra Pradesh, India

Keywords:

Data Mining, Recommender Systems, Social Network

Abstract

Frequent itemsets are traditionally mined with parallel algorithms. Existing parallel frequent itemset mining algorithms partition the data equally among the nodes, and as a result they incur high communication and mining overheads. We address this problem with a data partitioning strategy based on Hadoop. The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop divides files into large blocks and distributes them across the nodes of a cluster. With this strategy, the performance of existing parallel frequent-pattern mining improves. This paper surveys the parallel algorithms that have been developed for frequent itemset mining, both those that generate candidates, such as Apriori, and those that avoid candidate generation, such as FP-growth. These algorithms lack mechanisms for load balancing, data distribution, I/O overhead reduction, and fault tolerance. The most efficient recent method is FiDoop, which uses the frequent items ultrametric tree (FIUT) and the MapReduce programming model. FIUT has four advantages. First, it reduces I/O overhead because it scans the database only twice. Second, only the frequent items in each transaction are inserted as nodes, which gives compressed storage. Third, FIUT offers an improved way to partition the database, which significantly reduces the search space. Fourth, frequent itemsets are generated by checking only the leaves of the tree rather than traversing the entire tree, which reduces computing time.
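The two database scans mentioned in the abstract can be illustrated with a minimal single-machine sketch. It assumes transactions are plain lists of item labels and a support threshold given as an absolute count; the function names `first_scan` and `second_scan` are hypothetical and do not come from the paper. The sketch covers only the pruning and grouping steps; it does not build the ultrametric tree or distribute work across a Hadoop cluster.

```python
from collections import Counter, defaultdict


def first_scan(transactions, min_support):
    """Scan 1: count the support of each item and keep the frequent ones."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c >= min_support}


def second_scan(transactions, frequent_items):
    """Scan 2: drop infrequent items from each transaction, then group the
    pruned transactions by their length k (an assumed grouping step)."""
    groups = defaultdict(Counter)
    for t in transactions:
        pruned = tuple(sorted(set(t) & frequent_items))
        if pruned:
            groups[len(pruned)][pruned] += 1
    return groups


# Toy data, invented for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c", "d"],
    ["b", "c"],
]
frequent = first_scan(transactions, min_support=2)
groups = second_scan(transactions, frequent)
```

In this toy run the item `d` appears only once, so it is pruned before the second scan, and the third transaction is stored as the 2-itemset `("a", "c")`. In a MapReduce setting, each scan would typically be expressed as a separate job, which is an assumption here rather than a detail stated in the abstract.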

