Challenges in Handling Imbalanced Big Data: A Survey

B. S. Mounika Yadav; Sesha Bhargavi Velagaleti

doi:10.32628/CSEIT411829

Authors

B. S. Mounika Yadav, Assistant Professor, IT Dept., Vasavi College of Engineering, Hyderabad, Telangana, India
Sesha Bhargavi Velagaleti, Assistant Professor, IT Dept., G. Narayanamma Institute of Technology and Science, Hyderabad, Telangana, India

Keywords:

Big Data, TNrate, SMOTE, MapReduce

Abstract

Big Data describes enormous data sets with diverse and intricate structure, such as weblogs, social media, email, sensor readings, and photographs. Such unstructured data, whose characteristics differ from those of traditional databases, typically brings extra complications in storage, analysis, and the application of further procedures or the extraction of results. Big Data analytics is the process of examining very large amounts of complex data to uncover unseen patterns or hidden correlations. Big Data applications have grown in recent years, and researchers from many disciplines are aware of the advantages of extracting knowledge from this type of problem. However, traditional learning approaches cannot be applied directly because of scalability issues. Because the discipline is still recent, little research has been conducted on imbalanced data classification for Big Data. The main reason is the difficulty of adapting standard techniques to the MapReduce programming style. Additionally, the intrinsic problems of imbalanced data, namely lack of training data, overlap between classes, the presence of noise, and small disjuncts, are accentuated when the data are partitioned to fit the MapReduce programming style. This paper surveys the literature on the classification problem in Big Data and discusses existing methodologies with their pros and cons. The study suggests that there is a great need for a new classification method for Big Data that addresses issues such as multi-class problems and class imbalance.
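The point about partitioning can be made concrete with a small illustration. The sketch below is not taken from the surveyed papers; the function name, the class sizes, and the number of partitions are all hypothetical. It shuffles a two-class label list, deals it round-robin into partitions the way a MapReduce-style split might, and counts how many minority examples each partition receives.

```python
import random


def minority_counts_per_partition(n_majority, n_minority, n_partitions, seed=0):
    """Count minority-class examples that land in each partition.

    Labels are shuffled with a fixed seed and dealt round-robin, which is
    only an assumed stand-in for how a real job might split its input.
    """
    labels = [0] * n_majority + [1] * n_minority  # 1 marks the minority class
    random.Random(seed).shuffle(labels)
    counts = [0] * n_partitions
    for i, label in enumerate(labels):
        counts[i % n_partitions] += label
    return counts


# Hypothetical sizes: 1% minority class, split across 200 partitions.
counts = minority_counts_per_partition(
    n_majority=99_000, n_minority=1_000, n_partitions=200
)
print("average minority examples per partition:", sum(counts) / len(counts))
print("fewest minority examples in any partition:", min(counts))
```

With these assumed sizes each partition sees only about five minority examples on average, so a model trained locally on one partition has far less minority data than the full set provides.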

