Multilevel Feature Selection Method for Improving Classification of Microarray Gene Expression Data
DOI: https://doi.org/10.32628/CSEIT2390131
Keywords: Microarray gene expression data, significant genes, feature selection, line segment approximation.
Abstract
Microarray gene expression profiles provide valuable answers to a variety of problems and contribute to advances in clinical medicine. Gene expression data are typically high-dimensional with a small sample size: a microarray dataset contains far fewer samples than it has genes as features, which makes gene selection challenging. To extract useful gene information from cancer microarray data and reduce dimensionality, significant genes must be selected. An effective gene feature selection method reduces dimensionality and improves classification performance. Experimental results suggest that an appropriate combination of filter gene selection methods is more effective than individual techniques for microarray data classification. In this paper, we propose a two-layered feature selection method. In the first layer, a t-test is used to remove features that have little correlation with the class labels. In the second layer, a line segment approximation method transforms the selected feature subset into a lower-dimensional feature space. Four well-known classifiers, k-nearest neighbors (kNN), support vector machine (SVM), naive Bayes classifier (NBC), and decision tree (DT), were used to evaluate the proposed feature selection algorithm on binary-class microarray data. The experimental results show that the proposed method effectively selects relevant gene subsets and achieves higher classification accuracy.
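The two layers described in the abstract can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the t-test variant, how many genes survive the first layer, or the exact form of the line segment approximation, so the code assumes Welch's t-statistic, a top-k cut on |t|, selected genes ordered by decreasing |t|, and a least-squares line fitted to each of a fixed number of equal-width segments (keeping each line's slope and intercept). The helper names (`welch_t`, `t_test_filter`, `fit_line`, `line_segment_approximation`, `two_layer_transform`) and the toy data are hypothetical.

```python
"""Sketch of a two-layer gene filter: t-test ranking, then line segment approximation.

Assumptions (not from the paper): Welch's t-statistic, top-k selection by |t|,
genes ordered by decreasing |t|, and a least-squares line per equal-width segment.
"""
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (each needs at least 2 values)."""
    ma, mb = mean(a), mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # No within-class spread: infinitely separated unless the means coincide.
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se


def t_test_filter(X, y, k):
    """Layer 1 (hypothetical helper): indices of the k genes with the largest |t|."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("binary class labels expected")
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == classes[0]]
        b = [row[j] for row, label in zip(X, y) if label == classes[1]]
        scores.append(abs(welch_t(a, b)))
    # Decreasing |t|; ties broken by the lower gene index.
    return sorted(range(len(scores)), key=lambda j: (-scores[j], j))[:k]


def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n == 1:
        return 0.0, float(values[0])
    mx, my = (n - 1) / 2, mean(values)
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (v - my) for x, v in enumerate(values))
    slope = sxy / sxx
    return slope, my - slope * mx


def line_segment_approximation(vector, n_segments):
    """Layer 2 (assumed form): split `vector` into n_segments near-equal pieces and
    keep each piece's fitted (slope, intercept), i.e. 2 * n_segments features."""
    n = len(vector)
    if not 1 <= n_segments <= n:
        raise ValueError("need 1 <= n_segments <= len(vector)")
    features = []
    for s in range(n_segments):
        start, stop = s * n // n_segments, (s + 1) * n // n_segments
        features.extend(fit_line(vector[start:stop]))
    return features


def two_layer_transform(X, y, k, n_segments):
    """Run both layers on every sample; returns (selected gene indices, reduced X)."""
    genes = t_test_filter(X, y, k)
    reduced = [line_segment_approximation([row[j] for j in genes], n_segments) for row in X]
    return genes, reduced


# Toy data (hypothetical): 4 samples x 4 genes, two classes.
X = [[1.0, 5.0, 2.0, 0.0],
     [1.2, 5.1, 2.1, 0.2],
     [3.0, 5.0, 2.0, 4.0],
     [3.1, 4.9, 2.2, 4.1]]
y = ["normal", "normal", "tumor", "tumor"]
genes, Z = two_layer_transform(X, y, k=2, n_segments=1)
print(genes, Z[0])
```

In this sketch the second layer yields 2 × `n_segments` features per sample, so it only reduces dimensionality when that number is smaller than `k`. The reduced vectors would then be passed to the kNN, SVM, NBC, and DT classifiers. When estimating accuracy, genes should be selected on the training folds only and the same indices reused for the test samples; otherwise the t-test sees the test labels and the accuracy estimate is optimistic.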
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.