Identification of the Best Filter-based Feature Selection Techniques for Microarray Datasets

This paper is a preprint and has not been certified by peer review.

Authors

Islam, A.; Bhadra, T.; Mali, K.; Giri, J.; Aurangzeb, K.; Budhathoki, R. K.; Mallik, S.

Abstract

This article explores how univariate and multivariate filter-based feature selection methods affect classification performance on real-life classification problems. Our study considers two univariate filter-based feature selection techniques, namely Chi-square and Fisher score, as well as two multivariate filter-based feature selection techniques, viz. Symmetrical Uncertainty (SU) and Minimum Redundancy-Maximum Relevance (mRMR). These methods are applied to five diverse microarray datasets, chosen to assess the effectiveness of the filter methods: Mixed-lineage Leukaemia (MLL), Lung Cancer, Ovarian Cancer, Central Nervous System (CNS), and Colon Cancer. For each feature, fitness values are calculated using the four aforementioned feature selection methods. A stratified 10-fold cross-validation procedure is then conducted using Support Vector Machines (SVM) and Multilayer Perceptrons (MLP) to determine the classification accuracy associated with the ranked features. The results of this study represent the first comprehensive analysis and comparison of gene expression datasets filtered using a variety of ranking strategies. Among these approaches, mRMR, an entropy-based method, emerges as the most effective, demonstrating outstanding performance in terms of accuracy, F1-score, and Root Mean Square Error (RMSE). When comparing classifier performance, the F1-score, which combines precision and recall, is particularly useful, while the RMSE measures prediction error. Chi-square, Fisher score, and SU follow as the second, third, and fourth best approaches, respectively. Although the SVM classifier demonstrates superior performance, the difference in accuracy between SVM and the MLP classifier is marginal.
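To make the idea of a per-feature fitness value concrete, the following is a minimal sketch of one of the univariate filters named above, the Fisher score. The abstract does not state the exact formula the authors use, so this assumes a commonly used definition: the class-size-weighted spread of the class means divided by the class-size-weighted within-class variance. The function names `fisher_scores` and `top_k_features` and the toy data are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature (column) of X.

    Assumed definition (not taken from the paper):
        score_j = sum_k n_k * (mean_jk - mean_j)^2 / sum_k n_k * var_jk
    X: list of samples, each a list of numeric feature values.
    y: list of class labels, one per sample.
    """
    n_features = len(X[0])

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0  # class-size-weighted spread of the class means
        within = 0.0   # class-size-weighted within-class variance
        for idx in by_class.values():
            values = [column[i] for i in idx]
            n_k = len(values)
            mean_k = sum(values) / n_k
            var_k = sum((v - mean_k) ** 2 for v in values) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k
        if within == 0.0:
            # Feature is constant inside every class: perfectly separating
            # if the class means differ, uninformative otherwise.
            scores.append(float("inf") if between > 0.0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def top_k_features(X, y, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 samples, 3 "genes", 2 classes.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [0.8, 6.0, 0.1],
    [3.0, 5.5, 0.7],
    [3.2, 4.5, 0.2],
    [2.8, 5.0, 0.8],
]
y = [0, 0, 0, 1, 1, 1]

scores = fisher_scores(X, y)
selected = top_k_features(X, y, 2)
print(scores)
print(selected)
```

In this toy example the first feature separates the two classes cleanly and therefore ranks first, while the second feature has identical class means and scores zero. In a pipeline like the one the abstract describes, the retained features would then be passed to a classifier inside a stratified 10-fold cross-validation loop.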
