Feature Selection for Microarray Gene Expression Data Using Simulated Annealing Guided by the Multivariate Joint Entropy

Felix F. Gonzalez-Navarro, Lluís A. Belanche-Muñoz


Microarray classification poses many challenges for data analysis, given that a gene expression data set may consist of dozens of observations with thousands or even tens of thousands of genes. In this context, feature subset selection techniques can be very useful to reduce the representation space to one that is manageable by classification techniques. In this work we use the discretized multivariate joint entropy as the basis for a fast evaluation of gene relevance in a Microarray Gene Expression context. The proposed algorithm combines a simulated annealing schedule specially designed for feature subset selection with the incrementally computed joint entropy, reusing previous values to compute current feature subset relevance. This combination turns out to be a powerful tool when applied to the maximization of gene subset relevance. Our method delivers highly interpretable solutions that are more accurate than competing methods. The algorithm is fast, effective and has no critical parameters. The experimental results in several public-domain microarray data sets show a notoriously high classification performance and low size subsets, formed mostly by biologically meaningful genes. The technique is general and could be used in other similar scenarios.


Feature Selection; Microarray Gene Expression Data; Multivariate Joint Entropy; Simulated AnnealingFeature selection; microarray gene expression data; multivariate joint entropy; simulated annealing

Full Text: PDF