Gene Ranking Techniques via Attribute Evaluation Algorithms for DNA Microarray Analysis

Document Type: Original Article

Authors

1 Prof., Sadat Academy for Management Science (SAMS), Department of Computer Science.

2 Dr., Modern University for Technology and Information (M.T.I), Department of Computer Science.

3 Egyptian Armed Forces.

Abstract

Microarray technology measures far more genes than actually discriminate between disease classes. Gene selection is therefore vital in microarray data analysis: it identifies the significant genes that distinguish disease classes, and these selected genes can then serve as diagnostic biomarkers in clinical treatment decisions. In this study, we describe how to reduce the dimensionality of microarray data with two attribute selection (AS) methods, the information gain (IG) method and the support vector machine (SVM) method, both of which can greatly reduce the number of attributes needed to discriminate microarray data. We use both methods to pre-process gene expression profiles obtained from DNA microarray experiments in three steps: (i) ranking genes by how well they separate the diseased and normal classes, (ii) choosing the smallest subset of top-ranked genes that yields the highest classification accuracy, and (iii) building models that classify diseased versus normal samples with multiple algorithms, using the subset extracted in (ii). We evaluated this approach with ten different classification algorithms on eight different cancer microarray datasets. The results show that this pre-processing approach improves classification accuracy compared with using the whole original dataset. All evaluated algorithms achieved classification accuracy above 94% on the majority of the datasets. Using only a few top-ranked genes gave higher classification accuracy than using the original dataset: the average improvements were 1.31%, 3.01%, 4.06%, 3.54% and 3.59% for subsets of 2, 5, 10, 20 and 50 genes ranked by information gain, respectively, and 0.19%, 4.33%, 5.05%, 5.54% and 5.63% for subsets of 2, 5, 10, 20 and 50 genes ranked by SVM attribute selection. The experimental results show that the SVM attribute selection method yields better results than the information gain method as a pre-processing stage for classification. In addition, the Artificial Neural Network (ANN) outperforms all other classifiers when SVM attribute selection is used, while Bayes Net outperforms all other classifiers when information gain attribute selection is applied.
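As a rough illustration of steps (i) to (iii), the sketch below ranks genes by information gain, IG(C; G) = H(C) - H(C | G), where H is Shannon entropy over the class labels C and G is one gene's discretized expression level, and then keeps the smallest top-k subset with the best leave-one-out accuracy. Several choices here are assumptions, not details taken from the study: each gene is discretized by its single best threshold, a nearest-centroid classifier stands in for the ten classifiers, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """IG of one gene; assumption: its best single threshold is the discretization."""
    pairs = sorted(zip(values, labels))
    n, base, best = len(pairs), entropy(labels), 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal expression values cannot be split apart
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        cond = (i / n) * entropy(left) + ((n - i) / n) * entropy(right)
        best = max(best, base - cond)
    return best


def rank_genes_by_ig(X, y):
    """Step (i): gene indices sorted from highest to lowest information gain."""
    scores = [info_gain([row[g] for row in X], y) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: -scores[g])


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid stand-in classifier.

    Uses only the columns in `genes`; assumes at least two samples per class.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
        sample = [X[i][g] for g in genes]
        pred = min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(sample, centroids[lab])))
        correct += pred == y[i]
    return correct / len(X)


def smallest_best_subset(X, y, order, sizes=(2, 5, 10, 20, 50)):
    """Step (ii): smallest top-k subset with the highest leave-one-out accuracy."""
    best_k, best_acc = None, -1.0
    for k in sorted({min(s, len(order)) for s in sizes}):
        acc = loo_accuracy(X, y, order[:k])
        if acc > best_acc:  # strict '>' keeps the smaller k on ties
            best_k, best_acc = k, acc
    return best_k, best_acc


# Hypothetical toy data: 6 samples x 3 genes; only gene 0 separates the classes.
X = [[0.9, 3.0, 1.0], [1.0, 1.0, 2.0], [1.1, 2.0, 0.5],
     [0.1, 2.5, 1.5], [0.2, 0.5, 0.8], [0.3, 1.5, 2.2]]
y = ["tumor"] * 3 + ["normal"] * 3

order = rank_genes_by_ig(X, y)
k, acc = smallest_best_subset(X, y, order, sizes=(1, 2, 3))
baseline = loo_accuracy(X, y, list(range(len(X[0]))))  # all genes, no selection
print(order[0], k, acc, acc - baseline)  # order[0] == 0, k == 1, acc == 1.0 here
```

The last value printed is accuracy on the selected subset minus accuracy on all genes, which is one plausible reading of the abstract's "average improvement" figures. Because information gain scores each gene independently, two highly correlated genes can both rank near the top.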

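The abstract does not specify how the SVM attribute evaluator scores genes. A widely used formulation is SVM recursive feature elimination (SVM-RFE): train a linear SVM, score each gene by its squared weight w_j^2, drop the lowest-scoring gene, retrain, and read the ranking off the reverse elimination order. The sketch below assumes that scheme. So that it runs with the standard library only, it uses a small hand-written Pegasos-style subgradient trainer rather than a production SVM solver; `train_linear_svm`, `svm_rfe_rank`, the `lam` and `epochs` defaults, and the toy data are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Linear SVM trained by Pegasos-style subgradient descent (labels +1/-1).

    A constant 1.0 is appended to every sample so the bias is learned as an
    extra, regularized weight; only the per-gene weights are returned.
    """
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(rows, y):  # fixed order keeps runs deterministic
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w[:-1]


def svm_rfe_rank(X, y, lam=0.01, epochs=200):
    """Gene indices from most to least important via SVM-RFE (one gene per round)."""
    remaining = list(range(len(X[0])))
    eliminated = []  # filled from least to most important
    while len(remaining) > 1:
        sub = [[row[g] for g in remaining] for row in X]
        w = train_linear_svm(sub, y, lam, epochs)
        weakest = min(range(len(remaining)), key=lambda j: w[j] ** 2)
        eliminated.append(remaining.pop(weakest))
    eliminated.extend(remaining)  # the last surviving gene is the top-ranked one
    return eliminated[::-1]


# Hypothetical toy data (+1 = diseased, -1 = normal): only gene 0 tracks the class.
X = [[2.0, 1.0, 0.5], [2.0, -1.0, -0.5], [-2.0, 1.0, -0.5], [-2.0, -1.0, 0.5]]
y = [1, 1, -1, -1]

ranking = svm_rfe_rank(X, y)
print(ranking)  # gene 0 comes first; the order of genes 1 and 2 may vary with lam/epochs
```

Unlike information gain, the SVM weights are fitted to all remaining genes jointly, so each gene's score depends on the others still in the model. Removing one gene per round is slow for microarrays with thousands of genes; practical SVM-RFE implementations commonly remove a fraction of the remaining genes per round instead.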
Keywords