A Cost-Sensitive Convolution Neural Network for Cancer Subgroups Classification

Document Type: Original Article

Abstract
Classification of cancer subtypes is a very important task for the diagnosis and
prognosis of cancer. In recent years, deep learning methods have gained
considerable popularity for this purpose; however, it is difficult to determine a
suitable network structure, and the performance of a deep network
depends largely on its structure. In addition, the high number of genes in the
gene expression database and the imbalanced data between different classes
have a direct effect on the complexity and performance of cancer subgroup
classification models. To address the problem of imbalanced data, a convolutional
neural network (CNN) model using a cost-sensitive strategy is proposed to
increase the model's accuracy in identifying minority classes. Furthermore,
the Fisher ratio technique is used to reduce the number of genes in the
preprocessing stage. In the cost-sensitive method, a cost matrix is created based
on the distribution of classes, and this matrix is then used in the CNN cost
function to calculate the error. Two cancer datasets are
used to evaluate the proposed method. The results show that selecting the
appropriate genes for classification, together with cost-sensitive learning,
improves the performance of the proposed method by about 11%, 10%, and 18%
in accuracy, recall, and precision, respectively, compared with a CNN
model that uses neither feature selection nor cost-sensitive learning.
Keywords: Classification, Imbalanced Data, Cancer subgroups, Gene
expression, Convolution Neural Networks, Cost-sensitive learning
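
The two ideas summarized in the abstract, ranking genes by a Fisher ratio and weighting the error by class-dependent costs, can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation. The abstract does not give the exact formulas, so the sketch assumes a multi-class Fisher score (between-class variance over within-class variance per gene) and a simplified cost scheme in which each class receives one inverse-frequency cost that scales its cross-entropy term. The function names fisher_scores, class_costs, and weighted_cross_entropy and the toy data are hypothetical.

```python
import math
from collections import Counter


def fisher_scores(X, y):
    """Per-feature Fisher score: between-class over within-class variance.

    Assumed multi-class form; higher scores mean better class separation.
    """
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        overall_mean = sum(col) / n
        between = 0.0
        within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mean_c = sum(vals) / len(vals)
            var_c = sum((v - mean_c) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mean_c - overall_mean) ** 2
            within += len(vals) * var_c
        scores.append(between / within if within > 0 else 0.0)
    return scores


def class_costs(y):
    """Inverse-frequency cost per class (assumed scheme): n / (k * n_c)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, y, costs):
    """Mean cross-entropy where each sample is scaled by its class cost."""
    total = 0.0
    for p, label in zip(probs, y):
        total += -costs[label] * math.log(max(p[label], 1e-12))
    return total / len(y)


# Toy data (hypothetical): 6 samples, 2 "genes", class 1 is the minority.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [1.1, 6.0], [3.0, 4.5], [3.2, 5.5]]
y = [0, 0, 0, 0, 1, 1]

scores = fisher_scores(X, y)
# Keep the single highest-scoring gene (index into the feature columns).
top_gene = max(range(len(scores)), key=lambda j: scores[j])

costs = class_costs(y)
# The same confident mistake is penalized more on a minority-class sample.
loss_minority_error = weighted_cross_entropy([[0.9, 0.1]], [1], costs)
loss_majority_error = weighted_cross_entropy([[0.1, 0.9]], [0], costs)
```

In this sketch the minority class receives the larger cost, so an equally confident misclassification contributes more to the loss when the true label is the minority class. A full cost matrix, as the abstract describes, would additionally allow a different cost for each pair of true and predicted classes.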