Machine-Learning-for-Breast-Cancer-Classification-With-ANN-and-Decision-Tree

Breast cancer is one of the most common causes of cancer death in women. It develops when malignant lumps form from breast cells, and unfortunately most diagnoses happen at later stages, which lowers the patient's chances of survival. For early detection and prognosis, it is therefore necessary to determine whether a lump is benign or malignant. In this paper, Artificial Neural Network (ANN) and Decision Tree (DT) classifiers are used to develop a machine learning (ML) model on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, in order to evaluate the attributes of a breast cancer at an early stage and classify it as malignant or benign. In the proposed scheme, feature selection and feature extraction are applied to the dataset, and the models are compared on their performance to identify the most suitable approach for diagnosis. The dataset is divided into several train-test splits. The performance of the framework is measured in terms of accuracy, sensitivity, specificity, precision, and recall. The best model reached a maximum accuracy of 98.55% on the binary classification problem.

METHODOLOGY

This section describes the binary classification problem and the ML algorithms used for classifying breast cancers: ANN and DT. The proposed methodology is divided into four parts. The first covers data collection and source, followed by feature extraction and feature selection. Feature extraction derives a smaller set of informative features from the input data; it aims to reduce the number of features in a dataset and can improve the accuracy of the learned model. Feature selection is the process of automatically or manually choosing the features that contribute most to the prediction variable of interest; it aims at creating an accurate predictive model. The main part then describes how the ML algorithms are applied, and the last part reports the performance evaluation.
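The five evaluation metrics named in this README all follow from the four counts of a binary confusion matrix. The sketch below shows one way to compute them; the function name and the example counts are illustrative assumptions, not results or code from this project, and malignant is assumed to be the positive class.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five metrics from binary confusion-matrix counts.

    tp/fn count positive (assumed: malignant) samples that were
    classified correctly/incorrectly; tn/fp do the same for negatives.
    No guard against empty classes is included in this sketch.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),       # same quantity as sensitivity
    }


# Illustrative counts only, not measurements from this project.
metrics = classification_metrics(tp=40, fp=1, tn=70, fn=2)
```

Sensitivity and recall are two names for the same ratio, so they always agree.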

Data collection and source

The dataset used in this paper is the WDBC dataset. It labels each breast mass as malignant or benign. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. Ten real-valued features are computed for each cell nucleus: perimeter, area, radius, texture, smoothness, concave points, symmetry, fractal dimension, concavity and compactness.
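The public WDBC release reports three statistics for each of the ten nucleus features (mean, standard error, and "worst"), which gives 30 input columns per sample. The sketch below builds such a column list; the `feature_statistic` naming scheme is a hypothetical convention chosen for illustration and may not match the column names used in this repository.

```python
# The ten per-nucleus features listed in the text.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]

# Statistics reported per feature in the public WDBC release.
STATISTICS = ["mean", "se", "worst"]


def wdbc_column_names():
    """Return 30 names such as 'radius_mean' (hypothetical naming)."""
    return [f"{feature}_{stat}"
            for stat in STATISTICS
            for feature in BASE_FEATURES]


columns = wdbc_column_names()
```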

Feature extraction and selection

The proposed model is tested with two ML algorithms: ANN and DT. For ANN, we used LabelEncoder as the feature extraction step on the existing dataset. Label encoding converts categorical labels to numbers so that they are machine readable, which lets ML algorithms operate on them directly. The approach worked reasonably well with the ANN model. For DT, we used and compared three methods for feature selection, listed below:

(i) No Feature Selection - Initially we evaluated the model with no feature selection to see how it performs, and calculated the mean and worst values of the ten features computed for each cell nucleus.

(ii) Features that are not correlated - Without feature selection, we found that many features were correlated. In the model, the features radius, compactness, concavity, smoothness, concave points, perimeter and fractal dimension were found to be correlated, so they were eliminated.

(iii) PCA transformation - Principal Component Analysis (PCA) is a dimensionality reduction technique that is frequently used to reduce the dimensionality of large datasets by transforming a large set of variables into a smaller one that still contains most of the information in the large set. In this model we used the PCA transformation to reduce the number of features and the correlation between them.
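Two of the preprocessing steps described above, label encoding and dropping correlated features, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the encoder mimics scikit-learn's `LabelEncoder` by numbering the sorted class labels, and the 0.9 correlation threshold, the keep-first strategy, and the toy values are all hypothetical.

```python
import math


def fit_label_encoder(labels):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if its |r| with every kept feature is <= threshold.

    Features are visited in dictionary order, so the first member of a
    correlated group is the one that survives (an assumed strategy).
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# WDBC diagnosis labels: "M" (malignant) and "B" (benign).
diagnosis = ["M", "B", "B", "M", "B"]
mapping = fit_label_encoder(diagnosis)
encoded = [mapping[label] for label in diagnosis]

# Toy values: area and perimeter are both derived from radius.
radius = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_features = {
    "area": [math.pi * r ** 2 for r in radius],
    "radius": radius,
    "perimeter": [2 * math.pi * r for r in radius],
    "texture": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_correlated(toy_features)
```

With sorted labels, benign maps to 0 and malignant to 1. In the toy data, radius and perimeter are dropped because they correlate strongly with area, while texture is kept.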

Application of ML Algorithms

In the proposed model, two ML algorithms were used: ANN and DT.

(i) Artificial Neural Network (ANN) - An ANN is a computational model based on the structural elements of a biological neural network. ANNs can be used for binary classification with a single output neuron that uses the logistic (sigmoid) activation function: the output is a number between 0 and 1 that can be interpreted as the estimated probability of the positive class. A single neuron, known as a perceptron, takes a layer of inputs (corresponding to the columns of a dataframe), where each input has a weight that controls its contribution to a weighted sum, which is in turn fed to the activation function. In our classifier, we used a densely connected neural network of four layers: one input layer, two hidden layers with Rectified Linear Unit (ReLU) activation, and one output layer. The classifier is built with the Sequential API, the most straightforward Keras model for neural networks. We include a dense hidden layer with 16 neurons. Each dense layer manages its own weight matrix, which contains all the connection weights between the neurons and their inputs, as well as a vector of bias terms (one per neuron). The ReLU activation function returns the positive part of its argument and zero for negative arguments. This model has low latency because it uses few layers and few units per layer. The input dimension gives the number of nodes in the input layer. The output dimension of each hidden layer shrinks as we move further into the network. As this is a binary classification, we used sigmoid activation in the final layer. Results improved by setting units to 16 at the input layer and reducing the units in the hidden layers. Because the classification is binary, the binary cross-entropy loss function is used with the sigmoid output.
The RMSprop optimizer is used to train the classifier because it damps oscillation in the vertical direction, so the algorithm can take larger steps in the horizontal direction and converge faster. The model has a total of 785 trainable parameters, which gives it a lot of flexibility to fit the training data.
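The building blocks described above (weighted sum, ReLU, sigmoid, binary cross-entropy) and the parameter count can be checked with a few lines of plain Python. This is a sketch, not the Keras model itself. The layer sizes are an assumption: 30 input features feeding two hidden layers of 16 units and one output unit is one configuration that reproduces the 785 parameters quoted in the text, and the repository's actual layer sizes may differ.

```python
import math


def relu(z):
    """Positive part of z; zero for negative arguments."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed to an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def binary_cross_entropy(y_true, p):
    """Loss for one sample with label y_true in {0, 1} and probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def dense_param_count(layer_sizes):
    """Weights plus one bias per unit for a stack of dense layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Assumed architecture: 30 inputs -> 16 -> 16 -> 1 output.
LAYER_SIZES = [30, 16, 16, 1]
total_params = dense_param_count(LAYER_SIZES)

# One output neuron on two toy inputs: z = 0.8*0.5 + 0.2*(-1.0) = 0.2.
probability = neuron([0.5, -1.0], [0.8, 0.2], 0.0, sigmoid)
loss = binary_cross_entropy(1, probability)
```

Under this assumption the count breaks down as 30*16 + 16 = 496 for the first hidden layer, 16*16 + 16 = 272 for the second, and 16 + 1 = 17 for the output neuron.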

(ii) Decision Tree (DT) - The DT is the fundamental component of a random forest (RF). A DT uses a layered splitting process, where at each layer the data is split into two or more groups so that the elements of each group are homogeneous. The root node of the DT, at depth 0, asks whether the mean area is less than or equal to 696.25, which would suggest that the class is benign. There are two possible outcomes: True or False. If it is true, the DT moves down to the root's left child node. There, in the same manner, it checks whether the mean symmetry is less than 0.202 and the class is benign. Similarly, if the test at the parent node is false, the DT moves down to the root's right child node. A node's samples attribute counts the number of training samples it applies to. In the proposed model, the parent node covers 455 samples and tests whether the mean area is less than or equal to 696.25. Of these 455 samples, 133 training instances have a mean area greater than 692.5. Here 0 denotes benign and 1 denotes malignant. The rest of the DT is built in the same manner. Finally, a node's Gini attribute measures its impurity. A node is pure (Gini score equal to 0) if all the training instances it applies to belong to the same class.
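The Gini impurity of a node is one minus the sum of the squared class proportions, so it is 0 for a pure node and 0.5 for an evenly mixed two-class node. The sketch below computes it from per-class sample counts, along with the sample-weighted impurity of a split as commonly used when growing CART-style trees. The function names and the example counts are illustrative assumptions, not values from this project's tree.

```python
def gini(class_counts):
    """Gini impurity of a node, given the sample count of each class."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def split_gini(left_counts, right_counts):
    """Sample-weighted Gini impurity of the two children of a split."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (n_left / n) * gini(left_counts) + (n_right / n) * gini(right_counts)


# Illustrative [benign, malignant] counts, not taken from the project.
pure_node = gini([10, 0])
mixed_node = gini([5, 5])
after_split = split_gini([8, 2], [1, 9])
```

A split is useful when the weighted impurity of the children is lower than the impurity of the parent, as in the example where a 0.5 parent yields children averaging 0.25.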

CONCLUSION

In this project, the WDBC dataset is used to classify breast cancer with two well-known ML approaches: ANN and DT. In the proposed classifiers, feature selection is applied to the dataset, and the models are compared on their performance to determine the most appropriate methodology for diagnosis. In ANN, a label encoder is used, which encodes the levels of categorical features into the numeric values 0 and 1. In DT, three feature selection strategies are applied to the dataset. Of the two algorithms, ANN outperformed DT, achieving an accuracy of 98.55%. We used feature extraction to improve prediction performance and make predictions faster. Also, in ANN, the RMSprop optimizer is used in place of the more conventional Adam optimizer, which allows a larger learning rate and lets the algorithm take larger steps in the horizontal direction and converge faster. Future work can be directed towards turning the chosen approach into a practical tool that helps specialists make quick assessments when diagnosing breast cancer.
