Detecting-Respiratory-Diseases-from-Recorded-Lung-Sounds-by-2D-CNN

A. Data Preprocessing- The data set used in this project comes from the ICBHI challenge. It contains a total of 920 recordings collected from 126 patients. The recordings were made with a Littmann 3200 electronic stethoscope and a Littmann Classic II SE stethoscope. The data set covers six classes: bronchiectasis, pneumonia, bronchiolitis, COPD, URTI, and healthy.

B. Feature Extraction- Data normalization and data augmentation were applied to the lung sound database before spectrogram features were extracted from the resulting audio files. Augmentation took the form of audio stretching, i.e., speeding the audio up and slowing it down. During augmentation, each sound file was processed in turn and its features were extracted as MFCCs. These features were then passed to the 2D CNN for classification.
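
The normalization and stretching steps can be sketched as follows. This is a minimal illustration, not the repository's code: the function names, the stretch rates (0.9 and 1.1), and the synthetic test tone are all assumptions, and the stretch is done by plain linear-interpolation resampling in NumPy, which changes pitch along with speed. A dedicated audio library such as librosa (`librosa.effects.time_stretch`, `librosa.feature.mfcc`) would normally handle both the stretching and the MFCC extraction, which is not shown here.

```python
import numpy as np


def peak_normalize(signal: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Scale the signal so its largest absolute amplitude is about 1."""
    return signal / (np.max(np.abs(signal)) + eps)


def time_stretch(signal: np.ndarray, rate: float) -> np.ndarray:
    """Resample so the audio plays `rate` times faster (rate > 1) or slower (rate < 1).

    Assumption: simple linear interpolation, which also shifts the pitch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_out = max(1, int(round(len(signal) / rate)))
    # Positions in the original signal that each output sample reads from.
    src_positions = np.linspace(0, len(signal) - 1, num=n_out)
    return np.interp(src_positions, np.arange(len(signal)), signal)


def augment(signal: np.ndarray, rates=(0.9, 1.1)) -> list:
    """Return the normalized original plus one stretched copy per rate."""
    base = peak_normalize(signal)
    return [base] + [time_stretch(base, r) for r in rates]


# A synthetic 1-second, 440 Hz tone sampled at 4 kHz stands in for a recording.
sr = 4000
t = np.arange(sr) / sr
tone = 0.3 * np.sin(2 * np.pi * 440 * t)

variants = augment(tone)
print([len(v) for v in variants])  # prints [4000, 4444, 3636]
```

Each variant would then go through the same MFCC extraction, so one recording yields several training examples.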

C. Training and Testing- After feature extraction, the MFCC features are passed to the classifier. The classifier is a CNN that assigns each audio sample to one of six categories: bronchiectasis, pneumonia, bronchiolitis, COPD, URTI, or healthy. It consists of convolution layers and max-pooling layers, followed by activation and fully connected layers. A sequential model with four 2D convolutional layers and a dense layer as output is used.

The convolutional layers perform feature detection. Each one slides a filter window over the input, multiplies the filter weights element-wise with the values under the window, sums the products, and stores the result in a feature map. During the forward pass, every filter is convolved across the height and width of the input, producing a 2D feature map whose entries are the dot products between the filter and the input patch at each position. The pooling layer is another building block of a CNN. It reduces the spatial size of the representation, which reduces the number of parameters. Pooling is of two types: average pooling and max pooling. To prevent overfitting, a dropout layer is used after each convolutional layer. The rectified linear unit (ReLU) is used as the activation function in each convolutional layer to introduce non-linearity. At the top, there are two fully connected layers, with softmax as the activation function of the output layer.

The input has the shape (sample height, sample width, number of channels). The filters parameter specifies the number of filters, and therefore the number of output feature maps, in each layer. The four convolutional layers use 16, 32, 64, and 128 filters respectively. The kernel size parameter specifies the size of the kernel window, which is 2, resulting in a 2x2 filter matrix.
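
The sliding-window operation, ReLU, and max pooling described above can be illustrated with a small NumPy sketch. It is a teaching example under stated assumptions rather than the model's code: a single-channel input, one hand-picked 2x2 filter, stride 1, no padding, and hypothetical function names. As in most deep learning frameworks, the "convolution" here is computed without flipping the kernel.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (stride 1, no padding) and build a feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Dot product of the filter with the patch under the window.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map


def relu(x: np.ndarray) -> np.ndarray:
    """Replace negative activations with zero."""
    return np.maximum(x, 0)


def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Keep the maximum value of each non-overlapping size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]  # drop rows/columns that do not fill a window
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


# 5x5 input with a bright vertical band in columns 2 and 3.
image = np.tile([0, 0, 1, 1, 0], (5, 1)).astype(float)
# 2x2 filter that responds to a dark-to-bright step from left to right.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

fmap = conv2d_valid(image, kernel)  # shape (4, 4), each row is [0, 2, 0, -2]
activated = relu(fmap)              # the -2 responses are clipped to 0
pooled = max_pool2d(activated)      # shape (2, 2): [[2, 0], [2, 0]]
print(pooled)
```

The filter fires only where the band begins, ReLU discards the negative response at the band's far edge, and pooling halves each spatial dimension while keeping the strongest response.
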
Each convolutional layer is followed by a 2D max-pooling layer, except the final convolutional layer, which is followed by a 2D global average pooling layer. Pooling reduces the dimensionality of the model (fewer parameters and less subsequent computation), which shortens the training time and reduces overfitting. Max pooling takes the maximum value in each window, and global average pooling takes the average over the whole feature map. A dropout rate of 20% is used in the CNN layers. The final output layer consists of 6 neurons, one for each of the six classes: bronchiectasis, pneumonia, bronchiolitis, COPD, URTI, and healthy.
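
To make the layer configuration concrete, the following framework-free sketch walks through the output shape and parameter count of each layer. Several details are assumptions not stated in the text: stride 1 with no padding, a pool size of 2, a hypothetical input shape of (40, 862, 1), and a single dense softmax output layer directly after global average pooling. Dropout layers are left out because they change neither shapes nor parameter counts.

```python
def conv_out(h: int, w: int, k: int = 2) -> tuple:
    """Spatial size after a k x k convolution with stride 1 and no padding."""
    return h - k + 1, w - k + 1


def pool_out(h: int, w: int, p: int = 2) -> tuple:
    """Spatial size after non-overlapping p x p max pooling."""
    return h // p, w // p


def describe_model(input_shape, filters=(16, 32, 64, 128), kernel=2, n_classes=6):
    """Return (layer name, output shape, parameter count) rows and the total."""
    h, w, channels = input_shape
    rows, total_params = [], 0
    for idx, n_filters in enumerate(filters):
        h, w = conv_out(h, w, kernel)
        # One kernel x kernel x channels weight block plus one bias per filter.
        params = (kernel * kernel * channels + 1) * n_filters
        total_params += params
        rows.append((f"conv2d_{idx + 1}", (h, w, n_filters), params))
        channels = n_filters
        if idx < len(filters) - 1:
            # Every convolution except the last is followed by max pooling.
            h, w = pool_out(h, w)
            rows.append((f"max_pool_{idx + 1}", (h, w, channels), 0))
    # The last convolution is followed by global average pooling:
    # one average per feature map, so only the channel dimension remains.
    rows.append(("global_avg_pool", (channels,), 0))
    dense_params = (channels + 1) * n_classes
    total_params += dense_params
    rows.append(("dense_softmax", (n_classes,), dense_params))
    return rows, total_params


rows, total = describe_model((40, 862, 1))  # hypothetical MFCC input shape
for name, shape, params in rows:
    print(f"{name:16s} {str(shape):18s} {params:6d}")
print("total parameters:", total)  # prints total parameters: 44086
```

Under these assumptions the global average pooling layer leaves only 128 values for the output layer, which keeps the dense layer at 774 parameters regardless of the input length.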

About

Respiratory disease is among the leading causes of death worldwide. A large part of the population is regularly affected by lung function disorders that eventually lead to respiratory diseases. Prevention and early detection are essential steps in managing respiratory diseases, and reducing fatality requires an efficient detection model. In this paper, a 2D convolutional neural network (CNN) is used to detect respiratory diseases at an early stage from recorded lung sounds. Using Mel-frequency cepstral coefficients (MFCC), the proposed method classifies a recording as bronchiectasis, pneumonia, bronchiolitis, chronic obstructive pulmonary disease (COPD), upper respiratory tract infection (URTI), or healthy. In the proposed scheme, statistical features are extracted from the audio clips and loaded into a data frame, which is then classified by the 2D CNN. The model is based on a 2D CNN architecture in which the number of layers is reduced to achieve higher accuracy. The proposed model has only 13 CNN layers, where each convolution layer is associated with a 2D max-pooling layer and the final convolution layer has a 2D global average pooling layer. The proposed method obtained an accuracy of over 92.39%.
