COVID19 detection Research

Detection of COVID19 through voice using Neural Networks.

This work is part of a Master's thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Data Science at Worcester Polytechnic Institute.

Dataset
Architectures
Setup
Usage
Future work
- Data Representations
- New Architecture try outs

Dataset

We use audio samples collected from voca.ai and Coswara. Samples from both datasets are combined, and an 80-20 stratified train-test split is created. The number of samples in each dataset is given below.

| Sample type | Label | Voca.ai | Coswara |
|---|---|---|---|
| Cough samples | Covid +ve | 1950 | 105 |
| | Covid -ve | 39 | 1361 |
| Breath samples | Covid +ve | - | 103 |
| | Covid -ve | - | 1366 |
| Alphabet samples | Covid +ve | 29 | - |
| | Covid -ve | 1751 | - |
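
The 80-20 stratified split mentioned above keeps the ratio of positive to negative samples roughly the same in the train and test sets, which matters when one class is much smaller than the other. The following is a minimal standard-library sketch of the idea, not the code used in this repository; the function name `stratified_split` and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with similar class ratios in both."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Each class contributes about test_fraction of its samples to the test set.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels only (not the real dataset): 10 positives and 90 negatives.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
```

With these toy labels the test set holds 20 samples, 2 of them positive, so the 10% positive rate is preserved on both sides of the split.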

Architectures

The architectures tried so far are listed below. All the files are in the networks folder.

| Network | AUC |
|---|---|
| Convolutional Neural Networks (convnet) | 0.56 |
| Convolutional Autoencoders (cae) | 0.57 |
| Variational Autoencoders (vae) | 0.65 |
| Contrastive learning methods (contrastive) | 0.63 |
| Brown et al. (VGGish + SVM) | 0.61 |
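
For readers unfamiliar with the metric: AUC (area under the ROC curve) equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance and 1.0 to perfect ranking. The README does not say how the values above were computed; the sketch below is only a plain-Python illustration of the definition, with a hypothetical function name `roc_auc` and toy scores that are not results from this work.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Toy scores for illustration only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)  # 3 of the 4 pairs are ranked correctly: 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for a sanity check but slow for large test sets.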

Setup

Install all the dependencies listed in requirements.txt:
```
pip install -r requirements.txt
```
Then create a config file of your own. It is passed to the scripts through the --config_file argument.

Usage

Data generation

Run data_processor.py to generate the data required for training the model. It reads the raw audio samples, splits them into n-second segments, and generates Mel filters, also called filter banks (the fbank parameter in the config file). The other available audio features are mfcc and gaf.

python3 covid_19/datagen/data_processor.py --config_file covid_19/configs/<config_filepath>
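
The splitting step described above can be sketched as follows. This is an illustration only, not the logic in data_processor.py: the function name `split_into_chunks`, the 8 kHz sample rate, and the choice to drop a trailing partial segment are all assumptions. Feature extraction (filter banks, mfcc, gaf) is not shown.

```python
def split_into_chunks(samples, sample_rate, chunk_seconds):
    """Split a 1-D sequence of audio samples into fixed-length chunks.

    A trailing remainder shorter than one chunk is dropped. This is an
    assumption; padding the remainder would be an equally plausible choice.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    n_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


# Toy signal: 2.5 seconds of silence at a hypothetical 8 kHz sample rate.
signal = [0.0] * 20000
chunks = split_into_chunks(signal, sample_rate=8000, chunk_seconds=1)
```

Here the 2.5-second signal yields two 1-second chunks of 8000 samples each, and the final half second is discarded.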

Training the network

Use main.py to train any of the architectures listed in the Architectures section.

python3 main.py --config_file covid_19/configs/<config_filepath> --network convnet

Inference

python3 main.py --config_file <config_filepath> --test_net True --network convnet --datapath <data filepath>

Remember to generate the Mel filters from the raw audio data first, and pass the generated .npy file as the datapath parameter.

Future work (TODO)

- Improve the data representations
- Try new architectures
