kevinmastascusa / CORD_19_Research

"KZM COVID Informatics: A repository for data analysis and insight extraction from the CORD-19 dataset, focused on advancing our understanding of the COVID-19 pandemic."

Home Page: https://kevinmastascusa.github.io/CORD_19_Research/

CORD_19_Research

Project Name: Data Mining with Python - CORD-19 Research Challenge

Description

This project leverages the COVID-19 Open Research Dataset (CORD-19).

The CORD-19 dataset is a free resource of over 400,000 scholarly articles, both with and without full text, about COVID-19 and the coronavirus group of viruses. It was released to encourage the global research community to use recent advances in AI and data analysis to generate new insights in the fight against the COVID-19 pandemic.

How to Use

main.py is the entry point. Running it executes the entire project.

Installation

This project requires Python 3.7.6 and the following packages:

  • pandas
  • numpy
  • matplotlib
  • nltk
  • scikit-learn (imported as sklearn)

Project Structure

metadata.csv - This file contains metadata for all of the articles in the CORD-19 dataset. The metadata includes the title, authors, abstract, publication date, and other information about the articles.

I cannot upload this file to GitHub because it is too large. You can download it from the following link: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/download
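As an illustration of working with metadata.csv, the following is a minimal sketch of loading it with pandas. It is not taken from main.py. The helper name load_metadata is hypothetical, and the column names (title, authors, abstract, publish_time) are assumptions based on the public CORD-19 release; check them against the header of your downloaded copy.

```python
import pandas as pd

# Assumed column names; verify against the header of your metadata.csv.
ASSUMED_COLUMNS = ("title", "authors", "abstract", "publish_time")


def load_metadata(path, columns=ASSUMED_COLUMNS):
    """Load selected columns of metadata.csv (hypothetical helper)."""
    # Reading only the needed columns keeps memory use down for a large file.
    df = pd.read_csv(path, usecols=list(columns), dtype=str)
    if "publish_time" in df.columns:
        # Missing or unparseable dates become NaT instead of raising an error.
        df["publish_time"] = pd.to_datetime(df["publish_time"], errors="coerce")
    if "abstract" in df.columns:
        # Rows without an abstract are dropped, assuming the analysis needs text.
        df = df.dropna(subset=["abstract"])
    return df.reset_index(drop=True)
```

Calling load_metadata("metadata.csv") returns a DataFrame restricted to the selected columns. Dropping rows without an abstract is a design choice for text mining, not a requirement of the dataset.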

Notebook.ipynb - This Jupyter Notebook file contains test code for the project.

Results

Once experiments have been completed, their results will be stored in the results directory, with one subdirectory per experiment. Each experiment subdirectory will contain a results.csv file with the following columns:

Column Name    Description
Experiment     The name of the experiment
Model          The name of the model
Accuracy       The accuracy of the model
Precision      The precision of the model
Recall         The recall of the model
F1             The F1 score of the model
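The layout described above could be produced along the following lines. This is a sketch under stated assumptions, not the project's actual code: the function names binary_metrics and write_results are hypothetical, the labels are assumed to be binary (1 = positive, 0 = negative), and only the standard library is used so the example runs without extra packages. scikit-learn's accuracy_score, precision_score, recall_score, and f1_score could replace the manual arithmetic.

```python
import csv
from pathlib import Path

# Column order taken from the results.csv description above.
COLUMNS = ["Experiment", "Model", "Accuracy", "Precision", "Recall", "F1"]


def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, F1) for 0/1 labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def write_results(results_dir, experiment, model, y_true, y_pred):
    """Write <results_dir>/<experiment>/results.csv with one row for the model."""
    out_dir = Path(results_dir) / experiment
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "results.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerow([experiment, model, *binary_metrics(y_true, y_pred)])
    return out_path
```

For example, write_results("results", "baseline", "logistic_regression", y_true, y_pred) would create results/baseline/results.csv; the experiment and model names here are placeholders.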

Contributing

This project is not open to contributions at this time. It is an individual project for a class.

License

This project uses the following license: ???

Contact

Author

Results

Google Docs

(One Drive With Data)

Note: Please consult the latest and most authoritative health sources, such as the World Health Organization or the Centers for Disease Control and Prevention, for current, scientifically validated information about COVID-19.


Languages

Jupyter Notebook 74.4%, Python 25.2%, R 0.4%