trung-hn / covid-19

WIP: tree visualization of covid-19 virus sequences

Directory tree

.
├── raw_data
│   ├── COVID-19-data-linked - Shortcut.lnk
│   └── gisaid_cov2020_sequences.fasta
├── src
│   ├── Preprocessing.ipynb
│   ├── bubbleTree.py
│   ├── circularTree.py
│   ├── compareTree.py
│   ├── nodeStyle.py
│   ├── nodeStyleColored.py
│   ├── plainTree.py
│   ├── semi_tree.pdf
│   ├── treeDrawingEngine.py
│   └── treeInTree.py
└── trees
    ├── gisaid_cov2020_sequences_filtered_8312_age.nwk
    └── gisaid_cov2020_sequences_filtered_8312_age_country.nwk
    

Note: the raw_data directory is git-ignored, so it has to be created and populated locally.

How to start

pip3 install -r requirements.txt

Preprocessing

Starting from the raw data (GISAID FASTA sequences and Nextstrain metadata), use Preprocessing.ipynb to keep only the useful records. In this case, the filter keeps sequences from human hosts with known age, gender, country, and submission date.
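The filtering step can be sketched roughly as below. This is a minimal, standard-library illustration and not the code in Preprocessing.ipynb: the metadata column names (strain, host, age, sex, country, date), the set of "unknown" markers, and the assumption that each FASTA header matches the strain column exactly are all hypothetical and will likely need adjusting to the real files.

```python
import csv
import io

# Hypothetical column names; adjust to the actual metadata file.
REQUIRED_FIELDS = ("age", "sex", "country", "date")
# Hypothetical markers for missing values.
UNKNOWN = {"", "?", "unknown", "n/a", "na"}


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def usable_strains(metadata_handle):
    """Return strain names with a human host and every required field known."""
    keep = set()
    for row in csv.DictReader(metadata_handle, delimiter="\t"):
        if (row.get("host") or "").strip().lower() != "human":
            continue
        values = ((row.get(field) or "").strip().lower() for field in REQUIRED_FIELDS)
        if all(value not in UNKNOWN for value in values):
            keep.add(row["strain"])
    return keep


def filter_fasta(fasta_handle, keep):
    """Keep only the sequences whose header is in the set of usable strains."""
    return [(header, seq) for header, seq in read_fasta(fasta_handle) if header in keep]


# Toy data standing in for the real metadata and FASTA files.
metadata = io.StringIO(
    "strain\thost\tage\tsex\tcountry\tdate\n"
    "sampleA\tHuman\t45\tFemale\tVietnam\t2020-03-01\n"
    "sampleB\tHuman\t?\tMale\tGermany\t2020-03-05\n"
    "sampleC\tbat\t3\tMale\tChina\t2020-01-10\n"
)
fasta = io.StringIO(">sampleA\nACGT\nTTGA\n>sampleB\nGGCC\n>sampleC\nAATT\n")

kept = filter_fasta(fasta, usable_strains(metadata))
print(kept)  # [('sampleA', 'ACGTTTGA')]
```

Only sampleA survives here: sampleB has an unknown age and sampleC is not from a human host. For the real files, the same functions can be given handles from open() instead of io.StringIO.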


Raw Data:

Raw data from GISAID: https://www.epicov.org/epi3/frontend#61101

Nextstrain GISAID metadata: https://github.com/nextstrain/ncov

Nextstrain GISAID metadata is now available on GISAID: more instructions here

More resources for tree visualization:
