In this repo I collected various files that may be useful to the community and to anyone interested in 2019-nCoV, the novel coronavirus (also known as Wuhan coronavirus or Wuhan seafood market pneumonia virus). If you have any questions, contact me on Twitter: @sacdallago
- All protein products translated from sequenced genomes on NCBI (from NCBI protein) as a single FASTA file: files/nCoV.fasta. This resource is updated regularly. A TSV file annotating which isolate each sequence stems from is also provided: files/nCoV.txt
- The spike/surface glycoprotein translated from the first (Dec 19) sequenced genome of the virus (https://www.ncbi.nlm.nih.gov/protein/1791269090): files/nCoV_2019_glycoprotein.fasta
- All protein products from SARS (from NCBI protein) and MERS (from NCBI protein):
  - files/SARS.fasta
  - Isolate annotation: files/SARS.txt
  - files/MERS.fasta
  - Isolate annotation: files/MERS.txt
- All protein products that match "spike glycoprotein" on NCBI (from NCBI protein): files/spike_glycoproteins.fasta
- Spike glycoproteins redundancy-reduced at a 90% sequence similarity threshold via CD-HIT: files/cd-hit/spike_glycoproteins_90.fasta
- Multiple alignments (via jackhmmer and evcouplings) of the Dec 2019 nCoV sequenced spike glycoprotein (see above) against all spike glycoproteins found on NCBI (see above): maintenance.dallago.us/public/ncov/alignments
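
The FASTA files listed above can be read with a few lines of standard-library Python. The following is a minimal sketch, not code from this repository: the helper name `read_fasta` is made up for illustration, and it assumes plain FASTA text in which every record starts with a `>` header line.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence} (hypothetical helper).

    If two records share the same header, the later one overwrites the earlier.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins: store the previous one, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Because the function accepts any iterable of lines, an open file handle works directly, for example `read_fasta(open("files/nCoV_2019_glycoprotein.fasta"))` when run from the repository root.
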
- RAW data from NCBI
- Scripts to download data from NCBI in this repository
- jackhmmer was used to produce alignments
- evcouplings via the evcouplings.org server and the evcouplings python pipeline
- cd-hit
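
For readers who want to try the clustering and alignment steps locally, the sketch below builds command lines for the cd-hit and jackhmmer executables and runs them through `subprocess`. It is an illustration under assumptions, not the commands actually used for this repository: only the 90% threshold and the file names under files/ come from the list above. The helper names, the output file names, the word size, and the otherwise default jackhmmer settings are guesses, and both tools must be installed and available on PATH.

```python
import subprocess
from typing import List


def cd_hit_command(fasta_in: str, fasta_out: str, identity: float = 0.9) -> List[str]:
    """Build a cd-hit call that clusters sequences at the given threshold.

    Assumption: word size 5 (-n 5), which cd-hit recommends for
    thresholds between 0.7 and 1.0.
    """
    if not 0.7 <= identity <= 1.0:
        raise ValueError("word size 5 is only suitable for thresholds in 0.7..1.0")
    return ["cd-hit", "-i", fasta_in, "-o", fasta_out, "-c", str(identity), "-n", "5"]


def jackhmmer_command(query: str, database: str, alignment_out: str) -> List[str]:
    """Build a jackhmmer call that saves the alignment of hits via -A."""
    return ["jackhmmer", "-A", alignment_out, query, database]


def run(command: List[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)
```

For example, `run(cd_hit_command("files/spike_glycoproteins.fasta", "spike_glycoproteins_90.fasta"))` would write the cluster representatives to a new FASTA file, and `run(jackhmmer_command("files/nCoV_2019_glycoprotein.fasta", "files/spike_glycoproteins.fasta", "spike_alignment.sto"))` would save an alignment in Stockholm format. The output names here are placeholders.
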