sacdallago / nCoV


2019-nCoV

This repo collects various files that may be useful to the community and to anyone interested in 2019-nCoV, the novel coronavirus (also known as Wuhan coronavirus or Wuhan seafood market pneumonia virus). If you have any questions, contact me on Twitter: @sacdallago

Files:

  1. All protein products translated from sequenced genomes on NCBI (from NCBI protein) as a single FASTA file, updated regularly: files/nCoV.fasta. A TSV file annotating which isolate each sequence stems from accompanies it: files/nCoV.txt

  2. The spike/surface glycoprotein translated from the first sequenced genome of the virus (December 2019; https://www.ncbi.nlm.nih.gov/protein/1791269090): files/nCoV_2019_glycoprotein.fasta

  3. All protein products from SARS (from NCBI protein) and MERS (from NCBI protein):

  4. All protein products that match "spike glycoprotein" on NCBI (from NCBI protein): files/spike_glycoproteins.fasta

  5. Spike glycoproteins reduced to 90% similarity via CD-HIT: files/cd-hit/spike_glycoproteins_90.fasta

  6. Multiple alignments (via jackhmmer and evcouplings) of the December 2019 nCoV spike glycoprotein (item 2) against all spike glycoproteins found on NCBI (item 4): maintenance.dallago.us/public/ncov/alignments
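The FASTA files listed above can be read with a few lines of plain Python. The following is a minimal sketch, not part of this repository: the function name `read_fasta` is hypothetical, and it assumes the files follow the usual FASTA layout (a `>` header line followed by one or more sequence lines).

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    Hypothetical helper for illustration; records that share an identical
    header overwrite each other.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # join wrapped sequence lines
    return records


if __name__ == "__main__":
    # Path taken from the file list above; adjust it to your checkout.
    with open("files/nCoV.fasta") as handle:
        proteins = read_fasta(handle)
    print(f"{len(proteins)} protein sequences loaded")
```

For anything beyond quick inspection, a dedicated parser such as Biopython's `SeqIO` is likely the safer choice.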

Data & tools:

  • Raw data from NCBI
  • Scripts to download data from NCBI in this repository
  • jackhmmer was used to produce alignments
  • evcouplings via the evcouplings.org server and the evcouplings python pipeline
  • cd-hit
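As a rough illustration of how the clustering and alignment tools above are typically invoked, the sketch below builds the command lines in Python without running them. The exact parameters used for the files in this repo are not documented here, so everything except the 90% identity threshold is an assumption: the helper names, the file paths in the example, the iteration count, and the word size `-n 5` (the value CD-HIT recommends for thresholds between 0.7 and 1.0).

```python
import shlex
from typing import List


def cd_hit_command(fasta_in: str, fasta_out: str, identity: float = 0.9) -> List[str]:
    """Build a cd-hit call that clusters sequences at the given identity.

    Hypothetical helper; word size 5 is only valid for thresholds 0.7-1.0.
    """
    if not 0.7 <= identity <= 1.0:
        raise ValueError("word size 5 requires an identity between 0.7 and 1.0")
    return ["cd-hit", "-i", fasta_in, "-o", fasta_out, "-c", str(identity), "-n", "5"]


def jackhmmer_command(query: str, database: str, alignment_out: str,
                      iterations: int = 5) -> List[str]:
    """Build a jackhmmer call that saves the final alignment to a file.

    Hypothetical helper; the iteration count is an assumed default.
    """
    return ["jackhmmer", "-N", str(iterations), "-A", alignment_out, query, database]


if __name__ == "__main__":
    # Print the commands instead of executing them; pass a list to
    # subprocess.run(..., check=True) once the tools are installed.
    print(shlex.join(cd_hit_command(
        "files/spike_glycoproteins.fasta",
        "files/cd-hit/spike_glycoproteins_90.fasta")))
    print(shlex.join(jackhmmer_command(
        "files/nCoV_2019_glycoprotein.fasta",
        "files/spike_glycoproteins.fasta",
        "alignment.sto")))  # output name is a placeholder
```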

About

License: GNU General Public License v3.0


Languages

Python 69.0%, Shell 31.0%