nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2

Home page: https://nextstrain.org/ncov

Representation of a nuc conversion does not point to a conversion on the genome

huzuner opened this issue

Hello,

For our research, I have been using resources from Nextstrain and I encountered an ambiguity that I wanted to ask about.

I am currently working with the tabular file of nuc conversions that define Nextstrain clades (https://github.com/nextstrain/ncov/blob/master/defaults/clades.tsv). I have been assuming that clades.tsv lists nucleotide conversions, so each entry should describe a nucleotide change relative to the reference genome.
However, one of the conversions listed there, 22F nuc 15461 A, specifies the same base (A) that the reference genome (NC_045512v2) already has at that position. Does this serve an intended purpose, or am I misunderstanding the concept?

Thank you in advance for your response.
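One way to look for other entries like this is to compare each nuc row of clades.tsv with the reference base at the same site. The Python sketch below assumes the file is tab-separated with clade, gene, site and alt columns and 1-based sites. The rows and the seven-base reference are made-up placeholders; in practice you would load the real clades.tsv and the NC_045512.2 sequence instead.

```python
import csv
import io

# Hypothetical excerpt in the clades.tsv layout (columns: clade, gene, site, alt).
# Clade names and sites are illustrative only, not copied from the real file.
CLADES_TSV = """clade\tgene\tsite\talt
cladeX\tnuc\t3\tT
cladeX\tnuc\t5\tA
cladeY\tnuc\t2\tC
"""

# Toy reference; in practice this would be the NC_045512.2 sequence read from a FASTA file.
REFERENCE = "ACGTAGC"


def find_alt_equals_ref(tsv_text, reference):
    """Return (clade, site, alt) for every nuc row whose alt base equals the reference base.

    Sites are assumed to be 1-based genome coordinates.
    """
    hits = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["gene"] != "nuc":
            continue  # skip any row that is not a nucleotide definition
        site = int(row["site"])
        ref_base = reference[site - 1]  # convert the 1-based site to a 0-based index
        if row["alt"].upper() == ref_base.upper():
            hits.append((row["clade"], site, row["alt"]))
    return hits


print(find_alt_equals_ref(CLADES_TSV, REFERENCE))
# → [('cladeX', 5, 'A'), ('cladeY', 2, 'C')]
```

Any row reported here names a base that the reference already carries, which is the situation described above for 22F at site 15461.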

Yes, it looks like there are no (or very few) mutations observed at this position, so it may have been a typo. Using all 4 defining positions, we can see how they identify 22F.
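To see how several defining positions work together, the rows can be grouped into one set of (site, alt) pairs per clade, and a genotype (for example, the reconstructed sequence at a tree node) matches only if it carries every allele in the set. This is a minimal sketch on made-up data, assuming the clades.tsv column layout and 1-based sites; it illustrates the idea of a combined definition and is not how augur clades itself is implemented.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the clades.tsv layout; clade names and sites are illustrative.
CLADES_TSV = """clade\tgene\tsite\talt
cladeX\tnuc\t3\tT
cladeX\tnuc\t6\tA
cladeY\tnuc\t2\tT
"""


def load_definitions(tsv_text):
    """Group nuc rows into {clade: {(site, alt), ...}}."""
    definitions = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["gene"] == "nuc":
            definitions[row["clade"]].add((int(row["site"]), row["alt"].upper()))
    return dict(definitions)


def matching_clades(sequence, definitions):
    """Return clades whose defining alleles are ALL present in the sequence (1-based sites)."""
    return sorted(
        clade
        for clade, alleles in definitions.items()
        if all(
            site <= len(sequence) and sequence[site - 1].upper() == alt
            for site, alt in alleles
        )
    )


defs = load_definitions(CLADES_TSV)
print(matching_clades("ACTTAA", defs))  # → ['cladeX']
```

Requiring every defining allele is what makes a single uninformative entry, such as one that matches the reference, harmless in practice: the remaining positions still separate the clade.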

Note that these definitions are the ones we use for augur clades to define clades on a tree. They're not intended to be used for individual sequence classification. We recommend using nextclade for classification of sequences.

Thanks for your answer!

What is the difference between augur clades and Nextstrain clades? Aren't they the same, or at least similar?

In my case, I need either full FASTA sequences or the nucleotide changes of the Nextstrain/Nextclade clades so that I can create FASTA sequences myself. I could not find either of these in the 'Nextclade datasets'. The only relevant resource is 'tree.json', but we identified some inconsistencies for some clades when compared with covariants.org.
That's how I ended up with this 'clades.tsv', assuming that it provides the actual nucleotide differences belonging to clades.
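If you do build sequences from nucleotide changes, the basic step is applying substitutions to the reference and writing the result as FASTA. This sketch handles substitutions only (no insertions or deletions) and uses a toy reference and hypothetical changes. The defining sites in clades.tsv are only a small subset of a clade's mutations, so a sequence built from them is a pseudo-sequence rather than a representative genome.

```python
def apply_substitutions(reference, substitutions):
    """Return a copy of reference with each (1-based site, alt) substitution applied."""
    seq = list(reference)
    for site, alt in substitutions:
        if not 1 <= site <= len(seq):
            raise ValueError(f"site {site} outside reference of length {len(seq)}")
        seq[site - 1] = alt
    return "".join(seq)


def to_fasta(name, sequence, width=60):
    """Format a single record as FASTA text, wrapping sequence lines at `width`."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


# Toy reference and hypothetical substitutions, for illustration only.
pseudo = apply_substitutions("ACGTAGC", [(3, "T"), (6, "A")])
print(to_fasta("cladeX_pseudo", pseudo, width=4), end="")
# prints:
# >cladeX_pseudo
# ACTT
# AAC
```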

cc @corneliusroemer - I think you have a fasta of representative sequences for each lineage?

Would be great to hear if this exists.