nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement

Home Page:https://clades.nextstrain.org

ENH: Show private aa substitutions in mutations tooltip

corneliusroemer opened this issue

The private mutations tooltip (labeled, reversions, unlabeled) is extremely useful. It currently only shows nucleotide substitutions. I think this is partly because nucleotide substitutions are "simpler" and usually more informative for QC than aa substitutions, and also because not all nucleotide mutations cause aa substitutions. So it makes sense that we use nucleotide substitutions as the fundamental unit in the mutations tooltip.

However, when reviewing SARS-CoV-2 sequences for new developments (not for QC purposes), it is usually the amino acid substitutions that are of primary interest. I often find myself having to look up which aa substitution (if any) a particular new nucleotide substitution corresponds to. By now I've learned a few nt substitutions by heart (e.g. S:346T is 22599), but requiring memorization or lookup is not an ideal solution.
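The lookup described above can be sketched in a few lines. This is a minimal illustration, not Nextclade code: the `Gene` struct, the `GENES` table and the `nuc_to_codon` function are hypothetical names, and the only data included is the spike coding region of the Wuhan-Hu-1 reference (MN908947.3, positions 21563 to 25384).

```rust
/// A gene's coding region on the reference genome (1-based, inclusive).
/// Hypothetical type for illustration only.
struct Gene {
    name: &'static str,
    start: usize,
    end: usize,
}

// Spike coordinates on Wuhan-Hu-1 (MN908947.3); other genes are omitted for brevity.
const GENES: &[Gene] = &[Gene { name: "S", start: 21563, end: 25384 }];

/// Maps a 1-based nucleotide position to
/// (gene name, 1-based codon number, 0-based offset within the codon).
/// Returns None when the position is outside every listed gene.
fn nuc_to_codon(pos: usize) -> Option<(&'static str, usize, usize)> {
    GENES
        .iter()
        .find(|g| (g.start..=g.end).contains(&pos))
        .map(|g| {
            let offset = pos - g.start;
            (g.name, offset / 3 + 1, offset % 3)
        })
}

fn main() {
    match nuc_to_codon(22599) {
        Some((gene, codon, frame)) => {
            println!("22599 -> {}:{} (codon position {})", gene, codon, frame + 1)
        }
        None => println!("22599 is not in a listed gene"),
    }
}
```

This only finds the affected codon. Deciding whether the amino acid actually changes additionally requires translating the reference and query codons, and a real implementation would also have to handle overlapping genes and the ORF1ab frameshift, which the sketch ignores.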

It would be great if we could also report aa substitutions, besides nt substitutions.

There are a few design decisions to be made but it shouldn't be impossible. One idea would be to have two mutation columns: one for nucleotides and one for amino acids. Right now, we show both together, but only nt mutations have the "private" feature. Splitting them would give us more space to show more nt mutations (the list is sometimes truncated, see below). A reason to report nt and aa together is that we could match nt and aa substitutions. But we don't do this currently (and it's not trivial).

An alternative way to create more space would be to report "all mutations" and "private mutations" in different columns. A lot of space is taken up by "all mutations" - that way aa and nt could stay together, just grouped by topic.

Having a separate column (whether "aa" in addition to existing "nt" or "private" in addition to "all") would also add a new way to sort the table which could be useful.

Example of current mutation tooltip:
[image: screenshot of the current mutation tooltip]

I think this should be relatively straightforward to add: find_private_nuc_mutations.rs is mostly the same as find_private_aa_mutations.rs, except that the AA version runs multiple times, once for each gene.
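To illustrate the "same logic, repeated per gene" point, here is a rough sketch. It is not the contents of either file: the `Sub` type and both function names are hypothetical, and "private" is simplified to a plain set difference between the query's substitutions and those of its nearest tree node, assuming both are expressed relative to the same reference.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical simplified substitution: (1-based position, new state).
type Sub = (usize, char);

/// Substitutions present in the query but absent from its nearest tree node.
fn private_subs(query: &BTreeSet<Sub>, parent: &BTreeSet<Sub>) -> BTreeSet<Sub> {
    query.difference(parent).copied().collect()
}

/// AA version: the same comparison, repeated once per gene.
/// A gene missing from `parent` is treated as having no substitutions there.
fn private_aa_subs(
    query: &BTreeMap<String, BTreeSet<Sub>>,
    parent: &BTreeMap<String, BTreeSet<Sub>>,
) -> BTreeMap<String, BTreeSet<Sub>> {
    let empty = BTreeSet::new();
    query
        .iter()
        .map(|(gene, subs)| {
            let parent_subs = parent.get(gene).unwrap_or(&empty);
            (gene.clone(), private_subs(subs, parent_subs))
        })
        .collect()
}

fn main() {
    // Made-up example data: the query carries S:346T and S:614G, the parent only S:614G.
    let query = BTreeMap::from([(
        "S".to_string(),
        BTreeSet::from([(346, 'T'), (614, 'G')]),
    )]);
    let parent = BTreeMap::from([("S".to_string(), BTreeSet::from([(614, 'G')]))]);
    println!("{:?}", private_aa_subs(&query, &parent));
}
```

The sketch leaves out reversions (substitutions the parent has but the query lacks) and deletions, both of which a complete version would need to track.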

Private AA subs and dels are already calculated, so we just need to write a UI for them.

Labelling of aa mutations can also be implemented if needed, but that would require additional data in the virus properties JSON.