nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2

Home Page: https://nextstrain.org/ncov

Visualise omicron recombinants

jameshadfield opened this issue

This issue contains two examples of visualising recombinants.

For this issue I've chosen to display BA.1, BA.2, BA.4 and BA.5 as distinct recombinants; however, that choice is clearly up for grabs. The workflow changes we make to generate them are also up for grabs. Here I'm using a tree from Trevor's #913 and pruning out clades.

Option 1: Separate them into subtrees

URL: https://nextstrain.org/staging/omicron-recombinant/subtrees

Option 2: Add a "potential recombinant" colouring so you can explode them

URL: https://nextstrain.org/staging/omicron-recombinant/explodable?c=recombinant

cc @huddlej @rneher @trvrb @emmahodcroft

What do we think is the most biologically appropriate representation of these data in a data structure?

  • Encoding the potential recombinants as subtrees indicates strong evidence that the subtrees do not belong to a single tree. This encoding enforces a specific set of recombination events.
  • Encoding the recombinants as a metadata annotation that we can "explode" or color by indicates uncertainty about the nature of the recombination events. This encoding allows multiple different possible annotations of recombination events.
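The metadata approach keeps the tree whole, and an "explode" becomes a view computed from an annotation. The minimal Python sketch below illustrates this. The `Node` class, the `explode` function and the toy tree are all hypothetical. The annotation name `recombinant` mirrors the colouring in the Option 2 URL, but the sketch is not meant to reproduce how Auspice actually explodes trees. It simply cuts every branch along which the annotation value changes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical minimal tree node: a name, per-node annotations, children.
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def leaves(node):
    """Return the leaf names below `node`, left to right."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def explode(root, key):
    """Split a tree wherever the `key` annotation changes along a branch.

    Returns a dict mapping each annotation value to a list of subtree roots
    carrying that value. The input tree is copied, never modified.
    """
    pieces = defaultdict(list)

    def copy_piece(node):
        value = node.attrs.get(key)
        new = Node(node.name, dict(node.attrs))
        for child in node.children:
            if child.attrs.get(key) == value:
                # Same annotation: the child stays attached in this piece.
                new.children.append(copy_piece(child))
            else:
                # Annotation changes: the child becomes the root of a new piece.
                pieces[child.attrs.get(key)].append(copy_piece(child))
        return new

    root_piece = copy_piece(root)
    pieces[root.attrs.get(key)].append(root_piece)
    return dict(pieces)


# Toy tree (hypothetical data): a BA.2-annotated clade nested in a BA.1-annotated tree.
tree = Node("root", {"recombinant": "BA.1"}, [
    Node("A", {"recombinant": "BA.1"}),
    Node("B", {"recombinant": "BA.1"}),
    Node("n1", {"recombinant": "BA.2"}, [
        Node("C", {"recombinant": "BA.2"}),
        Node("D", {"recombinant": "BA.2"}),
    ]),
])

parts = explode(tree, "recombinant")
for value, roots in parts.items():
    print(value, [leaves(r) for r in roots])
```

The split is computed on demand from the annotation. Swapping in a different annotation (for example, one that groups BA.4 and BA.5 together) would therefore give a different set of subtrees from the same underlying tree. The subtree encoding, by contrast, fixes one split in the data itself.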

After working with @nicfel's recombination networks a little bit and seeing the complexity and uncertainty there, I prefer the less stringent encoding of events by metadata instead of by subtrees.

Related to this, should we plan to standardize on the same approach for reassortment that we use for recombination? For example, encoding MCCs for H3N2 from TreeKnit as metadata annotations also makes sense, given the variability of the MCC annotations with different runtime parameters.
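The sketch below gives a rough idea of what MCCs as metadata could look like. It assumes each run's MCCs are available as lists of tip names; that is an assumption, and it does not rely on TreeKnit's actual output format. Each run is turned into its own per-tip annotation column. The function `mccs_to_annotation`, the column names and the tip names are all hypothetical.

```python
def mccs_to_annotation(mccs):
    """Map each tip name to a label for the MCC that contains it.

    `mccs` is assumed to be a list of clades, each a list of tip names,
    with every tip in at most one clade.
    """
    annotation = {}
    for i, clade in enumerate(mccs, start=1):
        label = f"MCC_{i}"
        for tip in clade:
            if tip in annotation:
                raise ValueError(f"tip {tip!r} appears in more than one MCC")
            annotation[tip] = label
    return annotation


# Two hypothetical MCC partitions of the same tips, e.g. from runs with
# different runtime parameters.
run_a = [["t1", "t2", "t3"], ["t4", "t5"]]
run_b = [["t1", "t2"], ["t3"], ["t4", "t5"]]

# One metadata column per run; both can be offered as colourings of one tree.
metadata = {
    "mcc_run_a": mccs_to_annotation(run_a),
    "mcc_run_b": mccs_to_annotation(run_b),
}
```

With one column per run, annotations from different runtime parameters can sit side by side and be compared by colouring. There is no need to pick a single set of MCCs to bake into the tree structure.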

I agree with John, and I also think that, at least for the BAs, encoding them as metadata that can be exploded or not is a nice way to reflect the uncertainty.

One question this raises is whether, in cases where recombination is more certain (the X lineages), it would be better to have them as a separate tree. How much recombination influences the phylogeny may be a line-drawing point: 'more recombinant' things may simply not fit well into a conventional phylogeny at all.

If so, would we be able to do both, and would we want to? In this example, that would mean having the BAs as metadata to explode by choice, but having the X lineages 'always exploded'? 🤔