nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement

Home Page: https://clades.nextstrain.org

If the qc.overallStatus of my sequences is mediocre, can we keep them for the next analysis step?

liamxg opened this issue

commented

Dear @nextclade team,
If the qc.overallStatus of my sequences is bad, should we remove them before the next analysis step?

It depends a lot on what you're doing, and also on which particular QC rule makes it bad. It could also be that the sequence is perfectly fine and is simply a recombinant.

In general, bad QC just means that something potentially bad or interesting might be happening, and you should take a closer look if you sequenced it.
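To see which QC rule is responsible for a bad status, you can look at the per-rule status columns next to qc.overallStatus. Below is a minimal Python sketch of that idea. The column names in `RULE_COLUMNS` are assumptions based on typical Nextclade TSV output, so check them against the header of your own file. The sample rows are made up for illustration.

```python
import csv
import io

# Assumed per-rule status columns; verify against your own nextclade.tsv header.
RULE_COLUMNS = [
    "qc.missingData.status",
    "qc.mixedSites.status",
    "qc.privateMutations.status",
    "qc.snpClusters.status",
    "qc.frameShifts.status",
    "qc.stopCodons.status",
]

# Made-up example rows, joined with tabs to mimic the TSV layout.
SAMPLE_TSV = "\n".join(
    "\t".join(fields)
    for fields in [
        ["seqName", "qc.overallStatus", "qc.missingData.status",
         "qc.mixedSites.status", "qc.privateMutations.status"],
        ["seqA", "good", "good", "good", "good"],
        ["seqB", "bad", "good", "mediocre", "bad"],
    ]
)


def flagged_rules(row):
    """Return (rule, status) pairs for every rule that is not 'good'."""
    flagged = []
    for column in RULE_COLUMNS:
        status = row.get(column, "")  # columns absent from the file are skipped
        if status and status != "good":
            rule = column.split(".")[1]  # "qc.mixedSites.status" -> "mixedSites"
            flagged.append((rule, status))
    return flagged


def summarize(tsv_text):
    """Map each sequence name to the list of rules that flagged it."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["seqName"]: flagged_rules(row) for row in reader}


if __name__ == "__main__":
    for name, rules in summarize(SAMPLE_TSV).items():
        print(name, rules)
```

For a real file, you could pass the result of `open("nextclade.tsv").read()` to `summarize` instead of the sample text.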

commented

Dear @corneliusroemer,
First, if we want to check the sequence quality, should we look at the qc.overallStatus column?

commented

Dear @corneliusroemer,
All of them were downloaded from GISAID. I just uploaded them to Nextclade and checked the nextclade.tsv file.

Dear Liam @liamxg,

Yes, the QC overall status (derived from the overall QC score) is an empirical metric that gives you some idea of the quality of the genome, according to the beliefs of our team. You can learn more about QC in the documentation and/or by inspecting the source code.

The QC subsystem is configurable in the dataset (in the qc.json file for v2 or in the pathogen.json file for v3), so you can customize it to your needs. Finally, you can implement your own metrics using Nextclade's analysis results or even the aligned sequences.
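As an illustration of a custom metric built on the analysis results, here is a small Python sketch of a project-specific filter. The policy is hypothetical: it rejects a sequence only when a rule outside a tolerated set is "bad", so that, for example, a possible recombinant flagged only by private mutations is kept. The `qc.<rule>.status` column pattern, the `tolerated_rules` parameter, and the sample rows are assumptions for the sake of the example, not part of Nextclade itself.

```python
import csv
import io

# Made-up example rows, joined with tabs to mimic the TSV layout.
EXAMPLE_TSV = "\n".join(
    "\t".join(fields)
    for fields in [
        ["seqName", "qc.overallStatus", "qc.missingData.status",
         "qc.privateMutations.status"],
        ["seqA", "good", "good", "good"],
        ["seqB", "bad", "good", "bad"],   # flagged only by private mutations
        ["seqC", "bad", "bad", "good"],   # flagged by missing data
    ]
)


def keep_sequence(row, tolerated_rules=("privateMutations",)):
    """Hypothetical policy: reject only if a non-tolerated rule is 'bad'."""
    for column, status in row.items():
        if column is None:  # extra unnamed fields in a malformed row
            continue
        parts = column.split(".")
        # Per-rule columns are assumed to be named qc.<rule>.status
        if len(parts) == 3 and parts[0] == "qc" and parts[2] == "status":
            if status == "bad" and parts[1] not in tolerated_rules:
                return False
    return True


def filter_names(tsv_text, **policy):
    """Return the names of the sequences that pass the policy."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["seqName"] for row in reader if keep_sequence(row, **policy)]


if __name__ == "__main__":
    print(filter_names(EXAMPLE_TSV))
    print(filter_names(EXAMPLE_TSV, tolerated_rules=()))
```

Which rules to tolerate, if any, is exactly the kind of choice that depends on the goals of your project.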

There is no absolute metric that would tell you what you "should" or "should not" do. Not in Nextclade, not anywhere else. As Cornelius mentioned, Nextclade only tries to attract attention to certain (not all) issues that it detected. The final judgement is yours, and it depends on the goals of your particular research project.

commented

Dear @ivan-aksamentov,

Thanks. That's very helpful.