shenwei356 / gtdb-taxdump

GTDB taxonomy taxdump files with trackable TaxIds

Some ranks are being skipped

apcamargo opened this issue

I didn't do any systematic evaluation, but I found at least one lineage that skips the family level:

taxon = taxopy.Taxon(619996715, taxdb)
print(taxon)
# s__Bacteria;p__Patescibacteria;c__Microgenomatia;o__2-02-FULL-39-11;g__2-02-FULL-40-10;s__2-02-FULL-40-10 sp001779115

$ taxonkit lineage -R --data-dir R207 problematic_taxids.txt
619996715	Bacteria;Patescibacteria;Microgenomatia;2-02-FULL-39-11;2-02-FULL-40-10;2-02-FULL-40-10 sp001779115	superkingdom;phylum;class;order;genus;species
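
A more systematic check could look something like the sketch below. It assumes the three-column, tab-separated output of `taxonkit lineage -R` shown above (TaxId, names, ranks). The `EXPECTED_RANKS` list and the `missing_ranks` helper are hypothetical names introduced for illustration; they are not part of taxonkit or taxopy.

```python
# Ranks expected in a full lineage, in order. Assumption: the taxdump labels
# the domain as "superkingdom", as in the taxonkit output above.
EXPECTED_RANKS = [
    "superkingdom", "phylum", "class", "order", "family", "genus", "species",
]


def missing_ranks(lineage_line):
    """Return (taxid, list of expected ranks absent from the ranks column)."""
    taxid, _names, ranks = lineage_line.rstrip("\n").split("\t")
    present = set(ranks.split(";"))
    return taxid, [rank for rank in EXPECTED_RANKS if rank not in present]


# The line reported in this issue, rebuilt with explicit tab separators.
line = (
    "619996715\t"
    "Bacteria;Patescibacteria;Microgenomatia;2-02-FULL-39-11;"
    "2-02-FULL-40-10;2-02-FULL-40-10 sp001779115\t"
    "superkingdom;phylum;class;order;genus;species"
)

print(missing_ranks(line))  # ('619996715', ['family'])
```

Running the same function over every line of the taxonkit output would list all TaxIds whose lineage lacks one of the expected ranks.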

This seems to be a case of taxa at different ranks sharing the same name, as described in the help message. I hadn't seen the explanation there.

@shenwei356 do you think it would be useful to add prefixes (e.g. p__, c__, etc.) to the taxon names to make sure every rank has a node?

No need I think. It's common in viral species.