marbl / Krona

Interactively explore metagenomes and more from a web browser.

Home Page: https://github.com/marbl/Krona/wiki

Accessions not found in local database

jw51 opened this issue · comments

commented

I recently updated my nr and KronaTools databases, and when I used ktClassifyBLAST to classify my BLAST results I got a warning that hundreds of accession numbers were not found in my local database. I checked a few, and they all seem to correspond to TPA (third-party annotation) proteins from the same paper (Parks et al. 2018, "A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life") and have not been recently added to NCBI. However, they are not found in the earlier version of my databases. I don't think this is related to KronaTools itself, as the accession numbers are indeed missing from the all.accession2taxid.sorted file. Did NCBI only recently add these proteins to the nr database, but not to the prot.accession2taxid.gz file? Is anybody else experiencing the same issue?

Hi jw51!
I'm experiencing the same issue.
I've downloaded the latest version of KronaTools and updated the taxonomy and the accessions. The local database is missing a lot of accessions, which I suspected correspond to TPA proteins. Proteins from other experiments are also missing.
For example, the accession WP_121469408.1 (https://www.ncbi.nlm.nih.gov/protein/WP_121469408.1), a hypothetical protein associated with taxID 1560005 (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1560005), is not present in the all.accession2taxid.sorted file.

I tried with:

grep "WP_121469408.1" all.accession2taxid.sorted

but the accession is not in the file.
It's a problem, because I can't classify a lot of my sequences.

commented

I haven't found a solution yet...

It's strange, because if I download the file "prot.accession2taxid.gz" directly from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/) and run:

zgrep "WP_121469408.1" prot.accession2taxid.gz

I can find the accession in the file.

So the accession is present in the NCBI file. I have no idea why it is missing from all.accession2taxid.sorted after running updateAccessions.sh.

If you can give me some accessions of TPA proteins, I can check whether they are present in the "prot.accession2taxid.gz" file downloaded directly from NCBI.
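
To check several accessions against the NCBI file in one pass instead of one zgrep per accession, something like the following could work. This is only a sketch: it builds a tiny synthetic stand-in for prot.accession2taxid.gz (the real file has the tab-separated columns accession, accession.version, taxid, gi), and the file names, the XX_000001.1 accession, and the gi value are made up for illustration.

```shell
# Synthetic stand-in for prot.accession2taxid.gz; only the WP_121469408
# accession/taxid pair comes from this thread, the gi value is a placeholder.
work=$(mktemp -d)
printf 'accession\taccession.version\ttaxid\tgi\n' > "$work/a2t"
printf 'WP_121469408\tWP_121469408.1\t1560005\t0\n' >> "$work/a2t"
gzip "$work/a2t"   # produces $work/a2t.gz

# Versioned accessions to look up, one per line (XX_000001.1 is hypothetical).
printf 'WP_121469408.1\nXX_000001.1\n' > "$work/query.txt"

# awk reads the query list first, then the decompressed table from stdin ("-").
# Matches print "accession.version<TAB>taxid"; leftovers are reported as MISSING.
report=$(gzip -dc "$work/a2t.gz" | awk -F'\t' '
  NR == FNR    { want[$1] = 1; next }
  ($2 in want) { print $2 "\t" $3; delete want[$2] }
  END          { for (a in want) print a "\tMISSING" }
' "$work/query.txt" -)

echo "$report"
rm -rf "$work"
```

With the real file, replace "$work/a2t.gz" with the path to the downloaded prot.accession2taxid.gz and put the accessions of interest in the query file.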

commented

Hi, sorry for the delay.
Some TPA examples include: HCN16260, HCF14441, HAZ50255, HAM16036.

commented

Hi, I'm still having the same issue.

I downloaded a new nr database and Krona database, reinstalled KronaTools, and also installed KronaTools through conda.
But nothing solved the issue.

Could anybody confirm that they are NOT having this issue with a recent Krona database? Or are we really the only ones...

@MetaDan89 what are some of the other missing ones? WP_121469408.1 should be there, but accessions are stored without the version (the .1 suffix), so a grep for the versioned accession won't match. Try:
>ktGetTaxIDFromAcc WP_121469408.1
1560005

If the missing ones are all TPAs that are not in the NCBI tax lookup, then there's not a lot that can be done about them unfortunately.
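
For the grep check tried earlier in the thread, the version suffix can be stripped before searching. A minimal sketch, assuming the sorted lookup file holds two tab-separated columns (unversioned accession, taxid); the temporary file below is a synthetic stand-in for all.accession2taxid.sorted, and its first row is made up.

```shell
# Synthetic stand-in for all.accession2taxid.sorted
# (assumed layout: unversioned accession, TAB, taxid).
db=$(mktemp)
printf 'WP_121469407\t1\nWP_121469408\t1560005\n' > "$db"

acc="WP_121469408.1"
base="${acc%.*}"   # drop the ".1" version suffix -> WP_121469408

# Compare the first column exactly, so WP_121469408 cannot match a longer
# accession that merely starts with the same characters.
taxid=$(awk -F'\t' -v a="$base" '$1 == a { print $2; exit }' "$db")

echo "$taxid"
rm -f "$db"
```

An exact match on the first column is used instead of a plain grep because a substring search for WP_121469408 would also hit accessions such as WP_1214694080.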

commented

Here are some other ones: NAH76232 EFG1045630 MSM27211 EEZ5301706 EEW2049631 EFJ9541260 EES1712301 MTV91234. None of them are TPAs and their origin is known (species is in the name), so I don't really see why there can't be a taxid for them?

Is there any way for ktClassifyBLAST to drop these hits without a taxid? All of my sequences become "root" if they have hits like this, even though they also have good hits with taxonomic identifiers.
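
Until the tool handles this itself, one possible workaround is to filter the BLAST table before classification so that hits whose accession is absent from the local lookup are removed. This is a sketch of a pre-filter, not a description of what ktClassifyBLAST does. It assumes tabular BLAST output (-outfmt 6) with a bare accession in column 2 and a lookup file of unversioned accession, TAB, taxid; all file names and rows are synthetic, and the rows are cut to three columns for brevity.

```shell
work=$(mktemp -d)

# Synthetic accession lookup (assumed layout: unversioned accession, TAB, taxid).
printf 'WP_121469408\t1560005\n' > "$work/acc2taxid"

# Synthetic BLAST hits; column 2 is the subject accession with its version.
printf 'q1\tWP_121469408.1\t98.0\nq1\tNAH76232.1\t99.0\n' > "$work/hits.tsv"

# Load the known accessions from the first file, then print only those hit
# lines whose subject accession (version stripped) is in the lookup.
kept=$(awk -F'\t' '
  NR == FNR { known[$1] = 1; next }
  { acc = $2; sub(/\.[0-9]+$/, "", acc) }
  (acc in known)
' "$work/acc2taxid" "$work/hits.tsv")

echo "$kept"
rm -rf "$work"
```

The filtered table could then be passed to ktClassifyBLAST in place of the original, so that queries with at least one resolvable hit are no longer pulled to "root" by the unresolvable ones.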

These hits are now dropped by default in the latest code (see #150). Official release coming soon.

This is now in a release (v2.8). Also see this post in #150 for a temporary database fix that will allow all the above accessions to be found. NCBI will likely move these into the main database soon.

Dear Developers,

This is to let you know that I am experiencing a similar problem. I am using Krona 2.8.1, and below is the message I got:

[ WARNING ] The following taxonomy IDs were not found in the local database and were set to root
(if they were recently added to NCBI, use updateTaxonomy.sh to update the local
database): 155 116 383 12 5 18029 18232 3 17051 36 58 15 16726 17020 17754 17989 4 17292
66 70
Writing krona.html..

Please advise. Thanks.