shenwei356 / gtdb-taxdump

GTDB taxonomy taxdump files with trackable TaxIds

Failed to build Diamond-database with the taxonomy files

emilhaegglund opened this issue

Thanks for creating these taxonomy files. I was trying to use the files from R207 to build a Diamond database, but it failed while reading names.dmp with the following error message: "Failed to allocate sufficient memory. Please refer to the manual for instructions on memory usage."

I was just wondering whether you have tried these taxonomy files with Diamond, and whether you had any success?
Cheers, Emil

It seems to need more memory than is available on your machine.

Strange, I'm on a machine with 128 GB of RAM, and building with the NCBI taxdump files caused no problems. I will check with the Diamond developers then.

Thanks,
Emil

The GTDB taxonomy has 47894 species in r202, which belong to 28073+ species of the NCBI taxonomy. Could this be the cause?

Hi, I asked the developers of Diamond if they had any clue. The reason is that taxids in GTDB-taxdump are larger than 2^31, which is not supported in Diamond. See bbuchfink/diamond#611 (comment). I will see if they can create a fix for this issue.

Thanks for the quick replies.
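
For anyone who wants to check their own files for this, here is a minimal sketch that scans names.dmp-style rows for taxids above the signed 32-bit maximum. It assumes the standard NCBI taxdump layout, where the first "|"-delimited field is the tax_id. The function name `taxids_over_int32` and the sample rows are hypothetical and are not taken from the real R207 files.

```python
import io

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def taxids_over_int32(lines):
    """Yield each distinct taxid from names.dmp-style lines that exceeds INT32_MAX."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the first "|"-delimited field is the tax_id,
        # as in the NCBI taxdump format.
        taxid = int(line.split("|", 1)[0].strip())
        if taxid > INT32_MAX and taxid not in seen:
            seen.add(taxid)
            yield taxid


# Hypothetical sample rows for illustration only.
sample = io.StringIO(
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "3000000000\t|\texample taxon\t|\t\t|\tscientific name\t|\n"
)
print(list(taxids_over_int32(sample)))  # [3000000000]
```

To run it on a real file, pass an open file handle instead of the sample, for example `taxids_over_int32(open("names.dmp"))`.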

That could happen, since I use 'uint32' to store taxids.

I plan to use int32.
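
To illustrate why the choice between uint32 and int32 matters here, the sketch below reinterprets the four bytes of an unsigned 32-bit value as a signed one. It only demonstrates the general wrap-around behaviour; how Diamond actually parses taxids is not described in this thread, and the helper name `as_int32` is hypothetical.

```python
import struct


def as_int32(value):
    """Reinterpret the 4 bytes of an unsigned 32-bit value as a signed 32-bit value."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(as_int32(2**31 - 1))   # 2147483647, still fits in int32
print(as_int32(2**31))       # -2147483648, wraps to a negative number
print(as_int32(3000000000))  # -1294967296
```

Any taxid of 2^31 or above is valid as a uint32 but has no representation as a non-negative int32, so keeping generated taxids below 2^31 avoids the mismatch for tools that read them as signed integers.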