download_ncbi_associations() fails while decompressing file
msbentsen opened this issue
Hi,
Thank you for this great package! It has worked for me in the past, but lately I get an error when trying to download the NCBI associations as seen here:
from goatools.base import download_ncbi_associations
file_gene2go = download_ncbi_associations()
This produces the error:
FTP RETR ftp.ncbi.nlm.nih.gov gene/DATA gene2go.gz -> gene2go.gz
gunzip gene2go.gz
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-4-001d0dcec111> in <module>
1 # Get ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
2 from goatools.base import download_ncbi_associations
----> 3 file_gene2go = download_ncbi_associations()
~/.conda/envs/py3/lib/python3.7/site-packages/goatools/base.py in download_ncbi_associations(gene2go, prt, loading_bar)
131 file_remote = "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/{GZ}".format(
132 GZ=os.path.basename(gzip_file))
--> 133 dnld_file(file_remote, gene2go, prt, loading_bar)
134 else:
135 if prt is not None:
~/.conda/envs/py3/lib/python3.7/site-packages/goatools/base.py in dnld_file(src_ftp, dst_file, prt, loading_bar)
221 if prt is not None:
222 prt.write(" gunzip {FILE}\n".format(FILE=dst_wget))
--> 223 gzip_open_to(dst_wget, dst_file)
224 except IOError as errmsg:
225 import traceback
~/.conda/envs/py3/lib/python3.7/site-packages/goatools/base.py in gzip_open_to(fin_gz, fout)
233 with gzip.open(fin_gz, 'rb') as zstrm:
234 with open(fout, 'wb') as ostrm:
--> 235 ostrm.write(zstrm.read())
236 assert os.path.isfile(fout), "COULD NOT GUNZIP({G}) TO FILE({F})".format(G=fin_gz, F=fout)
237 os.remove(fin_gz)
~/.conda/envs/py3/lib/python3.7/gzip.py in read(self, size)
274 import errno
275 raise OSError(errno.EBADF, "read() on write-only GzipFile object")
--> 276 return self._buffer.read(size)
277
278 def read1(self, size=-1):
~/.conda/envs/py3/lib/python3.7/gzip.py in read(self, size)
469 buf = self._fp.read(io.DEFAULT_BUFFER_SIZE)
470
--> 471 uncompress = self._decompressor.decompress(buf, size)
472 if self._decompressor.unconsumed_tail != b"":
473 self._fp.prepend(self._decompressor.unconsumed_tail)
error: Error -3 while decompressing data: invalid block type
It seems to download the .gz file correctly, but decompressing it fails, so the resulting gene2go file is empty.
If I use an old gene2go file, it works perfectly (I have one from 10.11.2020 which works), but it seems that any new download fails.
I am running python==3.7.6 and goatools==1.1.6 on a Debian system.
Thank you for any help you might be able to provide for solving this!
Thank you for using GOATOOLS in your day-to-day work and for taking the time to write to us.
I have augmented the test tests/test_i147_all_taxids.py so that it always downloads NCBI's gene2go annotation file for better testing, but I am not able to reproduce what you are seeing, so we need more information.
In the meantime, here are a couple things to try:
1. Include the full name of the gene2go file you are downloading; here is an example:
from os import getcwd
from os.path import join
from goatools.base import download_ncbi_associations
fin_anno = join(getcwd(), 'gene2go')
download_ncbi_associations(fin_anno)
2. Download the gene2go file by hand:
$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
$ gunzip gene2go.gz
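To tell a corrupt download apart from a problem in the decompression code, it can help to check whether the downloaded gene2go.gz is a readable gzip archive at all. The following is a minimal sketch using only the Python standard library; the helper name is_valid_gzip is hypothetical and is not part of goatools.

```python
import gzip
import zlib


def is_valid_gzip(path, chunk_size=1 << 20):
    """Return True if the file at `path` decompresses cleanly from start to end.

    Hypothetical helper for illustration; not part of goatools.
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Read in chunks so a large archive is never held in memory at once.
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError: missing file, not a gzip file, or checksum mismatch.
        # EOFError: the archive is truncated.
        # zlib.error: corrupt compressed data (e.g. "invalid block type").
        return False
    return True
```

If is_valid_gzip("gene2go.gz") returns False for a freshly downloaded file, the download itself is the likely culprit rather than the code that reads it.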
Hi, thank you for getting back to me. I tried the second option, and I think it might be a system-specific issue on my end: gunzip gives an "invalid compressed data--format violated" error for the file fetched over FTP, but I was able to download it from https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz and unzip it without issue. So it probably has something to do with restrictions on downloading over FTP, though I am not quite sure. My problem is solved, thank you!
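For anyone hitting the same error, the HTTPS workaround described above can be scripted. This is a rough sketch using only the Python standard library, assuming the HTTPS URL from this thread stays available; fetch_and_gunzip and GENE2GO_HTTPS are hypothetical names, not goatools functions.

```python
import gzip
import os
import shutil
import urllib.request

# URL reported in this thread as downloading and decompressing without problems.
GENE2GO_HTTPS = "https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz"


def fetch_and_gunzip(url, dst_file):
    """Download a .gz file from `url` and decompress it to `dst_file`.

    Hypothetical helper for illustration; not part of goatools.
    """
    gz_file = dst_file + ".gz"
    # Stream the response to disk in binary mode.
    with urllib.request.urlopen(url) as response, open(gz_file, "wb") as out:
        shutil.copyfileobj(response, out)
    # Decompress in chunks. A corrupt archive raises here, before the
    # cleanup step, so the .gz file is left in place for inspection.
    with gzip.open(gz_file, "rb") as zstrm, open(dst_file, "wb") as ostrm:
        shutil.copyfileobj(zstrm, ostrm)
    os.remove(gz_file)
    return dst_file
```

Calling fetch_and_gunzip(GENE2GO_HTTPS, "gene2go") would write the decompressed annotations to gene2go in the current directory.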