rpetit3 / fastq-dl

Download FASTQ files from SRA or ENA repositories.

ModuleNotFoundError: No module named 'executor'

ConYel opened this issue · comments

Hello and thank you for this useful tool!
I built a Docker image with fastq-dl installed and tried to run one example, but it does not seem to work.
The Dockerfile can be found here: https://github.com/ConYel/docker_sncRNA_workflow/blob/master/Dockerfile

Do you believe there are some issues with the installation or the way the container is created?

fastq-dl SRX477044 ENA
2020-12-16 14:46:36:root:INFO - Aspera Connect not available, using FTP for ENA downloads
2020-12-16 14:46:37:root:INFO - Query: SRX477044
2020-12-16 14:46:37:root:INFO - Archive: ENA
2020-12-16 14:46:37:root:INFO - Total Runs To Download: 1
2020-12-16 14:46:37:root:INFO - 	Working on run SRR1178104...
Traceback (most recent call last):
  File "/root/miniconda/bin/fastq-dl", line 462, in <module>
    fastqs = ena_download(run, outdir, aspera=aspera,
  File "/root/miniconda/bin/fastq-dl", line 216, in ena_download
    fastq = download_ena_fastq(
  File "/root/miniconda/bin/fastq-dl", line 250, in download_ena_fastq
    execute(f'mkdir -p {outdir}')
  File "/root/miniconda/bin/fastq-dl", line 94, in execute
    from executor import ExternalCommand, ExternalCommandFailed
ModuleNotFoundError: No module named 'executor'

Let's see if we can get this figured out! I think the first thing to check is whether the conda install of Python 3 is being used, since that's where executor is installed.

Try:

which fastq-dl

which python3

Both of these should be in /root/miniconda/bin/ (based on error message above)
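The same check can also be made from inside Python. This is only a minimal sketch using the standard library; `describe_environment` is a hypothetical helper name and is not part of fastq-dl. Run it with the interpreter you want to inspect (for example /root/miniconda/bin/python3).

```python
import importlib.util
import shutil
import sys


def describe_environment(command="fastq-dl", module="executor"):
    """Report which interpreter is running and whether it can see `module`."""
    return {
        # Interpreter actually executing this script.
        "interpreter": sys.executable,
        # First match for `command` on PATH, or None if it is not found.
        "command_path": shutil.which(command),
        # True when `module` is importable by this interpreter.
        "module_found": importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `module_found` is False for the interpreter that ends up running fastq-dl, the import error above is the expected result.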

Thank you for the prompt reply!

root@111:/home# which fastq-dl
root@111:/home# which python3
/usr/bin/python3

I actually didn't get any result for fastq-dl.

ls /root/miniconda/bin/py
pydoc             pydoc3            pydoc3.8          python            python3           python3.8         python3.8-config  python3-config 

I'm guessing it's a PATH issue. What does $PATH look like?

echo $PATH

root@111:/home# echo $PATH
~/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Is fastq-dl in /root/miniconda/bin?

There is:
root@111:/home# ls /root/miniconda/bin/f
f2py fasterq-dump fasterq-dump-orig.2.10.8 fastq-dump.2 fax2ps fc-conflist fc-query freetype-config f2py3 fasterq-dump.2 fastqc fastq-dump.2.10.8 fax2tiff fc-list fc-scan futurize f2py3.8 fasterq-dump.2.10.8 fastq-dl fastq-dump-orig fc-cache fc-match fc-validate fastaFromBed fasterq-dump-orig fastq-dump fastq-dump-orig.2.10.8 fc-cat

Cool!

Let's give this a try:

/root/miniconda/bin/python3 /root/miniconda/bin/fastq-dl  SRX477044 ENA

I noticed the export to PATH in the Dockerfile, but I'm wondering if it might be worth adding this instead:

ENV PATH /root/miniconda/bin:$PATH
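The PATH printed earlier starts with a literal ~/miniconda/bin. One possible explanation, not confirmed in this thread, is that some tools do not expand a leading `~` stored in PATH, which would be consistent with `which fastq-dl` returning nothing. A minimal sketch for spotting such entries follows; `unexpanded_tilde_entries` is a hypothetical helper name.

```python
import os


def unexpanded_tilde_entries(path_value=None):
    """Return PATH entries that still start with a literal '~'."""
    if path_value is None:
        # Fall back to the PATH of the current process.
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(os.pathsep) if entry.startswith("~")]


if __name__ == "__main__":
    for entry in unexpanded_tilde_entries():
        # Show what the entry would look like with the home directory filled in.
        print(f"{entry} -> {os.path.expanduser(entry)}")
```

An absolute path, as in the ENV line above, avoids the question entirely.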

I tried it multiple times, but I always get:

root@111:/home# /root/miniconda/bin/python3 /root/miniconda/bin/fastq-dl  SRX477044 ENA

2020-12-16 16:04:07:root:INFO - Aspera Connect not available, using FTP for ENA downloads
2020-12-16 16:04:09:root:INFO - Query: SRX477044
2020-12-16 16:04:09:root:INFO - Archive: ENA
2020-12-16 16:04:09:root:INFO - Total Runs To Download: 1
2020-12-16 16:04:09:root:INFO - 	Working on run SRR1178104...
2020-12-16 16:04:09:root:INFO - 		FTP download attempt 1
2020-12-16 16:04:09:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
2020-12-16 16:04:09:root:ERROR - Retry execution (1 of 10)
2020-12-16 16:04:19:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
2020-12-16 16:04:19:root:ERROR - Retry execution (2 of 10)
2020-12-16 16:04:30:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
2020-12-16 16:04:30:root:ERROR - Retry execution (3 of 10)
2020-12-16 16:04:40:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
2020-12-16 16:04:40:root:ERROR - Retry execution (4 of 10)
2020-12-16 16:04:50:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
2020-12-16 16:04:50:root:ERROR - Retry execution (5 of 10)
2020-12-16 16:05:00:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
2020-12-16 16:05:00:root:ERROR - Retry execution (6 of 10)
2020-12-16 16:05:10:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
2020-12-16 16:05:10:root:ERROR - Retry execution (7 of 10)
2020-12-16 16:05:20:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
2020-12-16 16:05:20:root:ERROR - Retry execution (8 of 10)
2020-12-16 16:05:30:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
2020-12-16 16:05:30:root:ERROR - Retry execution (9 of 10)
2020-12-16 16:05:40:root:ERROR - "wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz" return exit code 8
Traceback (most recent call last):
  File "/root/miniconda/bin/fastq-dl", line 462, in <module>
    fastqs = ena_download(run, outdir, aspera=aspera,
  File "/root/miniconda/bin/fastq-dl", line 216, in ena_download
    fastq = download_ena_fastq(
  File "/root/miniconda/bin/fastq-dl", line 255, in download_ena_fastq
    execute(f'wget --quiet -O {fastq} {ftp}', max_attempts=max_attempts)
  File "/root/miniconda/bin/fastq-dl", line 126, in execute
    raise error
  File "/root/miniconda/bin/fastq-dl", line 104, in execute
    command.start()
  File "/root/miniconda/lib/python3.8/site-packages/executor/__init__.py", line 1441, in start
    self.start_once(**kw)
  File "/root/miniconda/lib/python3.8/site-packages/executor/__init__.py", line 1508, in start_once
    self.wait(check=check)
  File "/root/miniconda/lib/python3.8/site-packages/executor/__init__.py", line 1551, in wait
    self.check_errors(check=check)
  File "/root/miniconda/lib/python3.8/site-packages/executor/__init__.py", line 1673, in check_errors
    raise self.error_type(self)
executor.ExternalCommandFailed: External command failed with exit code 8!

Command:
bash -c 'wget --quiet -O /home/SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz'


Regarding the PATH: I have tried it like this in the past and it didn't work properly in some cases. I could try again and see if it changes anything, but it seems there's a different issue.

That's a different error though, so progress!

Looks like the service is unavailable at the moment:

http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz

Disregard that, it's working fine!

OK, it looks like something has changed on the ENA side.

In the past, this worked:

wget -O ./SRR1178104_1.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz

Now, ftp:// is needed for it to work:

wget -O ./SRR1178104_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz
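A script that builds the wget command can guard against this by adding the scheme when it is missing. The sketch below is only an illustration of that idea and is not necessarily how fastq-dl handles it; `with_ftp_scheme` is a hypothetical helper name.

```python
def with_ftp_scheme(url):
    """Prefix `url` with ftp:// unless it already names a scheme."""
    if "://" in url:
        # Already explicit (ftp://, http://, ...), so leave it untouched.
        return url
    return f"ftp://{url}"


if __name__ == "__main__":
    bare = "ftp.sra.ebi.ac.uk/vol1/fastq/SRR117/004/SRR1178104/SRR1178104_1.fastq.gz"
    print(with_ftp_scheme(bare))
```

Calling the helper on a URL that already has a scheme returns it unchanged, so it is safe to apply more than once.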

As a temporary solution, you could try downloading from SRA:

/root/miniconda/bin/python3 /root/miniconda/bin/fastq-dl  SRX477044 SRA

I submitted a new release (v1.0.6) with a fix for ENA downloads. It will be available from conda today or tomorrow.

I also created a workflow (https://github.com/rpetit3/fastq-dl/runs/1565163292) to make sure downloads are working.

Thank you very much for your help!

Well, thank you very much! I will check tomorrow with a new build of the Docker image plus the ENV line to see if it helps.

I finally rebuilt it, and it is probably working.
I don't know if it is too much to ask, but would it be possible to
print something like the percentage of the file downloaded so far, to show that it is actually running?
I ran it as before: fastq-dl SRX477044 ENA (now the $PATH seems to be working properly with your suggestion!)
and it stayed at:
2020-12-17 10:16:43:root:INFO - FTP download attempt 1
for ~10 mins, so I interrupted it.
I saw that one of the files, SRR1178104_1.fastq.gz, had probably been partially downloaded.
I checked the file, but it is 0 bytes, so probably nothing was ever downloaded?
Anyway, I then tried the SRA:

root@111:/home# fastq-dl SRX477044 SRA
2020-12-17 10:35:17:root:INFO - Query: SRX477044
2020-12-17 10:35:17:root:INFO - Archive: SRA
2020-12-17 10:35:17:root:INFO - Total Runs To Download: 1
2020-12-17 10:35:17:root:INFO - 	Working on run SRR1178104...
2020-12-17 10:39:39:root:ERROR - "pigz -p 1 -n --fast *.fastq" return exit code 25
Traceback (most recent call last):
  File "/root/miniconda/bin/fastq-dl", line 466, in <module>
    fastqs = sra_download(run["run_accession"], outdir, cpus=args.cpus,
  File "/root/miniconda/bin/fastq-dl", line 175, in sra_download
    execute(f'pigz -p {cpus} -n --fast *.fastq', directory=outdir)
  File "/root/miniconda/bin/fastq-dl", line 126, in execute
    raise error
  File "/root/miniconda/bin/fastq-dl", line 104, in execute
    command.start()
  File "/root/miniconda/lib/python3.8/site-packages/executor/__init__.py", line 1441, in start
    self.start_once(**kw)
  File "/root/miniconda/lib/python3.8/site-packages/executor/__init__.py", line 1508, in start_once
    self.wait(check=check)
  File "/root/miniconda/lib/python3.8/site-packages/executor/__init__.py", line 1551, in wait
    self.check_errors(check=check)
  File "/root/miniconda/lib/python3.8/site-packages/executor/__init__.py", line 1673, in check_errors
    raise self.error_type(self)
executor.ExternalCommandFailed: External command failed with exit code 25!

Command:
bash -c 'pigz -p 1 -n --fast *.fastq'

Standard error:
pigz: abort: write error on SRR1178104_1.fastq.gz (Inappropriate ioctl for device)
root@111:/home# ls
download_SRA.sh  my_data  spar_prepare  SRR1178104_1.fastq  SRR1178104_1.fastq.gz  **SRR1178104_2.fastq**  STAR_sam_script.txt

root@111:/home# ls -la
total 2510848
drwxr-xr-x.  1 root root        122 Dec 17 10:39 .
drwxr-xr-x.  1 root root         56 Dec 17 10:14 ..
-rwx------.  1 root root       1111 Dec 17 09:56 download_SRA.sh
drwxr-xr-x.  2 root root         10 Dec 17 10:00 .empty
drwxr-xr-x. 21 1000 1000       4096 Dec 14 12:18 my_data
drwxr-xr-x.  4 root root       4096 Dec 17 10:03 spar_prepare
-rw-r--r--.  1 root root 1285544424 Dec 17 10:39 SRR1178104_1.fastq
-rw-r--r--.  1 root root          0 Dec 17 10:16 SRR1178104_1.fastq.gz
-rw-r--r--.  1 root root 1285544424 Dec 17 10:39 SRR1178104_2.fastq
-rwx------.  1 root root       1307 Dec 17 09:56 STAR_sam_script.txt
root@111:/home# cat SRR1178104_1.fastq |head

The FASTQ seems OK now.

So the only issue is pigz not compressing.
I'll check whether it is again a PATH issue, since pigz is installed before bioconda, or whether pigz simply will not overwrite the SRR1178104_1.fastq.gz file left over from the interrupted run.

EDIT
The issue above was caused by the 0-byte file left over from the previous run: pigz did not overwrite the existing file of the same name.
I deleted the fastq.gz and the fastq from the previous run and everything ran smoothly.
It might be useful to add an option to force overwriting the *.fastq.gz in the pigz call.
Thanks a lot!!!

Glad you were able to get it working!

I'll look into adding progress for downloads, and I agree that an option to overwrite should be added. The pigz error wasn't very helpful. I'll add an "I found this, please use --force to overwrite" type of message before any downloads start.
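A pre-download check along those lines could look roughly like the sketch below. It is an illustration only, not the implementation in fastq-dl: the function name `check_existing_outputs`, its `force` parameter, and the file-name pattern are all assumptions made for this example.

```python
from pathlib import Path


def check_existing_outputs(outdir, run_accession, force=False):
    """Stop early if outputs for `run_accession` already exist, unless `force` is set."""
    # Matches e.g. SRR1178104_1.fastq and SRR1178104_1.fastq.gz (assumed naming).
    existing = sorted(Path(outdir).glob(f"{run_accession}*.fastq*"))
    if not existing:
        return []
    if not force:
        names = ", ".join(path.name for path in existing)
        raise FileExistsError(
            f"Found existing files ({names}); please use --force to overwrite"
        )
    # With force, remove stale files so later steps start from a clean directory.
    for path in existing:
        path.unlink()
    return existing
```

Running this before any download starts would have flagged the 0-byte SRR1178104_1.fastq.gz left by the interrupted run, instead of surfacing it later as a pigz write error.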