nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.

Preprocess annotate_raw_with_fastqs halting issue

SwiftSeal opened this issue · comments

Hiya!

I'm finding that tombo preprocess halts around halfway through any run. The first half runs relatively quickly (a predicted run time of about 2 hours), but progress then appears to slow down exponentially until it halts completely.

The command I am running is:

tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5s_single/ --fastq-filenames fast5s_guppy.fastq --sequencing-summary-filenames fast5s_guppy/sequencing_summary.txt --processes 32 --basecall-group Basecall_1D_000 --basecall-subgroup BaseCalled_template --overwrite

Here is a run after approx 18 hours:

[14:34:36] Getting read filenames.
[14:36:17] Parsing sequencing summary files.
[14:36:35] Annotating FAST5s with sequence from FASTQs.
 35%|████████████████████████████████████████████▌                                                                                   | 2644500/7596528 [2:39:39<4:58:58, 276.05it/s]

This is the output if I terminate the process:

^CTraceback (most recent call last):
Process Process-34:
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/bin/tombo", line 11, in <module>
Process Process-3:
Process Process-22:
Process Process-31:
Process Process-24:
Process Process-15:
Process Process-17:
Process Process-21:
Process Process-29:
Process Process-30:
Process Process-27:
Process Process-9:
Process Process-26:
Process Process-5:
Process Process-23:
Process Process-20:
Process Process-25:
Process Process-11:
Process Process-33:
Process Process-32:
Process Process-13:
Process Process-8:
Process Process-7:
Process Process-2:
Process Process-19:
Process Process-6:
Process Process-18:
Process Process-4:
    sys.exit(main())
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/site-packages/tombo/__main__.py", line 235, in main
    _preprocess.annotate_reads_with_fastq_main(args)
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/site-packages/tombo/_preprocess.py", line 526, in annotate_reads_with_fastq_main
    args.overwrite)
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/site-packages/tombo/_preprocess.py", line 283, in _annotate_with_fastqs
    fq_feed_p.join()
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/mnt/shared/scratch/msmith/apps/conda/envs/tombo/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)

What I've tried

  • The issue occurs with both the pip and conda installations of 1.5.1. I've tried reinstalling, but without success.
  • The issue does not occur on a smaller run (using the dataset here: https://github.com/PengNi/deepsignal-plant).
  • I've checked that sequencing_summary.txt lists the same number of reads as there are fast5s in the directory.
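For reference, the count check in the last bullet was done roughly like this (a minimal sketch; `count_reads` is my own helper, not part of tombo, and it assumes single-read FAST5 files and a Guppy-style summary with one header line):

```python
import os

def count_reads(summary_path, fast5_dir):
    """Compare the number of reads listed in a sequencing summary
    with the number of .fast5 files under a directory.

    Assumes the summary is tab-separated with a single header line,
    as Guppy writes it, and that FAST5s are single-read files.
    Returns (summary_reads, fast5_files)."""
    with open(summary_path) as fh:
        summary_reads = sum(1 for _ in fh) - 1  # subtract the header line
    fast5_files = sum(
        1
        for _root, _dirs, files in os.walk(fast5_dir)
        for name in files
        if name.endswith('.fast5')
    )
    return summary_reads, fast5_files
```

Both counts came back equal for my run, so a summary/FAST5 mismatch does not seem to be the cause.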

Other info

  • htop shows that the multi-process run eventually collapses into a single process using ~20% CPU.
  • The reads are from a size-selected ONT run; the N50 is approx. 30 kbp, with some reads >100 kbp.
  • This is running on a Rocky Linux system under a SLURM manager. The issue occurs with both interactive and batch jobs.

Any support would be greatly appreciated; I've checked through the previous issues but couldn't find a solution!

Thanks in advance :)