Error launching Canu v2.2 on a grid: sqStoreCreate failed; boom!
ElenaCatelan opened this issue · comments
Hello,
I am using Canu for the first time to assemble a fish genome de novo on a grid, but it keeps failing with the error rc=256. First I ran Canu inside a Singularity container, and then I installed it through micromamba; in both cases I got the same error. Because of time restrictions I am running each step (correction, trimming, assembly) independently. My command is the following:
#!/bin/bash
#SBATCH --job-name bjorn_corr1
#SBATCH --output output_corr.txt
#SBATCH --error errors_corr.txt
#SBATCH --time 31-00:00:00
#SBATCH --ntasks 64
#SBATCH --partition allgroups
#SBATCH --mem 1000G
eval "$(/stor/progetti/p1038/p1038u02/y/micromamba shell hook -s posix)"
micromamba activate lzan
cd Elena/Lib1/
srun canu -correct -p bjorn_corr1 -d bjorn/ genomeSize=1.5g -nanopore *.fastq.gz \
-minInputCoverage=1 -stopOnLowCoverage=1 -correctedErrorRate=0.16 \
-stageDirectory=$TMPDIR/calma -gridOptions="--partition allgroups"
My canu.out file is this:
Found perl:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/perl
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi
Found java:
/stor/progetti/p1038/p1038u02/y/envs/lzan/lib/jvm/bin/java
openjdk version "22.0.1-internal" 2024-04-16
Found canu:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/canu
canu 2.2
-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction and consensus use:
-- Šošić M, Šikić M.
-- Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
-- Bioinformatics. 2017 May 1;33(9):1394-1395.
-- http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
-- Berlin K, et al.
-- Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
-- Nat Biotechnol. 2015 Jun;33(6):623-30.
-- http://doi.org/10.1038/nbt.3238
--
-- Myers EW, et al.
-- A Whole-Genome Assembly of Drosophila.
-- Science. 2000 Mar 24;287(5461):2196-204.
-- http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
-- Chin CS, et al.
-- Phased diploid genome assembly with single-molecule real-time sequencing.
-- Nat Methods. 2016 Dec;13(12):1050-1054.
-- http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
-- Chin CS, et al.
-- Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
-- Nat Methods. 2013 Jun;10(6):563-9
-- http://doi.org/10.1038/nmeth.2474
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '22.0.1-internal' (from '/stor/progetti/p1038/p1038u02/y/envs/lzan/lib/jvm/bin/java') without -d64 support.
-- Detected gnuplot version '5.4 patchlevel 8 ' (from 'gnuplot') and image format 'png'.
--
-- Detected 1 CPUs and 256000 gigabytes of memory on the local machine.
--
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with task IDs up to 1000 allowed.
--
-- Slurm support detected. Resources available:
-- 1 host with 256 cores and 5937 GB memory.
--
-- (tag)Threads
-- (tag)Memory |
-- (tag) | | algorithm
-- ------- ---------- -------- -----------------------------
-- Grid: meryl 64.000 GB 8 CPUs (k-mer counting)
-- Grid: hap 16.000 GB 64 CPUs (read-to-haplotype assignment)
-- Grid: cormhap 32.000 GB 16 CPUs (overlap detection with mhap)
-- Grid: obtovl 16.000 GB 16 CPUs (overlap detection)
-- Grid: utgovl 16.000 GB 16 CPUs (overlap detection)
-- Grid: cor -.--- GB 4 CPUs (read correction)
-- Grid: ovb 4.000 GB 1 CPU (overlap store bucketizer)
-- Grid: ovs 32.000 GB 1 CPU (overlap store sorting)
-- Grid: red 32.000 GB 8 CPUs (read error detection)
-- Grid: oea 8.000 GB 1 CPU (overlap error adjustment)
-- Grid: bat 256.000 GB 16 CPUs (contig construction with bogart)
-- Grid: cns -.--- GB 8 CPUs (consensus)
--
-- Found Nanopore reads in 'bjorn_corr1.seqStore':
-- Libraries:
-- Nanopore: 67
-- Reads:
-- Raw: 1883808050
--
--
-- Generating assembly 'bjorn_corr1' in '/stor/progetti/p1038/p1038u02/Elena/Lib1/bjorn':
-- genomeSize:
-- 1500000000
Overlap Generation Limits:
-- corOvlErrorRate 0.3200 ( 32.00%)
-- obtOvlErrorRate 0.1600 ( 16.00%)
-- utgOvlErrorRate 0.1600 ( 16.00%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.1600 ( 16.00%)
-- utgErrorRate 0.1600 ( 16.00%)
-- cnsErrorRate 0.1600 ( 16.00%)
--
-- Stages to run:
-- only correct raw reads.
--
--
-- BEGIN CORRECTION
--
-- OVERLAPPER (mhap) (correction) complete, not rewriting scripts.
--
--
-- Mhap overlap jobs failed, tried 2 times, giving up.
-- job correction/1-overlapper/results/000001.ovb FAILED.
-- job correction/1-overlapper/results/000004.ovb FAILED.
-- job correction/1-overlapper/results/000005.ovb FAILED.
-- job correction/1-overlapper/results/000006.ovb FAILED.
-- job correction/1-overlapper/results/000007.ovb FAILED.
-- job correction/1-overlapper/results/000008.ovb FAILED.
-- job correction/1-overlapper/results/000009.ovb FAILED.
-- job correction/1-overlapper/results/000010.ovb FAILED.
--
ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
And this is my bjorn_corr1.seqStore.err file:
Found perl:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/perl
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi
Found java:
/stor/progetti/p1038/p1038u02/y/envs/lzan/lib/jvm/bin/java
openjdk version "22.0.1-internal" 2024-04-16
Found canu:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/canu
canu 2.2
ERROR: Can't create store './bjorn_corr1.seqStore.BUILDING': store already exists.
reads bases
---------- --------- ------ ------------ ------
Loaded 3683 92.1% 23912450 99.4% /stor/progetti/p1038/p1038u02/Elena/Lib1/FAX15442_pass_90e1428f_93bb83e7_0.fastq.gz
Short 317 7.9% 151060 0.6%
Creating library 'FAX15442_pass_90e1428f_93bb83e7_9' for Nanopore raw reads.
reads bases
---------- --------- ------ ------------ ------
Loaded 3881 97.0% 30084483 99.7% /stor/progetti/p1038/p1038u02/Elena/Lib1/FAX15442_pass_90e1428f_93bb83e7_9.fastq.gz
Short 119 3.0% 79265 0.3%
All reads processed.
reads bases
---------- --------- ------ ------------ ------
Loaded 244811 92.2% 1883808050 99.4%
Short 20653 7.8% 11996777 0.6%
Bye.
However, in my errors_corr.txt file I get this error:
-- Starting command on Fri May 3 17:54:31 2024 with 13984.637 GB free disk space
cd .
./erebia.seqStore.sh \
> ./erebia.seqStore.err 2>&1
-- Finished on Fri May 3 17:54:31 2024 (like a bat out of hell) with 13984.637 GB free disk space
----------------------------------------
ERROR:
ERROR: Failed with exit code 1. (rc=256)
ERROR:
ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
ABORT: sqStoreCreate failed; boom!.
ABORT:
Note: when I launch Canu locally (my computer runs Ubuntu), it runs without problems with the same command and the same reads.
I am using a Linux server managed by the SLURM scheduler (v19.05.2), and the sequences are Nanopore reads in fastq.gz files.
Could you please help me identify the error and how to solve it?
Thank you!
The fact that it's reporting "Can't create store" in the logs suggests multiple runs on the grid are colliding in the same folder (like issue #2312). Make sure no Canu jobs are running on your SLURM grid, then remove the full folder you specified with the -d option (bjorn) and launch the job just once; it will submit itself to the grid and exit. Your initial srun job will return, but that is OK: as long as you see Canu jobs in the queue, the jobs are still running. You can monitor progress in the canu.out file.
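A minimal sketch of that cleanup, assuming standard SLURM tools (squeue/scancel) and the bjorn/ directory from the command above:

```shell
# First make sure no leftover canu jobs are queued or running
# (cluster-only commands, shown commented out here):
#   squeue -u "$USER" -o "%i %j" | grep -i canu   # list any canu jobs
#   scancel <jobid>                               # cancel each one listed
# Then remove the entire -d directory, stale seqStore included:
rm -rf bjorn/
test ! -d bjorn && echo "bjorn/ removed; relaunch canu once"
```

Removing the whole -d directory (not just the seqStore) matters because Canu resumes from whatever partial state it finds there.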
Thank you for your fast response.
I tried what you suggested and modified my command:
#!/bin/bash
#SBATCH --job-name Lib9
#SBATCH --output output_lib9.txt   # write the output of your commands to a txt file [optional]
#SBATCH --error errors_lib9.txt    # write any errors to a txt file [optional]
#SBATCH --time 31-00:00:00         # duration of your pipeline [dd-hh:mm:ss]
#SBATCH --ntasks 32                # number of tasks or processes to run concurrently
#SBATCH --partition allgroups      # partition or queue on which to run the job
#SBATCH --mem 256G                 # memory requested for the job
eval "$(/stor/progetti/p1038/p1038u02/y/micromamba shell hook -s posix)"
micromamba activate lzan
cd Elena/Lib9/
srun canu -correct -p Lib9_corr -d Lib9_corr/ genomeSize=100m -nanopore *.fastq.gz \
-minInputCoverage=1 -stopOnLowCoverage=1 -correctedErrorRate=0.16 \
-stageDirectory=$TMPDIR/calma -useGrid=false
What I don't understand is that my errors file seems to keep writing the same thing over and over again, and at some point (almost at the end) it reports the same error again. Said file is attached:
errors_lib9.txt
Additionally, my Lib9_corr.seqStore.err file keeps reporting the error:
Found perl:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/perl
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi
Found java:
/stor/progetti/p1038/p1038u02/y/envs/lzan/lib/jvm/bin/java
openjdk version "22.0.1-internal" 2024-04-16
Found canu:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/canu
canu 2.2
ERROR: Can't create store './Lib9_corr.seqStore.BUILDING': store already exists.
reads bases
---------- --------- ------ ------------ ------
Loaded 2985 74.6% 17486158 96.4% /stor/progetti/p1038/p1038u02/Elena/Lib9/FAY37190_pass_3d6493c7_e3a1377b_0.fastq.gz
Short 1015 25.4% 656391 3.6%
All reads processed.
reads bases
---------- --------- ------ ------------ ------
Loaded 2985 74.6% 17486158 96.4%
Short 1015 25.4% 656391 3.6%
Bye.
The last thing is that I don't have a canu.out file.
However, Canu ran and finished on the grid, creating the correctedReads.fasta.gz file.
Should I interpret this as a successful run, even with the error present?
I wouldn't trust any output you got now because the run is clearly invalid. I think the issue is the srun before canu combined with ntasks: you're creating 32 instances of the canu job in parallel, which all collide and cause errors. You want 1 task with multiple cores instead. If you want to run canu on the grid, you don't need to submit it at all; just run the canu command on the head node as:
cd Elena/Lib9/
canu -correct -p Lib9_corr -d Lib9_corr/ genomeSize=100m -nanopore *.fastq.gz \
-minInputCoverage=1 -stopOnLowCoverage=1 -correctedErrorRate=0.16 gridOptions="--partition allgroups"
Canu will do some basic sanity checks and launch the jobs to the grid for you; it will also request appropriate resources. If you still want to run it as a single job, you can use the same command, just without srun/ntasks:
#!/bin/bash
#SBATCH --job-name Lib9
#SBATCH --output output_lib9.txt   # write the output of your commands to a txt file [optional]
#SBATCH --error errors_lib9.txt    # write any errors to a txt file [optional]
#SBATCH --time 31-00:00:00         # duration of your pipeline [dd-hh:mm:ss]
#SBATCH --cpus-per-task 32
#SBATCH --partition allgroups      # partition or queue on which to run the job
#SBATCH --mem 256G                 # memory requested for the job
eval "$(/stor/progetti/p1038/p1038u02/y/micromamba shell hook -s posix)"
micromamba activate lzan
cd Elena/Lib9/
canu -correct -p Lib9_corr -d Lib9_corr/ genomeSize=100m -nanopore *.fastq.gz \
-minInputCoverage=1 -stopOnLowCoverage=1 -correctedErrorRate=0.16 \
-useGrid=false
and submit the above to the grid with sbatch.
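For example, submission and monitoring might look like this (the script name canu_lib9.sh is hypothetical, and these commands only run on the cluster itself):

```shell
sbatch canu_lib9.sh          # submit the single-task script above
squeue -u "$USER"            # the Lib9 job should appear in the queue
tail -f Lib9_corr/canu.out   # follow canu's progress log once it starts
```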
Thank you so much for your help! I launched the last command you suggested and it ran smoothly without any errors.