Error launching Canu v2.2 on a grid: sqStoreCreate failed; boom!
ElenaCatelan opened this issue · comments
Hello,
I am using Canu for the first time to assemble a fish genome de novo on a grid, but it keeps failing with the error rc=256. First I ran Canu inside a Singularity container, and then I installed it through micromamba; in both cases I got the same error. Because of time restrictions I am running each step (correction, trimming, assembly) independently. My command is the following:
#!/bin/bash
#SBATCH --job-name bjorn_corr1
#SBATCH --output output_corr.txt
#SBATCH --error errors_corr.txt
#SBATCH --time 31-00:00:00
#SBATCH --ntasks 64
#SBATCH --partition allgroups
#SBATCH --mem 1000G
eval "$(/stor/progetti/p1038/p1038u02/y/micromamba shell hook -s posix)"
micromamba activate lzan
cd Elena/Lib1/
srun canu -correct -p bjorn_corr1 -d bjorn/ genomeSize=1.5g -nanopore *.fastq.gz \
-minInputCoverage=1 -stopOnLowCoverage=1 -correctedErrorRate=0.16 \
-stageDirectory=$TMPDIR/calma -gridOptions="--partition allgroups"
My canu.out file is this:
Found perl:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/perl
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi
Found java:
/stor/progetti/p1038/p1038u02/y/envs/lzan/lib/jvm/bin/java
openjdk version "22.0.1-internal" 2024-04-16
Found canu:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/canu
canu 2.2
-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction and consensus use:
-- Šošić M, Šikić M.
-- Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
-- Bioinformatics. 2017 May 1;33(9):1394-1395.
-- http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
-- Berlin K, et al.
-- Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
-- Nat Biotechnol. 2015 Jun;33(6):623-30.
-- http://doi.org/10.1038/nbt.3238
--
-- Myers EW, et al.
-- A Whole-Genome Assembly of Drosophila.
-- Science. 2000 Mar 24;287(5461):2196-204.
-- http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
-- Chin CS, et al.
-- Phased diploid genome assembly with single-molecule real-time sequencing.
-- Nat Methods. 2016 Dec;13(12):1050-1054.
-- http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
-- Chin CS, et al.
-- Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
-- Nat Methods. 2013 Jun;10(6):563-9
-- http://doi.org/10.1038/nmeth.2474
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '22.0.1-internal' (from '/stor/progetti/p1038/p1038u02/y/envs/lzan/lib/jvm/bin/java') without -d64 support.
-- Detected gnuplot version '5.4 patchlevel 8 ' (from 'gnuplot') and image format 'png'.
--
-- Detected 1 CPUs and 256000 gigabytes of memory on the local machine.
--
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with task IDs up to 1000 allowed.
--
-- Slurm support detected. Resources available:
-- 1 host with 256 cores and 5937 GB memory.
--
-- (tag)Threads
-- (tag)Memory |
-- (tag) | | algorithm
-- ------- ---------- -------- -----------------------------
-- Grid: meryl 64.000 GB 8 CPUs (k-mer counting)
-- Grid: hap 16.000 GB 64 CPUs (read-to-haplotype assignment)
-- Grid: cormhap 32.000 GB 16 CPUs (overlap detection with mhap)
-- Grid: obtovl 16.000 GB 16 CPUs (overlap detection)
-- Grid: utgovl 16.000 GB 16 CPUs (overlap detection)
-- Grid: cor -.--- GB 4 CPUs (read correction)
-- Grid: ovb 4.000 GB 1 CPU (overlap store bucketizer)
-- Grid: ovs 32.000 GB 1 CPU (overlap store sorting)
-- Grid: red 32.000 GB 8 CPUs (read error detection)
-- Grid: oea 8.000 GB 1 CPU (overlap error adjustment)
-- Grid: bat 256.000 GB 16 CPUs (contig construction with bogart)
-- Grid: cns -.--- GB 8 CPUs (consensus)
--
-- Found Nanopore reads in 'bjorn_corr1.seqStore':
-- Libraries:
-- Nanopore: 67
-- Reads:
-- Raw: 1883808050
--
--
-- Generating assembly 'bjorn_corr1' in '/stor/progetti/p1038/p1038u02/Elena/Lib1/bjorn':
-- genomeSize:
-- 1500000000
Overlap Generation Limits:
-- corOvlErrorRate 0.3200 ( 32.00%)
-- obtOvlErrorRate 0.1600 ( 16.00%)
-- utgOvlErrorRate 0.1600 ( 16.00%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.1600 ( 16.00%)
-- utgErrorRate 0.1600 ( 16.00%)
-- cnsErrorRate 0.1600 ( 16.00%)
--
-- Stages to run:
-- only correct raw reads.
--
--
-- BEGIN CORRECTION
--
-- OVERLAPPER (mhap) (correction) complete, not rewriting scripts.
--
--
-- Mhap overlap jobs failed, tried 2 times, giving up.
-- job correction/1-overlapper/results/000001.ovb FAILED.
-- job correction/1-overlapper/results/000004.ovb FAILED.
-- job correction/1-overlapper/results/000005.ovb FAILED.
-- job correction/1-overlapper/results/000006.ovb FAILED.
-- job correction/1-overlapper/results/000007.ovb FAILED.
-- job correction/1-overlapper/results/000008.ovb FAILED.
-- job correction/1-overlapper/results/000009.ovb FAILED.
-- job correction/1-overlapper/results/000010.ovb FAILED.
--
ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
And this is my bjorn_corr1.seqStore.err file:
Found perl:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/perl
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi
Found java:
/stor/progetti/p1038/p1038u02/y/envs/lzan/lib/jvm/bin/java
openjdk version "22.0.1-internal" 2024-04-16
Found canu:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/canu
canu 2.2
ERROR: Can't create store './bjorn_corr1.seqStore.BUILDING': store already exists.
reads bases
---------- --------- ------ ------------ ------
Loaded 3683 92.1% 23912450 99.4% /stor/progetti/p1038/p1038u02/Elena/Lib1/FAX15442_pass_90e1428f_93bb83e7_0.fastq.gz
Short 317 7.9% 151060 0.6%
Creating library 'FAX15442_pass_90e1428f_93bb83e7_9' for Nanopore raw reads.
reads bases
---------- --------- ------ ------------ ------
Loaded 3881 97.0% 30084483 99.7% /stor/progetti/p1038/p1038u02/Elena/Lib1/FAX15442_pass_90e1428f_93bb83e7_9.fastq.gz
Short 119 3.0% 79265 0.3%
All reads processed.
reads bases
---------- --------- ------ ------------ ------
Loaded 244811 92.2% 1883808050 99.4%
Short 20653 7.8% 11996777 0.6%
Bye.
However, in my errors_corr.txt file I get this error:
-- Starting command on Fri May 3 17:54:31 2024 with 13984.637 GB free disk space
cd .
./erebia.seqStore.sh \
> ./erebia.seqStore.err 2>&1
-- Finished on Fri May 3 17:54:31 2024 (like a bat out of hell) with 13984.637 GB free disk space
----------------------------------------
ERROR:
ERROR: Failed with exit code 1. (rc=256)
ERROR:
ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
ABORT: sqStoreCreate failed; boom!.
ABORT:
Note: when I launch Canu locally (my computer runs Ubuntu), it runs without problems with the same command and the same reads.
I am using a Linux server managed by the SLURM scheduler (v19.05.2), and the sequences are Nanopore reads in fastq.gz files.
Could you please help me identify the error and how to solve it?
Thank you!
The fact that it's reporting "Can't create store" in the logs suggests multiple runs on the grid are colliding in the same folder (like issue #2312). Make sure no Canu jobs are running on your SLURM grid, then remove the full folder you specified with the -d option (bjorn) and launch the job just once; it will submit itself to the grid and exit. Your initial srun job will return, but that is OK: as long as you see Canu jobs in the queue, the jobs are still running. You can monitor progress in the canu.out file.
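A minimal sketch of that cleanup, assuming standard SLURM tools (squeue/scancel) and the bjorn/ directory from the command above:

```shell
# First make sure no leftover canu jobs are queued or running
# (cluster-only commands, shown commented out here):
#   squeue -u "$USER" -o "%i %j" | grep -i canu   # list any canu jobs
#   scancel <jobid>                               # cancel each one listed
# Then remove the entire -d directory, stale seqStore included:
rm -rf bjorn/
test ! -d bjorn && echo "bjorn/ removed; relaunch canu once"
```

Removing the whole -d directory (not just the seqStore) matters because Canu resumes from whatever partial state it finds there.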
Thank you for your fast response.
I tried what you suggested and modified my command:
#!/bin/bash
#SBATCH --job-name Lib9
#SBATCH --output output_lib9.txt   # write the output of your commands to a txt file [optional]
#SBATCH --error errors_lib9.txt    # write any errors to a txt file [optional]
#SBATCH --time 31-00:00:00         # duration of your pipeline [dd-hh:mm:ss]
#SBATCH --ntasks 32                # number of tasks or processes to run concurrently
#SBATCH --partition allgroups      # partition or queue on which to run the job
#SBATCH --mem 256G                 # memory requested for the job
eval "$(/stor/progetti/p1038/p1038u02/y/micromamba shell hook -s posix)"
micromamba activate lzan
cd Elena/Lib9/
srun canu -correct -p Lib9_corr -d Lib9_corr/ genomeSize=100m -nanopore *.fastq.gz \
-minInputCoverage=1 -stopOnLowCoverage=1 -correctedErrorRate=0.16 \
-stageDirectory=$TMPDIR/calma -useGrid=false
What I don't understand is that my errors file seems to keep writing the same thing over and over again, and at some point (almost at the end) it reports the same error again. Said file is attached:
errors_lib9.txt
Additionally, my Lib9_corr.seqStore.err file keeps reporting the error:
Found perl:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/perl
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi
Found java:
/stor/progetti/p1038/p1038u02/y/envs/lzan/lib/jvm/bin/java
openjdk version "22.0.1-internal" 2024-04-16
Found canu:
/stor/progetti/p1038/p1038u02/y/envs/lzan/bin/canu
canu 2.2
ERROR: Can't create store './Lib9_corr.seqStore.BUILDING': store already exists.
reads bases
---------- --------- ------ ------------ ------
Loaded 2985 74.6% 17486158 96.4% /stor/progetti/p1038/p1038u02/Elena/Lib9/FAY37190_pass_3d6493c7_e3a1377b_0.fastq.gz
Short 1015 25.4% 656391 3.6%
All reads processed.
reads bases
---------- --------- ------ ------------ ------
Loaded 2985 74.6% 17486158 96.4%
Short 1015 25.4% 656391 3.6%
Bye.
The last thing is that I don't have a canu.out file.
However, Canu ran and finished on the grid, creating the correctedReads.fasta.gz file.
Should I interpret this as a successful run, even with the error present?
I wouldn't trust any output you got now because the run is clearly invalid. I think the issue is the srun before canu combined with ntasks: you're creating 32 instances of the canu job in parallel, which all collide and cause errors. You want 1 task with multiple cores instead. If you want to run canu on the grid, you don't need to submit it at all; just run the canu command on the head node as:
cd Elena/Lib9/
canu -correct -p Lib9_corr -d Lib9_corr/ genomeSize=100m -nanopore *.fastq.gz \
-minInputCoverage=1 -stopOnLowCoverage=1 -correctedErrorRate=0.16 gridOptions="--partition allgroups"
Canu will do some basic sanity checks and launch the jobs to the grid for you; it will also request appropriate resources. If you still want to run it as a single job, you can use the same command, just without srun/ntasks:
#!/bin/bash
#SBATCH --job-name Lib9
#SBATCH --output output_lib9.txt   # write the output of your commands to a txt file [optional]
#SBATCH --error errors_lib9.txt    # write any errors to a txt file [optional]
#SBATCH --time 31-00:00:00         # duration of your pipeline [dd-hh:mm:ss]
#SBATCH --cpus-per-task 32
#SBATCH --partition allgroups      # partition or queue on which to run the job
#SBATCH --mem 256G                 # memory requested for the job
eval "$(/stor/progetti/p1038/p1038u02/y/micromamba shell hook -s posix)"
micromamba activate lzan
cd Elena/Lib9/
canu -correct -p Lib9_corr -d Lib9_corr/ genomeSize=100m -nanopore *.fastq.gz \
-minInputCoverage=1 -stopOnLowCoverage=1 -correctedErrorRate=0.16 \
-useGrid=false
and submit the above to the grid with sbatch.
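For example, submission and monitoring might look like this (the script name canu_lib9.sh is hypothetical, and these commands only run on the cluster itself):

```shell
sbatch canu_lib9.sh          # submit the single-task script above
squeue -u "$USER"            # the Lib9 job should appear in the queue
tail -f Lib9_corr/canu.out   # follow canu's progress log once it starts
```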
Thank you so much for your help! I launched the last command you suggested and it ran smoothly without any errors.