nf-core / mag

Assembly and binning of metagenomes

Home Page: https://nf-co.re/mag

Bins in CheckM summary do not match bins in bin depths summary

QingDAI0225 opened this issue

Description of the bug

Nextflow workflow report

Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_MAG:MAG:BIN_SUMMARY (1)'

Caused by:
Process NFCORE_MAG:MAG:BIN_SUMMARY (1) terminated with an error exit status (1)

Command executed:

combine_tables.py --depths_summary bin_depths_summary.tsv --checkm_summary checkm_summary.tsv --quast_summary quast_summary.tsv --gtdbtk_summary gtdbtk_summary.tsv --out bin_summary.tsv

cat <<-END_VERSIONS > versions.yml
"NFCORE_MAG:MAG:BIN_SUMMARY":
python: $(python --version 2>&1 | sed 's/Python //g')
pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS

Command exit status:
1

Command output:
(empty)

Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Bins in CheckM summary do not match bins in bin depths summary!

Command used and terminal output

command used:

nextflow run nf-core/mag -c /work/qd33/nanopore/QD_ptrap_20230908/nextflow.config -profile singularity --input /work/qd33/nanopore/QD_ptrap_20230908/nf_core_mag/samplesheet_102K_2.csv --outdir /work/qd33/nanopore/QD_ptrap_20230908/nf_core_mag/result_102K_2 --skip_spadeshybrid true --skip_concoct true --run_virus_identification true --gtdb_db /work/qd33/nanopore/QD_ptrap_20230908/Mgnify_db/gtdbtk_r214_data.tar.gz --bin_domain_classification true --skip_prokka false --kraken2_db /work/qd33/nanopore/QD_ptrap_20230908/Mgnify_db/k2_pluspfp_20231009.tar.gz --skip_prokka false --skip_metaeuk false --skip_krona false --skip_gtdbtk false --binqc_tool checkm --save_checkm_data true

terminal output:

executor >  slurm (403)
[1b/8f47be] process > NFCORE_MAG:MAG:ARIA2_UNTAR ... [100%] 1 of 1 ✔
[13/d9b9cf] process > NFCORE_MAG:MAG:FASTQC_RAW (... [100%] 1 of 1 ✔
[f3/0bcc19] process > NFCORE_MAG:MAG:FASTP (102k_... [100%] 1 of 1 ✔
[67/a93324] process > NFCORE_MAG:MAG:BOWTIE2_PHIX... [100%] 1 of 1 ✔
[2f/f617c4] process > NFCORE_MAG:MAG:BOWTIE2_PHIX... [100%] 1 of 1 ✔
[74/98b15c] process > NFCORE_MAG:MAG:FASTQC_TRIMM... [100%] 1 of 1 ✔
[-        ] process > NFCORE_MAG:MAG:CAT_FASTQ       -
[-        ] process > NFCORE_MAG:MAG:NANOPLOT_RAW    -
[-        ] process > NFCORE_MAG:MAG:PORECHOP        -
[-        ] process > NFCORE_MAG:MAG:NANOLYSE        -
[-        ] process > NFCORE_MAG:MAG:FILTLONG        -
[-        ] process > NFCORE_MAG:MAG:NANOPLOT_FIL... -
[-        ] process > NFCORE_MAG:MAG:CENTRIFUGE      -
[84/5ed55d] process > NFCORE_MAG:MAG:KRAKEN2_DB_P... [100%] 1 of 1 ✔
[ba/88c596] process > NFCORE_MAG:MAG:KRAKEN2 (102... [ 66%] 2 of 3, failed: 2...
[88/4b72c6] process > NFCORE_MAG:MAG:KRONA_DB        [100%] 1 of 1 ✔
[-        ] process > NFCORE_MAG:MAG:KRONA           -
[81/be339b] process > NFCORE_MAG:MAG:MEGAHIT (102... [100%] 1 of 1 ✔
[7a/cfaf8e] process > NFCORE_MAG:MAG:SPADES (102k... [100%] 1 of 1 ✔
[e6/12e4c9] process > NFCORE_MAG:MAG:QUAST (SPAde... [100%] 2 of 2 ✔
[58/06ba0b] process > NFCORE_MAG:MAG:PRODIGAL (10... [100%] 2 of 2 ✔
[9c/740921] process > NFCORE_MAG:MAG:VIRUS_IDENTI... [100%] 1 of 1 ✔
[57/3e95e5] process > NFCORE_MAG:MAG:VIRUS_IDENTI... [100%] 2 of 2 ✔
[f5/ea5a04] process > NFCORE_MAG:MAG:BINNING_PREP... [100%] 2 of 2 ✔
[9a/f0b473] process > NFCORE_MAG:MAG:BINNING_PREP... [100%] 2 of 2 ✔
[20/499dcd] process > NFCORE_MAG:MAG:BINNING:META... [100%] 2 of 2 ✔
[ef/7ecf86] process > NFCORE_MAG:MAG:BINNING:CONV... [100%] 2 of 2 ✔
[73/28a4a6] process > NFCORE_MAG:MAG:BINNING:META... [100%] 2 of 2 ✔
[9f/397968] process > NFCORE_MAG:MAG:BINNING:MAXB... [100%] 2 of 2 ✔
[96/a9bc84] process > NFCORE_MAG:MAG:BINNING:ADJU... [100%] 2 of 2 ✔
[09/54440a] process > NFCORE_MAG:MAG:BINNING:SPLI... [100%] 4 of 4 ✔
[e8/ec9f8b] process > NFCORE_MAG:MAG:BINNING:GUNZ... [100%] 166 of 166 ✔
[-        ] process > NFCORE_MAG:MAG:BINNING:GUNZ... -
[e6/cbd4a0] process > NFCORE_MAG:MAG:DOMAIN_CLASS... [100%] 2 of 2 ✔
[20/d05229] process > NFCORE_MAG:MAG:DOMAIN_CLASS... [100%] 4 of 4 ✔
[c6/c64c45] process > NFCORE_MAG:MAG:DOMAIN_CLASS... [100%] 4 of 4 ✔
[4c/1b7d2b] process > NFCORE_MAG:MAG:DOMAIN_CLASS... [100%] 1 of 1 ✔
[4d/0bc6e4] process > NFCORE_MAG:MAG:DEPTHS:MAG_D... [100%] 4 of 4 ✔
[-        ] process > NFCORE_MAG:MAG:DEPTHS:MAG_D... -
[fc/1cc9ce] process > NFCORE_MAG:MAG:DEPTHS:MAG_D... [100%] 1 of 1 ✔
[b7/d353e4] process > NFCORE_MAG:MAG:CHECKM_QC:CH... [100%] 4 of 4 ✔
[6d/642c0e] process > NFCORE_MAG:MAG:CHECKM_QC:CH... [100%] 4 of 4 ✔
[7b/29c475] process > NFCORE_MAG:MAG:CHECKM_QC:CO... [100%] 1 of 1 ✔
[74/9c07c8] process > NFCORE_MAG:MAG:QUAST_BINS (... [100%] 7 of 7 ✔
[17/9bb268] process > NFCORE_MAG:MAG:QUAST_BINS_S... [100%] 1 of 1 ✔
[-        ] process > NFCORE_MAG:MAG:CAT             -
[-        ] process > NFCORE_MAG:MAG:CAT_SUMMARY     -
[96/f9f8cd] process > NFCORE_MAG:MAG:GTDBTK:GTDBT... [100%] 1 of 1 ✔
[b6/07b82f] process > NFCORE_MAG:MAG:GTDBTK:GTDBT... [100%] 4 of 4 ✔
[ec/dd8b76] process > NFCORE_MAG:MAG:GTDBTK:GTDBT... [100%] 1 of 1 ✔
[d6/ea361f] process > NFCORE_MAG:MAG:BIN_SUMMARY (1) [100%] 1 of 1, failed: 1 ✘
[ec/fff8c2] process > NFCORE_MAG:MAG:PROKKA (SPAd... [100%] 160 of 160 ✔
[-        ] process > NFCORE_MAG:MAG:CUSTOM_DUMPS... -
[-        ] process > NFCORE_MAG:MAG:MULTIQC         -
-[nf-core/mag] Pipeline completed with errors-
[ba/88c596] NOTE: Process `NFCORE_MAG:MAG:KRAKEN2 (102k_P_1_Pool_2-k2_pluspfp_20231009)` terminated with an error exit status (140) -- Execution is retried (2)
ERROR ~ Error executing process > 'NFCORE_MAG:MAG:BIN_SUMMARY (1)'

Caused by:
  Process `NFCORE_MAG:MAG:BIN_SUMMARY (1)` terminated with an error exit status (1)

Command executed:

  combine_tables.py --depths_summary bin_depths_summary.tsv --checkm_summary checkm_summary.tsv --quast_summary quast_summary.tsv --gtdbtk_summary gtdbtk_summary.tsv --out bin_summary.tsv
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:BIN_SUMMARY":
      python: $(python --version 2>&1 | sed 's/Python //g')
      pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  Bins in CheckM summary do not match bins in bin depths summary!

Work dir:
  /work/qd33/nanopore/QD_ptrap_20230908/nf_working/d6/ea361f0f6ef1ab87281a9f42cd2715

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

Nextflow version
version 23.04.3, build 5875 (11-08-2023 18:37 UTC)
Hardware
HPC
Executor
slurm
Container engine
Singularity
Version of nf-core/mag
2.5.4

Hi @QingDAI0225, would you mind sharing with me the files in the /work/qd33/nanopore/QD_ptrap_20230908/nf_working/d6/ea361f0f6ef1ab87281a9f42cd2715 directory? If you prefer, you can send them to me privately at jfy133@gmail.com (they will be kept confidential). I need to understand what is in the various tables.

I've already sent them to you by email. Thank you so much.

OK, I think I've identified the problem: the bin IDs in the checkm_summary file do not have the file extension, while those in the other tables do.
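One way to test this diagnosis on the shared work directory is to compare the bin IDs in both tables after stripping the FASTA extension. The helper below is a hypothetical sketch, not part of nf-core/mag: it assumes the IDs sit in a column named `bin` in bin_depths_summary.tsv and `Bin Id` in checkm_summary.tsv, and that bin files end in one of a few FASTA suffixes. Check these assumptions against your own files.

```python
import csv

# Assumed FASTA suffixes for bin files (illustrative; adjust to your output).
FASTA_SUFFIXES = (".fa.gz", ".fasta.gz", ".fa", ".fasta")


def normalise_bin_id(bin_id: str) -> str:
    """Strip a known FASTA suffix so 'sample.1.fa' and 'sample.1' compare equal.

    Only known suffixes are removed: bin names such as 'MEGAHIT-MetaBAT2-s1.1'
    contain dots themselves, so splitting on the last dot would be wrong.
    """
    for suffix in FASTA_SUFFIXES:
        if bin_id.endswith(suffix):
            return bin_id[: -len(suffix)]
    return bin_id


def read_column(path: str, column: str) -> list:
    """Read one column from a tab-separated file that has a header row."""
    with open(path, newline="") as handle:
        return [row[column] for row in csv.DictReader(handle, delimiter="\t")]


def compare_bin_ids(depth_bins, checkm_bins):
    """Return (bins missing from CheckM, bins only in CheckM) after normalising IDs."""
    depth_ids = {normalise_bin_id(b) for b in depth_bins}
    checkm_ids = {normalise_bin_id(b) for b in checkm_bins}
    return sorted(depth_ids - checkm_ids), sorted(checkm_ids - depth_ids)
```

For example, from inside the BIN_SUMMARY work directory, `compare_bin_ids(read_column("bin_depths_summary.tsv", "bin"), read_column("checkm_summary.tsv", "Bin Id"))` would return two empty lists if the extension were the only difference; any IDs left over are bins genuinely absent from one of the tables.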

I'm not sure why this is happening yet, and I'm currently at a hackathon working on something else; I will try to come back to this next week.

Testing locally, this does not actually seem to be an issue when running with BUSCO, at least.

@QingDAI0225, @maxibor and I also inspected this, and we discussed why you may be missing the 6 CheckM bins. Our hypothesis is that some of your CHECKM jobs failed (e.g., no marker genes could be found), and their results were therefore not exported.

If this is valid behaviour (i.e., CheckM simply did not find anything for those bins, rather than failing for some other reason), we will update the combine_tables.py script.

Could you please send me your .nextflow.log (again via email if you prefer)? The file will be in whichever directory you ran the nextflow run command from.

Also, if you can send the work/ directory of the CheckM process for one of the bins missing from the table, that would be helpful.

We strongly suspect this is the cause, which can be confirmed by sending the work/ directory of the CheckM process for one of your 'missing' bins.

Technical details: the CHECKM module sets all of its outputs to 'optional: true', meaning the process will not fail if no output file is found. We think this is deliberate: we suspect CheckM itself does not exit with an error when it finds nothing, but simply reports 'nothing found' on the console and produces no output file. In Nextflow terms, a process that emits no files is also fine: that sample simply stops continuing through that subworkflow. In this specific case, then, only the outputs of the CHECKM processes that were actually emitted get combined into the table for merging.
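One possible shape for such an update, sketched under assumptions: instead of exiting when the tables disagree, keep every bin from the depths summary (a left join), leave the CheckM fields empty where no CheckM output was emitted, and print a warning. This is a hypothetical stdlib-only sketch, not the actual combine_tables.py code; the key columns `bin` and `Bin Id` and the suffix list are assumptions.

```python
import sys

# Assumed FASTA suffixes for bin files (illustrative; adjust as needed).
FASTA_SUFFIXES = (".fa.gz", ".fasta.gz", ".fa", ".fasta")


def strip_fasta_suffix(bin_id: str) -> str:
    """Remove a known FASTA suffix; bin names may contain other dots."""
    for suffix in FASTA_SUFFIXES:
        if bin_id.endswith(suffix):
            return bin_id[: -len(suffix)]
    return bin_id


def left_join_checkm(depth_rows, checkm_rows, depth_key="bin", checkm_key="Bin Id"):
    """Attach CheckM columns to every bin in the depths table.

    depth_rows and checkm_rows are lists of dicts (e.g. from csv.DictReader).
    Bins with no CheckM row (no output emitted) keep empty CheckM fields and
    trigger a warning instead of aborting the whole summary.
    The key column names are assumptions, not confirmed pipeline columns.
    """
    checkm_by_id = {strip_fasta_suffix(r[checkm_key]): r for r in checkm_rows}
    depth_ids = {strip_fasta_suffix(r[depth_key]) for r in depth_rows}

    # CheckM rows with no matching bin cannot be explained by optional
    # outputs, so treat them as a genuine inconsistency.
    unexpected = sorted(set(checkm_by_id) - depth_ids)
    if unexpected:
        raise ValueError(f"CheckM bins not in depths summary: {unexpected}")

    checkm_columns = [c for c in (checkm_rows[0] if checkm_rows else {}) if c != checkm_key]
    merged, missing = [], []
    for row in depth_rows:
        match = checkm_by_id.get(strip_fasta_suffix(row[depth_key]))
        if match is None:
            missing.append(row[depth_key])
        out = dict(row)
        for col in checkm_columns:
            out[col] = match[col] if match is not None else ""
        merged.append(out)

    if missing:
        print(
            f"WARNING: no CheckM result for {len(missing)} bin(s): {', '.join(missing)}",
            file=sys.stderr,
        )
    return merged
```

The asymmetry is a design choice: optional outputs can explain bins missing from the CheckM table, but not extra ones, so the latter still raise an error. Since the script already uses pandas, a similar effect could likely be obtained with `pd.merge(..., how="left")` on a normalised ID column.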

If we confirm this is the case, we just need to update the combine_tables.py script.