nf-core / mag

Assembly and binning of metagenomes

Home Page: https://nf-co.re/mag

CHECKM_LINEAGEWF failing with exit status 1

carleton-envbiotech opened this issue

Description of the bug

When running nf-core/mag v2.5.4 with the --binqc_tool checkm flag, the run fails with exit status 1 and reports an unexpected error, <class 'EOFError'>. The command I used and an excerpt of the output are below. I can work around the problem by removing the flag and continuing with BUSCO instead, so it seems to be specific to the CheckM step.

Command used and terminal output

nextflow run nf-core/mag -r 2.5.4 -c mag-memory-increase.conf -profile apptainer  \
--input '/datastore/researchdata/sequencing_data_archive/nitrifying_consortia_illumina_short_read_data/NitrifyingPelletDNA*_R{1,2}*.fastq.gz' \
--outdir  Nitrifying_consortia_analyses \
--refine_bins_dastool \
--binqc_tool checkm \
--postbinning_input refined_bins_only \
--skip_spades \
--skip_concoct \
--gtdb_db "/datastore/researchdata/gtdbtk/gtdbtk_data.tar.gz" \
--busco_db "busco_nextflow/bacteria_odb10.2024-01-08.tar.gz"

#Terminal output
MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:CHECKM_QC:CHECKM_LINEAGEWF":
      checkm: $( checkm 2>&1 | grep '...:::' | sed 's/.*CheckM v//;s/ .*//' )
  END_VERSIONS

Command exit status:
  1

Command output:
  [2024-02-22 09:26:07] INFO: CheckM v1.2.1
  [2024-02-22 09:26:07] INFO: checkm lineage_wf -t 10 -f MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf.tsv --tab_table --pplacer_threads 10 -x fa input_bins/ MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf
  [2024-02-22 09:26:07] INFO: CheckM data: checkm_data_2015_01_16
  [2024-02-22 09:26:07] INFO: [CheckM - tree] Placing bins in reference genome tree.
  [2024-02-22 09:26:08] INFO: Identifying marker genes in 1 bins with 10 threads:
  
  Unexpected error: <class 'EOFError'>

Command error:
  Matplotlib created a temporary config/cache directory at /tmp/matplotlib-55r4atjf because the default path (/files/home/dgregoire/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  [2024-02-22 09:26:07] INFO: CheckM v1.2.1
  [2024-02-22 09:26:07] INFO: checkm lineage_wf -t 10 -f MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf.tsv --tab_table --pplacer_threads 10 -x fa input_bins/ MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf
  [2024-02-22 09:26:07] INFO: CheckM data: checkm_data_2015_01_16
  [2024-02-22 09:26:07] INFO: [CheckM - tree] Placing bins in reference genome tree.
  [2024-02-22 09:26:08] INFO: Identifying marker genes in 1 bins with 10 threads:
  Process SyncManager-1:
  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
      self.run()
    File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
      self._target(*self._args, **self._kwargs)
    File "/usr/local/lib/python3.10/multiprocessing/managers.py", line 591, in _run_server
      server = cls._Server(registry, address, authkey, serializer)
    File "/usr/local/lib/python3.10/multiprocessing/managers.py", line 156, in __init__
      self.listener = Listener(address=address, backlog=16)
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 453, in __init__
      self._listener = SocketListener(address, family, backlog)
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 596, in __init__
      self._socket.bind(address)
  OSError: [Errno 98] Address already in use
  Traceback (most recent call last):
    File "/usr/local/bin/checkm", line 856, in <module>
  
  Unexpected error: <class 'EOFError'>
      checkmParser.parseOptions(args)
    File "/usr/local/lib/python3.10/site-packages/checkm/main.py", line 979, in parseOptions
      self.tree(options)
    File "/usr/local/lib/python3.10/site-packages/checkm/main.py", line 157, in tree
      binIdToModels = mgf.find(binFiles,
    File "/usr/local/lib/python3.10/site-packages/checkm/markerGeneFinder.py", line 67, in find
      binIdToModels = mp.Manager().dict()
    File "/usr/local/lib/python3.10/multiprocessing/context.py", line 57, in Manager
      m.start()
    File "/usr/local/lib/python3.10/multiprocessing/managers.py", line 566, in start
      self._address = reader.recv()
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 255, in recv
      buf = self._recv_bytes()
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
      buf = self._recv(4)
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
      raise EOFError
  EOFError

Work dir:
  /datastore/userdata/daniel/work/7f/22e0bcdd238749fc9a385e696c1bcf

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

Running on HPC with 1.0 TB RAM and 48 CPUs
Executed locally
Container engine: Apptainer
CentOS Linux
nf-core/mag v2.5.4

Thanks for the report!

An EOFError suggests to me that there is an empty input file or a corrupted database somewhere...

If you go into the reported work directory, can you inspect the input files to see if they do have something in them?
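
As a rough aid for that check, here is a minimal Python sketch (not part of the pipeline) that walks a work directory and reports zero-byte files and broken symlinks. The function name `inspect_work_dir` is hypothetical, and the sketch assumes staged inputs are regular files or symlinks, as is typical for a Nextflow work directory.

```python
from pathlib import Path


def inspect_work_dir(work_dir):
    """Return (relative_path, problem) tuples for suspicious entries.

    Hypothetical helper for illustration only: it flags zero-byte files
    and symlinks whose target no longer exists.
    """
    root = Path(work_dir)
    problems = []
    for path in sorted(root.rglob("*")):
        rel = str(path.relative_to(root))
        if path.is_symlink() and not path.exists():
            # The staged input points at a file that is missing
            problems.append((rel, "broken symlink"))
        elif path.is_file() and path.stat().st_size == 0:
            # stat() follows symlinks, so this is the size of the real file
            problems.append((rel, "empty file"))
    return problems
```

Calling `inspect_work_dir("/datastore/userdata/daniel/work/7f/22e0bcdd238749fc9a385e696c1bcf")` on the work directory from the report would return an empty list if nothing looks suspicious. Hidden task files such as an empty `.command.err` will also be listed, and symlinked directories (for example the database directory) may not be descended into, so point the function at them directly if needed.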

Working through the work directory, I see the following:

  • A directory called checkm_data_2015_01_16 that has multiple subdirectories within it
  • An input_bins directory that contains a single .fa file, as would be expected
  • An additional directory called MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf, named after the sample that initiated this process.
  • Notably, the 'bin' subdirectory within the above-mentioned directory is empty, as is the 'storage' directory.
  • Using 'cat' to open checkm.log returns only the following, which ends where the process seems to have encountered the error:

[2024-02-22 09:26:07] INFO: CheckM v1.2.1
[2024-02-22 09:26:07] INFO: checkm lineage_wf -t 10 -f MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf.tsv --tab_table --pplacer_threads 10 -x fa input_bins/ MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf
[2024-02-22 09:26:07] INFO: CheckM data: checkm_data_2015_01_16
[2024-02-22 09:26:07] INFO: [CheckM - tree] Placing bins in reference genome tree.
[2024-02-22 09:26:08] INFO: Identifying marker genes in 1 bins with 10 threads:

Looking at the CheckM issues, I think it may be that you have run out of memory for the CheckM process.

You should increase the memory for that errored process in your custom config file too, as it seems you've already done for others.

I obtained this error message even after adjusting the configuration to look like the following excerpt:

process { withName: GTDBTK_CLASSIFYWF { cpus = 32 memory = 256.GB } withName: CHECKM_QC { cpus = 32 memory = 256.GB } }

Gah. Could you try running the command manually (.command.sh) with a local copy of CheckM? That way we can isolate whether it's the pipeline doing something wrong or the tool...
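
One detail from the traceback in the report may help with that isolation: the `SyncManager-1` child process fails at `self._socket.bind(address)` with `OSError: [Errno 98] Address already in use`, and only afterwards does the parent's `mp.Manager()` call raise `EOFError`. Whether this is the root cause is not established in this thread. As a hedged environment check, the Python sketch below tries to bind a Unix-domain socket inside a fresh temporary directory, which is roughly what `multiprocessing` does on Linux when it starts a manager. The function name `can_bind_unix_socket` and the socket file name are hypothetical.

```python
import os
import socket
import tempfile


def can_bind_unix_socket(base_dir=None):
    """Try to bind a Unix-domain socket under base_dir (default: system temp dir).

    Returns (True, "") on success, or (False, "<ErrorType>: <message>") if the
    directory cannot be created or the socket cannot be bound.
    Illustrative sketch only; it does not run CheckM or multiprocessing itself.
    """
    try:
        with tempfile.TemporaryDirectory(dir=base_dir) as tmp:
            address = os.path.join(tmp, "listener-test")
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
                sock.bind(address)  # the call that failed in the traceback
                sock.listen(1)
        return True, ""
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
```

Running this with the same Python, temporary directory settings and container as the failing task, and comparing the result with a run outside the container, could show whether socket creation in the temp location is a problem in that environment. A `(True, "")` result would point away from the socket and back towards other causes such as memory.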

@carleton-envbiotech my feeling is that it's still memory; this seems to be a REALLY common issue with CheckM, and it results in very similar errors.

I note that your configuration in the excerpt wouldn't work without new lines. Was that just a quick type-out?

process { 
        withName: GTDBTK_CLASSIFYWF { 
                cpus = 32
                memory = 256.GB 
        } 
        withName: CHECKM_QC { 
                cpus = 32 
                memory = 256.GB 
        }
}

This works for me, for example.

Otherwise, maybe the wrong database file is being passed to it... The nf-core/mag docs for --checkm_db say it should be the one below, but it looks like you have a different name in the command above (it might be the same contents, IDK)

default: https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.1/auxillary_files/gtdbtk_r214_data.tar.gz

Going to close for now, as I think it's a memory issue rather than a pipeline error.