nf-core / mag

Assembly and binning of metagenomes

Home Page: https://nf-co.re/mag


BUSCO summary fails

mtva0001 opened this issue · comments

Description of the bug

I'm running the pipeline on our server with 30 cores and 250 GB of memory (specified in my custom config file). The BUSCO summary step fails immediately after the job is submitted. The error suggests a memory issue, which seems plausible given that I have 267 samples in total.
Do you know how to fix this issue?

Command used and terminal output

nextflow run nf-core/mag -r 2.3.1 -profile conda --input Samplesheet_input.csv --outdir result --gtdb 'https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.1/auxillary_files/gtdbtk_r214_data.tar.gz' --skip_spades --skip_spadeshybrid --skip_maxbin2 --skip_concoct -resume jolly_hypatia --max_time 50.h --gtdbtk_min_completeness 40.0 --busco_clean -c config_250GB.nf

The exit status of the task that caused the workflow execution to fail was: 139.

The full error message was:

Error executing process > 'NFCORE_MAG:MAG:BUSCO_QC:BUSCO_SUMMARY'

Caused by:
  Process `NFCORE_MAG:MAG:BUSCO_QC:BUSCO_SUMMARY` terminated with an error exit status (139)

Command executed:

  summary_busco.py -a -ss short_summary.specific_lineage.chlorobi_odb10.MEGAHIT-MetaBAT2-MJ110629B.13.fa.txt [here I deleted the list of files because it was way too long to make this comment] -o busco_summary.tsv
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:BUSCO_QC:BUSCO_SUMMARY":
      python: $(python --version 2>&1 | sed 's/Python //g')
      pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
  END_VERSIONS

Command exit status:
  139

Command output:
  (empty)

Relevant files

No response

System information

nextflow version 23.04.1.5866
Hardware: server
local execution
Conda container
Version of nf-core: mag -r 2.3.1

I suspect you are right regarding resource requirements.

To configure the pipeline and adapt its defaults to your machine(s), please see the documentation here:

https://nf-co.re/docs/usage/configuration#tuning-workflow-resources
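As a sketch, a custom config along these lines would raise the limits for just the failing step (the process selector is taken from the error message above; the resource values are illustrative and should be adjusted to your machine):

```groovy
// custom.config -- illustrative values, not pipeline defaults
process {
    withName: 'NFCORE_MAG:MAG:BUSCO_QC:BUSCO_SUMMARY' {
        memory = 250.GB
        time   = 50.h
    }
}
```

It is passed to the run with `-c custom.config`, as you are already doing with your existing config file.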

Thanks! But even with 300 GB of memory it doesn't work, and that is the maximum I could allocate. Would it be possible to create the summary .tsv file in chunks? Or something like having the -sd, -ss and -f summaries created separately?

While it is possible to create summary files in chunks, I don't think it would be a great idea to create multiple summary files that would then have to be summarized again.

I think it would be best to either add errorStrategy 'ignore' to the BUSCO_SUMMARY process in a custom configuration file and then summarize the results manually, or split your samples into chunks before running them through the pipeline.
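For the first option, a minimal custom config might look like this (the process name is taken from the error message above):

```groovy
// ignore failures of the summary step only
process {
    withName: 'NFCORE_MAG:MAG:BUSCO_QC:BUSCO_SUMMARY' {
        errorStrategy = 'ignore'
    }
}
```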

Thanks! I resumed with the modified config file (specifying the suggested errorStrategy), but the pipeline doesn't continue after the error is ignored. How can I make it continue?

[b4/a756df] NOTE: Process NFCORE_MAG:MAG:BUSCO_QC:BUSCO_SUMMARY terminated with an error exit status (139) -- Error is ignored

ERROR ~ Stream Closed

I apologize, I should have said that errorStrategy = 'ignore' should be applied to the entire pipeline. This means that if sample 1 fails at a process but samples 2-267 pass, only samples 2-267 will move on to the next process. You can also look into the 'retry' errorStrategy so that certain processes are retried if they fail.
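As a sketch, the pipeline-wide version would look like this (the retry variant is illustrative; adjust maxRetries to taste):

```groovy
// apply to every process in the pipeline
process {
    errorStrategy = 'ignore'

    // alternative: retry a couple of times before giving up
    // errorStrategy = { task.attempt <= 2 ? 'retry' : 'ignore' }
    // maxRetries    = 2
}
```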

As for BUSCO_SUMMARY, this means that any process requiring its output as input will not be run, since the failed task exits with an error status.

Lastly, I am going to close this issue, as it pertains to configuring the pipeline on your system. If you need more assistance, please join the nf-core Slack and ask in the #mag or #configs channel.