nf-core / hic

Analysis of Chromosome Conformation Capture data (Hi-C)

Home Page: https://nf-co.re/hic

Bowtie2 Mapping Alignment exceeded running time limit error

koushik20 opened this issue

Description of the bug

Hi,

Thanks for the detailed documentation!
I am running nf-core/hic version 2.0.0 with the GRCh38 reference genome, but the run always fails with "Process exceeded running time limit (16h)".

Below is the terminal output:

executor >  local (2)
[e4/33766a] process > NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK (input_file.csv)   [100%] 1 of 1, cached: 1 ✔
[9f/54196b] process > NFCORE_HIC:HIC:PREPARE_GENOME:CUSTOM_GETCHROMSIZES (genome.fa)  [100%] 1 of 1, cached: 1 ✔
[d3/12afd7] process > NFCORE_HIC:HIC:PREPARE_GENOME:GET_RESTRICTION_FRAGMENTS (^GATC) [100%] 1 of 1, cached: 1 ✔
[c6/5b0dc7] process > NFCORE_HIC:HIC:FASTQC (BT549_Rep2)                              [100%] 2 of 2, cached: 2 ✔
[0d/50c45a] process > NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN (BT549_Rep2) [ 25%] 1 of 4, failed: 1
[-        ] process > NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:TRIM_READS                 -
[-        ] process > NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN_TRIMMED      -
[-        ] process > NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:MERGE_BOWTIE2              -
[-        ] process > NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:COMBINE_MATES              -
[-        ] process > NFCORE_HIC:HIC:HICPRO:GET_VALID_INTERACTION                     -
[-        ] process > NFCORE_HIC:HIC:HICPRO:MERGE_VALID_INTERACTION                   -
[-        ] process > NFCORE_HIC:HIC:HICPRO:MERGE_STATS                               -
[-        ] process > NFCORE_HIC:HIC:HICPRO:HICPRO2PAIRS                              -
[d5/1dd856] process > NFCORE_HIC:HIC:COOLER:COOLER_MAKEBINS (null})                   [100%] 7 of 7, cached: 7 ✔
[-        ] process > NFCORE_HIC:HIC:COOLER:COOLER_CLOAD                              -
[-        ] process > NFCORE_HIC:HIC:COOLER:COOLER_BALANCE                            -
[-        ] process > NFCORE_HIC:HIC:COOLER:COOLER_ZOOMIFY                            -
[-        ] process > NFCORE_HIC:HIC:COOLER:COOLER_DUMP                               -
[-        ] process > NFCORE_HIC:HIC:COOLER:SPLIT_COOLER_DUMP                         -
[-        ] process > NFCORE_HIC:HIC:HIC_PLOT_DIST_VS_COUNTS                          -
[-        ] process > NFCORE_HIC:HIC:COMPARTMENTS:COOLTOOLS_EIGSCIS                   -
[-        ] process > NFCORE_HIC:HIC:TADS:COOLTOOLS_INSULATION                        -
[-        ] process > NFCORE_HIC:HIC:CUSTOM_DUMPSOFTWAREVERSIONS                      -
[-        ] process > NFCORE_HIC:HIC:MULTIQC                                          -
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN (BT549_Rep2)'

Caused by:
  Process exceeded running time limit (16h)

Command executed:

  INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\.rev.1.bt2$//"`
  [ -z "$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\.rev.1.bt2l$//"`
  [ -z "$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1
  
  bowtie2 \
      -x $INDEX \
      -U HiChIP_BT549-B_S6_R2_001.fastq.gz \
      --threads 12 \
      --un-gz BT549_Rep2_0_R2.unmapped.fastq.gz \
      --very-sensitive --end-to-end --reorder \
      2> BT549_Rep2_0_R2.bowtie2.log \
      | samtools view -F 4 --threads 12 -o BT549_Rep2_0_R2.bam -
  
  if [ -f BT549_Rep2_0_R2.unmapped.fastq.1.gz ]; then
      mv BT549_Rep2_0_R2.unmapped.fastq.1.gz BT549_Rep2_0_R2.unmapped_1.fastq.gz
  fi
  
  if [ -f BT549_Rep2_0_R2.unmapped.fastq.2.gz ]; then
      mv BT549_Rep2_0_R2.unmapped.fastq.2.gz BT549_Rep2_0_R2.unmapped_2.fastq.gz
  fi
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN":
      bowtie2: $(echo $(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*$//')
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
      pigz: $( pigz --version 2>&1 | sed 's/pigz //g' )
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /mnt/hichip_results/BT549/work/0d/50c45a4cea8db207d2ce122b4f009b

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

The pipeline always stops at this particular bowtie2 mapping step. I supplied a separate nextflow.config file that assigns more memory to this specific step:

process {
  withName: 'NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN' {
    memory = 80.GB
  }
}

So my questions are:
Why does the pipeline abort at the 16h mark even though I set a maximum time of 240h?
When I ran some samples earlier with GRCh37, the pipeline completed successfully, so is there an issue with using GRCh38?
I tried different --max_cpus, --max_memory, and --max_time settings, but the pipeline always aborts at this particular step (see "Command executed" above).

Thank you!

Command used and terminal output

Input script filename: run_hicpro.sh

sudo nextflow run nf-core/hic -r 2.0.0 \
       --input '/mnt/hichip_results/BT549/input_file.csv' \
       -profile docker \
       -resume \
       --fastq_chunks_size 20000000 \
       --max_memory '128.GB' \
       --max_time '240.h' \
       --max_cpus 60 \
       --outdir "/mnt/hicpro_results/BT549_Apr2023" \
       --genome GRCh38 \
       --save_pairs_intermediates \
       --bwt2_opts_end2end '--very-sensitive --end-to-end --reorder' \
       --bwt2_opts_trimmed '--very-sensitive --end-to-end --reorder' \
       --digestion 'dpnii' \
       --ligation_site 'GATCGATC' \
       --restriction_site '^GATC' \
       --min_cis_dist 1000 \
       --min_mapq 20 \
       --bin_size '5000,20000,40000,150000,500000,1000000' \
       --saveReference

Input command: sudo bash run_hicpro.sh

Relevant files

nextflow.log

System information

Nextflow version - 22.10.7
Hardware - Desktop
Executor - local
Container engine: Docker
OS Ubuntu - 20.04.5 Linux
Version - nf-core/hic 2.0.0

@koushik20 I'm having this same issue - were you able to fix it?

I supplied a separate custom Nextflow config file, and the pipeline completed without any errors.

process {
  withLabel:process_high {
    memory = 64.GB
    cpus = 52
    time = 36.h
  }
}

process {
  withLabel:process_medium {
    memory = 64.GB
    cpus = 52
    time = 36.h
  }
}

process {
  withLabel:process_low {
    memory = 64.GB
    cpus = 52
    time = 36.h
  }
}

process {
  withName:'NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN' {
    memory = 64.GB
    cpus = 52
    time = 36.h
  }
}

process {
  withName:'NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN_TRIMMED' {
    memory = 64.GB
    cpus = 52
    time = 36.h
  }
}

memory = { check_max( 64.GB * task.attempt, 'memory' ) }

// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
  if (type == 'memory') {
    try {
      if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
        return params.max_memory as nextflow.util.MemoryUnit
      else
        return obj
    } catch (all) {
      println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
      return obj
    }
  } else if (type == 'time') {
    try {
      if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
        return params.max_time as nextflow.util.Duration
      else
        return obj
    } catch (all) {
      println "   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj"
      return obj
    }
  } else if (type == 'cpus') {
    try {
      return Math.min( obj, params.max_cpus as int )
    } catch (all) {
      println "   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
      return obj
    }
  }
}
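
Reading the check_max function quoted above against the first question in this thread: it only lowers a request that exceeds params.max_time and returns anything smaller unchanged, so --max_time '240.h' acts as a ceiling and leaves a 16h request at 16h. The sketch below is one way to raise the request itself for the two Bowtie2 steps. It assumes the 16h figure comes from the pipeline's default resource request for this process (the thread does not show that default). The file name custom.config and the 72.h value are illustrative choices, not recommendations.

```groovy
// custom.config (hypothetical file name)
// Minimal sketch: raise the time *request* for the Bowtie2 steps only,
// instead of raising every process label.
process {
    // Fully qualified process name, copied from the error message in this thread.
    withName: 'NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN' {
        time = 72.h   // illustrative value; the thread used 36.h
    }
    // Second Bowtie2 step, named in the working config earlier in the thread.
    withName: 'NFCORE_HIC:HIC:HICPRO:HICPRO_MAPPING:BOWTIE2_ALIGN_TRIMMED' {
        time = 72.h   // illustrative value; the thread used 36.h
    }
}
```

A file like this would be passed with Nextflow's -c option, for example by adding -c custom.config to the nextflow run command in run_hicpro.sh.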

Thank you so much! This worked nicely, though for some samples the Bowtie2 alignment step takes over 48 hours, which seems too long.