Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

MemoryError

vanbie opened this issue

Describe the bug
An error occurred when I was trying to assemble corrected data. I wonder whether it is a parameter-setting issue.

Error message
hostname
+ hostname
cd /root/nd/NextDenovo/Mo/Mo_out/02.cns_align/01.split_seed.sh.work/split_seed0
+ cd /root/nd/NextDenovo/Mo/Mo_out/02.cns_align/01.split_seed.sh.work/split_seed0
time /usr/bin/python3 /root/nd/NextDenovo/lib/split_cns.py -f /root/nd/NextDenovo/Mo/input.fofn -l 37491 -c 6
+ time /usr/bin/python3 /root/nd/NextDenovo/lib/split_cns.py -f /root/nd/NextDenovo/Mo/input.fofn -l 37491 -c 6
[INFO] 2021-08-11 22:39:29,574 Split step options:
[INFO] 2021-08-11 22:39:29,574 Namespace(count=6, fofn='/root/nd/NextDenovo/Mo/input.fofn', index=True, min_len=37491, outdir='./', rename=True)
Traceback (most recent call last):
  File "/root/nd/NextDenovo/lib/split_cns.py", line 155, in <module>
    main(args)
  File "/root/nd/NextDenovo/lib/split_cns.py", line 129, in main
    f.cutf(args.count, rn = args.rename, ml = args.min_len, pdir = args.outdir, index = args.index)
  File "/root/nd/NextDenovo/lib/split_cns.py", line 108, in cutf
    print('>%d %d %f pid=%s\n%s' % (t, lens, 1, name, seq), file=fa_files[i])
MemoryError
Command exited with non-zero status 1
90.50user 132.68system 4:47.14elapsed 77%CPU (0avgtext+0avgdata 245949344maxresident)k
48378432inputs+8outputs (119major+61515086minor)pagefaults 0swaps

Genome characteristics
genome size=490m, heterozygous rate=1.3%, repeat content=58%

Input data
Total base count=62880679007bp, sequencing depth=129, average/N50 read length=30172
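
As a quick sanity check on the reported depth, the figures above can be plugged into depth = total bases / genome size. This is only a sketch: the issue does not say which genome size was used for the calculation, so both the 490m estimate and the 485m value from the config file are tried, and the helper name `depth` is made up for illustration.

```python
# Figures copied from the issue. Which genome size the reporter used
# to get "depth=129" is not stated, so both candidates are tried.
total_bases = 62_880_679_007


def depth(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth = total bases / genome size."""
    return total_bases / genome_size


for label, size in (("490m (genome characteristics)", 490_000_000),
                    ("485m (config genome_size)", 485_000_000)):
    print(f"{label}: {depth(total_bases, size):.1f}x")
```

Either value gives a depth close to the reported 129.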

Config file
[General]
job_type = local
job_prefix = nextDenovo
task = assemble # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 8
input_type = corrected
read_type = ont
input_fofn = ./input.fofn
workdir = M_out

genome_size = 485m

[assemble_option]
minimap2_options_cns = -t 4
nextgraph_options = -a 1

Operating system
Ubuntu 18.04 64bit

GCC
gcc version 7.5.0

Python
Python 3.6.9

NextDenovo
nextDenovo v2.4.0

To Reproduce (Optional)
none

Additional context (Optional)
32core 256G server
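
For comparison with the 256G of RAM, the `maxresident` figure in the `time` output above can be converted to GiB. This is a minimal sketch that assumes the line comes from GNU time, whose default format reports the maximum resident set size in kilobytes; the helper name `peak_rss_gib` is made up for illustration.

```python
import re

# Line copied from the `time` output in the error message above.
time_line = ("90.50user 132.68system 4:47.14elapsed 77%CPU "
             "(0avgtext+0avgdata 245949344maxresident)k")


def peak_rss_gib(line: str) -> float:
    """Extract the max resident set size (assumed KiB) and convert it to GiB."""
    match = re.search(r"(\d+)maxresident\)k", line)
    if match is None:
        raise ValueError("no maxresident field found")
    return int(match.group(1)) / 1024**2


peak = peak_rss_gib(time_line)
print(f"peak RSS: {peak:.1f} GiB")
```

If that reading is right, split_cns.py peaked at roughly 234 to 235 GiB, which is close to the total memory of the machine. That would be consistent with the process running out of RAM, although it does not explain why the split step needed that much.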

It seems you have too much data. I think you can run NextDenovo with the raw (uncorrected) data, which may run faster. As for the error you mentioned, I do not yet know why the print expression raises a MemoryError; I need more time to figure it out.

Thanks. The data was released only as corrected reads, so I can only download the clean data. The original report also used NextDenovo for the analysis, but did not give many details.