Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

Question about how NextDenovo correct raw ONT reads

Jesson-mark opened this issue

Question or Expected behavior
Hi, Dr. Hu, thanks for your great work on NextDenovo. I have a few questions about how NextDenovo works, especially the correction step, and I would greatly appreciate your help.

I have recently been using NextDenovo to correct raw ONT reads, and I noticed that the drop in mean depth between the uncorrected and corrected data is not consistent across datasets with different mean depths. For example, I have four datasets whose mean depths are 10x, 13x, 20x and 146x, respectively. After correction by NextDenovo, the mean depths of the corrected reads are 1x, 11.5x, 18.5x and 42x, respectively. As you can see, the mean depth differences are 9x, 1.5x, 1.5x and 104x, which are quite inconsistent.
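To make the comparison concrete, here is a minimal sketch that recomputes the depth differences quoted above and also expresses them as the fraction of depth retained. The function name `depth_change` is made up for illustration; only the eight depth values come from this post.

```python
# Raw and corrected mean depths (x coverage) as quoted in the post above.
raw_depths = [10.0, 13.0, 20.0, 146.0]
corrected_depths = [1.0, 11.5, 18.5, 42.0]


def depth_change(raw, corrected):
    """Return (absolute depth lost in x, fraction of raw depth retained)."""
    return raw - corrected, corrected / raw


for raw, corrected in zip(raw_depths, corrected_depths):
    lost, retained = depth_change(raw, corrected)
    print(f"{raw:g}x -> {corrected:g}x: lost {lost:g}x, retained {retained:.1%}")
```

Looking at the retained fraction rather than the absolute difference shows that the 13x and 20x datasets keep most of their depth, while the 10x and 146x datasets lose most of it.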

Here are the contents of run.cfg for one sample:

[General]
job_type = local
job_prefix = LCL5_all
task = correct
rewrite = yes
parallel_jobs = 6
input_type = raw
read_type = ont
input_fofn = input.fofn

[correct_option]
read_cutoff = 1k
genome_size = 3g
pa_correction = 6
sort_options = -m 80g -t 10
minimap2_options_raw = -I 20G -t 10
correction_options = -p 10 -b

So my questions are:

  1. Is there a minimum mean depth threshold for NextDenovo to correct reads? Is there also a maximum mean depth threshold, such that when the raw data exceeds it, the extra reads are not used for alignment and correction?
  2. I'm a little confused about the seed_depth and seed_cutoff parameters. What does "seed" mean? What role do seeds play in NextDenovo's correction and assembly steps?
  3. I'm interested in how NextDenovo performs the correction step. Since the NextDenovo manuscript is not currently available, could you explain it briefly?

Operating system
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core

Python
Python 3.8.12

NextDenovo
nextDenovo v2.5.0

  1. I have not tested the minimum mean depth required by NextDenovo, but I think an input of >= 30x is appropriate. If you input too little data, there may be much less corrected data, and its accuracy will also be much lower. There is no maximum mean depth threshold. The extra reads will be used for alignment.
  2. NextDenovo corrects either the longest reads that add up to seed_depth of the input data, or all reads longer than seed_cutoff. The reads selected for correction are called seeds. seed_depth is ignored if seed_cutoff is set. All reads are used to correct the selected seeds.
  3. See our NextPolish paper, because NextDenovo uses the same algorithmic logic as NextPolish. We are now preparing the NextDenovo paper.
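The seed selection described in point 2 can be sketched roughly as follows. This is an illustration of the rule as stated in the reply, not NextDenovo's actual code: the function `select_seeds`, its arguments, and the use of `None` to mean "seed_cutoff not set" are all assumptions made for the example.

```python
def select_seeds(read_lengths, genome_size, seed_depth=None, seed_cutoff=None):
    """Pick the reads to be corrected ("seeds") from a list of read lengths.

    Hypothetical helper mirroring the rule stated in the reply:
    - if seed_cutoff is set, every read longer than it is a seed and
      seed_depth is ignored;
    - otherwise the longest reads are taken until their total length
      reaches seed_depth * genome_size.
    """
    if seed_cutoff is not None:
        # seed_cutoff takes priority over seed_depth.
        return [n for n in read_lengths if n > seed_cutoff]
    if seed_depth is None:
        raise ValueError("set either seed_depth or seed_cutoff")

    target_bases = seed_depth * genome_size
    seeds, total = [], 0
    for n in sorted(read_lengths, reverse=True):  # longest reads first
        if total >= target_bases:
            break
        seeds.append(n)
        total += n
    return seeds


# Toy data: six reads on a 10 kb "genome".
reads = [5000, 12000, 800, 30000, 20000, 1500]
print(select_seeds(reads, genome_size=10_000, seed_depth=5))
print(select_seeds(reads, genome_size=10_000, seed_depth=5, seed_cutoff=10_000))
```

Under this reading, the depth of the corrected output is bounded by the amount of sequence selected as seeds, not by the total input depth, while all input reads still contribute to correcting those seeds.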

Thanks for your prompt reply. I will take a look at the NextPolish paper and look forward to your NextDenovo paper.