Question about how NextDenovo corrects raw ONT reads
Jesson-mark opened this issue
Question or Expected behavior
Hi Dr. Hu, thank you for your great work on NextDenovo. I have a few questions about how NextDenovo works, especially the correction step, and I would appreciate your help very much.
I have recently been using NextDenovo to correct raw ONT reads, and I noticed that for datasets with different mean depths, the difference in mean depth between the uncorrected and corrected ONT data is not consistent. For example, I have four datasets whose mean depths are 10x, 13x, 20x and 146x, respectively. After correction by NextDenovo, the mean depths of the corrected reads are 1x, 11.5x, 18.5x and 42x, respectively. As you can see, the mean depth differences are 9x, 1.5x, 1.5x and 104x, which are quite inconsistent.
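When comparing samples, the retained fraction (corrected depth divided by raw depth) may be easier to read than the absolute difference. The following is a small sketch using only the four depths quoted above; the names `raw_depths`, `corrected_depths` and `depth_summary` are illustrative and are not part of NextDenovo.

```python
# Mean depths (in x coverage) quoted in the paragraph above.
raw_depths = [10.0, 13.0, 20.0, 146.0]
corrected_depths = [1.0, 11.5, 18.5, 42.0]


def depth_summary(raw, corrected):
    """Return (absolute depth lost, fraction of depth retained) for one sample."""
    return raw - corrected, corrected / raw


for raw, corrected in zip(raw_depths, corrected_depths):
    lost, retained = depth_summary(raw, corrected)
    print(f"raw {raw:g}x -> corrected {corrected:g}x: "
          f"lost {lost:g}x, retained {retained:.1%}")
```

On these numbers the retained fractions are roughly 10%, 88.5%, 92.5% and 28.8%, so the 13x and 20x samples behave similarly while the 10x and 146x samples stand out.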
Here are the contents of run.cfg for one sample:
[General]
job_type = local
job_prefix = LCL5_all
task = correct
rewrite = yes
parallel_jobs = 6
input_type = raw
read_type = ont
input_fofn = input.fofn
[correct_option]
read_cutoff = 1k
genome_size = 3g
pa_correction = 6
sort_options = -m 80g -t 10
minimap2_options_raw = -I 20G -t 10
correction_options = -p 10 -b
So my questions are:
- Is there a minimum mean depth threshold for NextDenovo to correct reads? Is there also a maximum mean depth threshold, such that when the raw data has a mean depth above it, the extra reads are not used for alignment and correction?
- I'm a little confused about the seed_depth and seed_cutoff parameters. What does "seed" mean? What role do seeds play in the correction and assembly steps of NextDenovo?
- I'm interested in how NextDenovo performs the correction step. Since the NextDenovo manuscript is not available yet, could you explain it briefly?
Operating system
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core
Python
Python 3.8.12
NextDenovo
nextDenovo v2.5.0
- I did not test the minimum mean depth required by NextDenovo, but I think an input of >= 30x is appropriate. If you input too little data, the amount of corrected data may be much lower, and the accuracy will also be much lower. There is no maximum mean depth threshold. The extra reads will be used for alignment.
- NextDenovo corrects the longest reads amounting to seed_depth x coverage of the input data, or any reads longer than seed_cutoff; the reads to be corrected are called seeds. seed_depth is ignored if seed_cutoff is set. All reads are used to correct the selected seeds.
- See our NextPolish paper, because NextDenovo uses the same algorithm logic as NextPolish. We are now preparing the NextDenovo paper.
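The seed selection rule described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this thread, not NextDenovo's actual code: the function `select_seeds`, its arguments, the toy read lengths, and the choice to treat the cutoff as an inclusive minimum length are all assumptions made for the example.

```python
def select_seeds(read_lengths, genome_size, seed_depth=None, seed_cutoff=None):
    """Return (cutoff, seed_lengths) for a list of read lengths.

    Hypothetical helper illustrating the rule from this thread:
    - if seed_cutoff is given, every read at least that long is a seed
      and seed_depth is ignored;
    - otherwise the longest reads are taken until they add up to
      seed_depth * genome_size bases.
    """
    if seed_cutoff is not None:
        # An explicit cutoff takes precedence over seed_depth.
        # Assumption: the cutoff is inclusive (>=).
        return seed_cutoff, [n for n in read_lengths if n >= seed_cutoff]
    if seed_depth is None:
        raise ValueError("set either seed_depth or seed_cutoff")

    target_bases = seed_depth * genome_size
    seeds, total = [], 0
    # Walk from the longest read to the shortest until the target is reached.
    for n in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        seeds.append(n)
        total += n
    # The implied cutoff is the length of the shortest selected seed.
    cutoff = seeds[-1] if seeds else 0
    return cutoff, seeds


# Toy data: five reads and a 5 kb "genome".
lengths = [9000, 7000, 5000, 3000, 1000]
print(select_seeds(lengths, genome_size=5000, seed_depth=3))
# -> (7000, [9000, 7000])
print(select_seeds(lengths, genome_size=5000, seed_depth=3, seed_cutoff=4000))
# -> (4000, [9000, 7000, 5000])
```

Under this reading, the amount of corrected data is bounded by the seed set rather than by the raw depth, which would be consistent with a very deep input yielding far fewer corrected bases than raw bases.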
Thanks for your prompt reply. I will take a look at the NextPolish paper and look forward to your NextDenovo paper.