Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

[4501 ERROR] 2023-06-13 22:29:40 the input data is insufficient for an assembly.

Evansd36 opened this issue

Describe the bug
As the error message states, NextDenovo does not like my input data. The input I am using consists of contigs given to me by a collaborator. They come from a PacBio CLR run but have already been partially assembled: they are NOT the raw PacBio reads. Instead, the FASTA file comprises ~3,500 contigs ranging from 40 kb to 900 kb. Because of this, NextDenovo seems to read the seed depth as extremely low (1.13) and stops the assembly. I know that the data in the FASTA file covers the whole genome and should be of fairly high quality (some polishing and contig joining has already been done), but I am curious whether this means I cannot use NextDenovo. If so, is there another assembly program you would suggest? I am also trying minimap2 on its own, but I am still working out which filters I need. The run log, config file, and a seq_stat run are all attached. If NextDenovo can handle extremely large contigs treated as "reads", is there some parameter I am missing that would help? Thanks!!

Error message (changed all files to .txt files so I could attach them)
seq_stat_g.txt
pid4501.log.txt
NextDonovo_config_file.txt

Genome characteristics
Plant genome, 350 Mb, pretty repetitive, heterozygous individual sequenced

GCC
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)

Python
Python 3.5.5

NextDenovo
nextDenovo 2.5.2

Additional context (Optional)
I am running this on a pretty old server that hasn't been updated in a while. It is totally possible that running an older version of Python or GCC could be causing issues as well.

Try setting input_type = corrected. I am not sure it will work, but it is worth a try.

Same error, unfortunately. Any other ideas?
pid5326.log.txt

Try setting genome_size = 1M. If it still reports an error, then I am afraid I can't help any further.
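
For anyone following along, here is a rough sketch of why the configured genome size matters for the depth check. It assumes the depth is estimated as total input bases divided by the configured genome size, which is a simplification and not taken from the NextDenovo source. The helper names (`total_bases`, `parse_size`, `estimated_depth`) and the 395,500,000-base total are hypothetical; the total was chosen only so the ratio matches the 1.13 in the log.

```python
import io


def total_bases(fasta_handle):
    """Sum sequence lengths in a FASTA stream, skipping header lines."""
    total = 0
    for line in fasta_handle:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def parse_size(text):
    """Convert a size such as '350m' or '1M' to a base count (k/m/g suffixes)."""
    units = {"k": 10**3, "m": 10**6, "g": 10**9}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def estimated_depth(bases, genome_size):
    """Assumed depth estimate: total bases / configured genome size."""
    return bases / parse_size(genome_size)


# Toy FASTA to show the counting step (two records, 16 bases in total).
toy = io.StringIO(">ctg1\nACGTACGT\nACGT\n>ctg2\nGGCC\n")
toy_bases = total_bases(toy)

# Hypothetical total for an already-assembled contig set of a ~350 Mb genome.
hypothetical_bases = 395_500_000
print(round(estimated_depth(hypothetical_bases, "350m"), 2))  # about 1x
print(estimated_depth(hypothetical_bases, "1M"))  # far higher with a 1M setting
```

Under that assumption, an already-assembled contig set can never reach more than about 1x against the true genome size, so a smaller configured genome size only raises the computed depth on paper. Whether the overlap step then produces anything useful is a separate question.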