Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads

chimeric mitochondrial-nuclear scaffolds

kanchond opened this issue

Question or Expected behavior
I have generated genome assemblies for two different species of butterfly. The assembly sizes are ~700-800 Mb after running purge_dups. In both assemblies there is a large chimeric scaffold, several Mbp in length, that contains the entire ~15 kb mitogenome embedded in it. The 15 kb mitogenome portion of each scaffold is 99.9-100% identical to the mitogenome assembled independently from Illumina data, so this is clearly a mis-assembly.

  1. How can I avoid these chimeric scaffolds? Is the much higher expected coverage of the mitogenome not used to prevent this happening?

  2. The presence of this chimeric scaffold makes me worry that there may be other chimeric scaffolds involving only nuclear sequence that are not so easily detected.

Thanks,
KD

Operating system
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core

GCC
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

Python
Python 3.7.4

NextDenovo
nextDenovo v2.5.0


  1. You can filter out mitochondrial reads before assembly by mapping all reads to the mitogenome and removing those that align to it.
  2. In general, assembly errors cannot be completely avoided, but you can use Hi-C or Bionano data to split chimeric scaffolds.
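
The read-filtering step in answer 1 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not part of NextDenovo: it assumes the reads were first aligned to the mitogenome with a long-read aligner that writes PAF (for example minimap2), and that the reads are in plain 4-line FASTQ. The function names, the 0.8 threshold, and the file names in the comments are hypothetical. Only reads that are mostly covered by mitogenome alignments are dropped, so nuclear reads that merely carry a short mitochondrial-like insertion (NUMT) are kept.

```python
from collections import defaultdict


def mito_read_ids(paf_lines, min_read_fraction=0.8):
    """Return IDs of reads that are mostly covered by mitogenome alignments.

    paf_lines: iterable of PAF lines from aligning reads (query) to the
    mitogenome (target). min_read_fraction is an assumed, tunable cutoff.
    """
    spans = defaultdict(list)  # read ID -> list of (query_start, query_end)
    lengths = {}               # read ID -> read length
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        name = fields[0]
        lengths[name] = int(fields[1])
        spans[name].append((int(fields[2]), int(fields[3])))

    selected = set()
    for name, intervals in spans.items():
        # Count each read base once, even if alignments overlap
        # (e.g. a read wrapping around the circular mitogenome).
        covered, current_end = 0, 0
        for start, end in sorted(intervals):
            start = max(start, current_end)
            if end > start:
                covered += end - start
                current_end = end
        if lengths[name] > 0 and covered / lengths[name] >= min_read_fraction:
            selected.add(name)
    return selected


def filter_fastq(fastq_lines, drop_ids):
    """Yield the lines of 4-line FASTQ records whose read ID is not in drop_ids."""
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            read_id = record[0][1:].split()[0]  # text after '@' up to first space
            if read_id not in drop_ids:
                yield from record
            record = []


# Possible usage (file names are placeholders; choose the minimap2 preset
# that matches your read type, e.g. map-ont, map-pb or map-hifi):
#   minimap2 -x map-ont mitogenome.fa reads.fq > reads_vs_mito.paf
#
#   with open("reads_vs_mito.paf") as paf:
#       drop = mito_read_ids(paf)
#   with open("reads.fq") as fq, open("reads.nomito.fq", "w") as out:
#       out.writelines(filter_fastq(fq, drop))
```

The filtered FASTQ would then be used as the assembler input in place of the original reads. Lowering the threshold removes more mitochondrial reads but increases the risk of discarding genuine nuclear reads that contain NUMTs.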