malonge / RagTag

Tools for fast and flexible genome assembly scaffolding and improvement

Scaffold longer than reference genome due to NNNNs

vappiah opened this issue · comments

Hi @malonge

I have used RagTag on several datasets, and every time the final scaffolded sequence comes out longer than the reference sequence. I found that this is due to the runs of Ns that RagTag introduces. Is this behaviour expected?

Vincent

Dear Vincent,

I have also seen this behaviour while working with the Tribolium castaneum genome. If I scaffold my draft genome using the published reference, I see two giant contigs joined to each other with ~1 Mbp of gap between them. I believe this behaviour is expected. However, one way to validate it would be to draw a one-to-one dot plot between the reference and the query and check whether the introduced gaps make sense (you could use SibeliaZ or nucmer for this). I hope it helps.

PS: I am not the author of this tool. I just used it for a project

Dear @shivanshss,

Thanks for the information. I will draw the dot plot.
For the query sequence do you mean the one I generated after running ragtag or the assembly fasta file?

Dear @vappiah

A dot plot between your query (the assembly before running RagTag) and the reference will tell you whether your query has a gap relative to the reference that could have been filled with Ns at the time of scaffolding.

A dot plot between your RagTag output and your original reference will tell you whether the position of any gap looks odd.
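To look at gap positions without a plot, here is a minimal sketch that lists the gap records in an AGP file, assuming your scaffolding run produced one that describes the scaffolds. It relies only on the standard AGP column layout (component type `N` or `U` marks a gap, with the gap length in the sixth column). The function name `agp_gaps` and the dictionary keys are my own choices for illustration, not part of RagTag.

```python
from typing import Dict, Iterable, List, Union

GapRecord = Dict[str, Union[str, int]]


def agp_gaps(lines: Iterable[str]) -> List[GapRecord]:
    """Return one record per gap line in an AGP file (hypothetical helper)."""
    gaps: List[GapRecord] = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and AGP header/comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Column 5 (1-based) is the component type; N and U denote gaps.
        if len(cols) < 7 or cols[4] not in ("N", "U"):
            continue
        gaps.append(
            {
                "object": cols[0],       # scaffold name
                "start": int(cols[1]),   # 1-based start of the gap on the scaffold
                "end": int(cols[2]),     # 1-based inclusive end of the gap
                "length": int(cols[5]),  # gap length in bp
                "gap_type": cols[6],
            }
        )
    return gaps


# Usage sketch; the path below is a placeholder, substitute your own file:
# with open("path/to/your_scaffolds.agp") as handle:
#     for gap in agp_gaps(handle):
#         print(gap["object"], gap["start"], gap["end"], gap["length"])
```

Comparing these coordinates with the corresponding region of the reference in the dot plot should show whether each gap sits where you would expect it.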

You may need to do some breakpoint analysis with original reads used for assembly to further your understanding of the gap.

Additionally, I would draw a synteny plot between your original query and the reference (this is similar to the dot plot but slightly more informative).

This would be a sanity check just to make sure that something unexpected is not happening. If you find that everything is as expected, then you don't have to worry about the Ns that are introduced at the time of scaffolding.
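As a quick numeric check alongside the plots, the following is a minimal sketch that reports, for each sequence in a FASTA file, how much of its length consists of Ns. If the ungapped length is close to what you expect and the excess over the reference is explained by the N runs, the extra length comes from gaps rather than duplicated sequence. The helper names `read_fasta` and `gap_summary` are hypothetical and written for this example only.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_summary(seq: str) -> Dict[str, object]:
    """Summarise runs of N (either case) in one sequence."""
    # Each run is stored as (0-based start, run length).
    runs = [(m.start(), m.end() - m.start()) for m in re.finditer(r"[Nn]+", seq)]
    n_bases = sum(length for _, length in runs)
    return {
        "length": len(seq),
        "n_bases": n_bases,
        "ungapped_length": len(seq) - n_bases,
        "gap_runs": runs,
    }


# Usage sketch; the path below is a placeholder, substitute your own file:
# with open("path/to/your_scaffolds.fasta") as handle:
#     for name, seq in read_fasta(handle):
#         info = gap_summary(seq)
#         print(name, info["length"], info["n_bases"], info["ungapped_length"])
```

This reads whole sequences into memory, which is usually fine for a single genome but may need a streaming approach for very large assemblies.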

I would also wait for the author to comment because, as I said earlier, I am not the author of this tool and they would know better.

Hope it helps.

Sincerely,
Shivansh