What happens when there is high overlap?
xapple opened this issue
This has been fixed. It will reconstruct the entire sequence, but stop at the forward and reverse primers.
OK, that's good to know. Is it fixed only in the master branch, or is version 2.8 OK already?
It's been fixed in 2.8 and later.
Thanks, good to know.
This is called "readthrough". The tool Trimmomatic handles it well.
The latest version of PANDAseq produces only one third of the output that Trimmomatic does.
So you mean you have to activate a special option on the command line?
I couldn't find the word "readthrough" in the PANDAseq documentation.
PANDAseq doesn't require any command line options to deal with this. I believe @yech1990 is referring to the “Trimmomatic” software package.
I just tried it on one sample of my dataset. Primers are 967F and 1046R. So the peak should be around 120. But look at the output from pandaseq. Seems like it finds a much shorter overlap optimal in most cases:
The reads are 111 base pairs each, so a final sequence length of 220 base pairs means that PANDAseq chose an overlap of only 2 base pairs (111 + 111 - 220 = 2)...
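For reference, the arithmetic behind that estimate can be sketched in a few lines of Python. The function name `implied_overlap` is hypothetical and not part of PANDAseq; it only assumes that the assembled length equals the sum of the two read lengths minus the overlap.

```python
def implied_overlap(fwd_len, rev_len, assembled_len):
    """Number of bases shared by both reads in the assembled sequence.

    Assumes assembled_len = fwd_len + rev_len - overlap.
    """
    return fwd_len + rev_len - assembled_len


# Two 111 bp reads assembled into 220 bp share only 2 bases.
print(implied_overlap(111, 111, 220))  # 2

# The expected peak of about 120 bp would instead imply a long overlap.
print(implied_overlap(111, 111, 120))  # 102
```

So a 220 bp result and a 120 bp result from the same read pair differ by 100 bases of overlap, which is why the length histogram is a quick way to spot this problem.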
That might be true anyway if the sequences are repetitive. See if increasing the k-mer table to 4 (-k 4) improves the situation, or try increasing the minimum overlap (-o 20).
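To illustrate why repetitive sequence can make a short overlap look just as plausible as a long one, here is a toy sketch. It uses exact string matching only and is not PANDAseq's actual scoring; `perfect_overlaps` and `rev_rc` (the reverse read, assumed to be already reverse-complemented) are hypothetical names chosen for this example.

```python
def perfect_overlaps(fwd, rev_rc, min_overlap=1):
    """List every overlap length where the end of fwd exactly matches
    the start of rev_rc. Toy model: no mismatches, no quality scores."""
    longest = min(len(fwd), len(rev_rc))
    return [
        n
        for n in range(min_overlap, longest + 1)
        if fwd[-n:] == rev_rc[:n]
    ]


# A read built from a 4 bp repeat matches itself at every multiple of 4.
repetitive = "ACGT" * 5
print(perfect_overlaps(repetitive, repetitive))
# [4, 8, 12, 16, 20]

# Raising the minimum overlap discards the shortest candidates.
print(perfect_overlaps(repetitive, repetitive, min_overlap=10))
# [12, 16, 20]
```

In this toy model, a higher minimum overlap plays the same role as the -o option suggested above: it removes short candidate overlaps from consideration, so they cannot be chosen over the long one.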