bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.

The lengths of sseq and qseq in the output results of diamond blastp are inconsistent

ZhangBioLab opened this issue

Hello! Sorry to bother you!
Why do sseq and qseq have different lengths in the output of diamond blastp? With blastp, the aligned sequences are padded with - at gap positions, so sseq and qseq have the same length. For example:

sseq:
FGVFNNFYSTDFALLAPPNPGILRELPSDNALGWA-HW------RAGYRCYELPRNAKTQLDIDTPGELQLLSCSPGLPPELAAVLSGMPRARAEVLLEILVSPGKNLFLVGRISGHLLRFLERTSACRTQALIEGRGMKAEGL
qseq:
FGFHFRVKSQFFQ*LSQHNPG--GQLCQGKSNGFADEWNGPRGARVDFEDENLPV-LQSELDVHQPDDPQFLRQQPGLPPDLFLDASGDAHGR------------QNAGTVSRMDSGLLDMLHDSAHHGHRAVADGVHIDLDGI

The following are the results of diamond:

sseq:YLNLDSLSLHRLTDHHAGRDLRERLTGRLADKRHGTRGARIHFENVDLRVLGVGVLHGELHIHEALHLQRLSKESRLTLDLFNKLRAEAVRRKRARRVARVNAGLLDMLHDAADPDFVAVTHSVNVHFHRVIQEPIKE
qseq:HFRVKSQFFQXLSQHNPGGQLCQGKSNGFADEWNGPRGARVDFEDENLPVLQSELDVHQPDDPQFLRQQPGLPPDLFLDASGDAHGRQNAGTVSRMDSGLLDMLHDSAHHGHRAVADGVHIDLDGILEELVDQ

How can I get sseq and qseq with the same length? I also noticed that diamond replaces the amino acid 'U' with 'X'. Does this have any impact on the results?
Looking forward to your answer, thank you!

You can use qseq_gapped and sseq_gapped for this. Replacing U with X can't be avoided right now.

OK, thank you!
So instead of --outfmt 6 qseq sseq, I should use --outfmt 6 qseq_gapped sseq_gapped, right?

Yes

I got it, Thanks!
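
For anyone who lands here later, this is a rough Python sketch of how output written with --outfmt 6 qseq_gapped sseq_gapped could be checked and used. It assumes the file has exactly those two tab-separated columns in that order and that gaps are written as -. The function names and the short example sequences are made up for illustration and are not part of diamond.

```python
import csv
import io


def read_gapped_pairs(handle):
    """Yield (qseq_gapped, sseq_gapped) tuples from tab-separated lines.

    Assumes column 1 is qseq_gapped and column 2 is sseq_gapped.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip empty lines
        qseq_gapped, sseq_gapped = row[0], row[1]
        # Gapped fields describe the same alignment columns, so their
        # lengths should match; fail loudly if they do not.
        if len(qseq_gapped) != len(sseq_gapped):
            raise ValueError("gapped sequences differ in length")
        yield qseq_gapped, sseq_gapped


def ungap(seq):
    """Remove gap characters to recover the ungapped sequence."""
    return seq.replace("-", "")


def count_identical_columns(qseq_gapped, sseq_gapped):
    """Count alignment columns where both sequences carry the same residue."""
    return sum(
        1 for q, s in zip(qseq_gapped, sseq_gapped) if q == s and q != "-"
    )


# Hypothetical two-column line standing in for a real result file.
example = "MKT-AYIAK\tMKTLAY-AK\n"
pairs = list(read_gapped_pairs(io.StringIO(example)))
query_gapped, subject_gapped = pairs[0]
identical = count_identical_columns(query_gapped, subject_gapped)
```

With a real result file, open it and pass the file handle to read_gapped_pairs instead of the io.StringIO example. Keep in mind that an X in the query column may stand for a U in the original input, as noted above.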