The lengths of sseq and qseq in the output results of diamond blastp are inconsistent
ZhangBioLab opened this issue
Hello! Sorry to bother you!
I want to know why the lengths of sseq and qseq in the diamond blastp output are inconsistent. When the result is obtained with blastp, gaps are filled with -, so sseq and qseq have the same length. An example from blastp:
sseq:
FGVFNNFYSTDFALLAPPNPGILRELPSDNALGWA-HW------RAGYRCYELPRNAKTQLDIDTPGELQLLSCSPGLPPELAAVLSGMPRARAEVLLEILVSPGKNLFLVGRISGHLLRFLERTSACRTQALIEGRGMKAEGL
qseq:
FGFHFRVKSQFFQ*LSQHNPG--GQLCQGKSNGFADEWNGPRGARVDFEDENLPV-LQSELDVHQPDDPQFLRQQPGLPPDLFLDASGDAHGR------------QNAGTVSRMDSGLLDMLHDSAHHGHRAVADGVHIDLDGI
The following are the results of diamond:
sseq:YLNLDSLSLHRLTDHHAGRDLRERLTGRLADKRHGTRGARIHFENVDLRVLGVGVLHGELHIHEALHLQRLSKESRLTLDLFNKLRAEAVRRKRARRVARVNAGLLDMLHDAADPDFVAVTHSVNVHFHRVIQEPIKE
qseq:HFRVKSQFFQXLSQHNPGGQLCQGKSNGFADEWNGPRGARVDFEDENLPVLQSELDVHQPDDPQFLRQQPGLPPDLFLDASGDAHGRQNAGTVSRMDSGLLDMLHDSAHHGHRAVADGVHIDLDGILEELVDQ
How can I get sseq and qseq with the same length? I also found that diamond replaces the amino acid 'U' with 'X'. Will this have any impact?
Looking forward to your answer, thank you!
You can use qseq_gapped and sseq_gapped for this. Replacing U with X can't be avoided right now.
Ok! Thank u!
So that means using --outfmt 6 qseq_gapped sseq_gapped instead of --outfmt 6 qseq sseq, right?
Yes
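To double-check that the gapped fields line up, something like the following Python sketch could be run on the tabular output. It assumes the file was written with --outfmt 6 qseq_gapped sseq_gapped, so each line holds exactly two tab-separated columns. The function names and the toy sequences are made up for illustration and are not part of diamond.

```python
import io


def read_gapped_pairs(handle):
    """Yield (qseq_gapped, sseq_gapped) from two-column, tab-separated lines.

    Assumes the columns are in the order given to --outfmt 6,
    i.e. qseq_gapped first and sseq_gapped second.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        qseq_gapped, sseq_gapped = line.split("\t")
        yield qseq_gapped, sseq_gapped


def all_same_length(pairs):
    """Return True if query and subject have equal length in every pair."""
    return all(len(q) == len(s) for q, s in pairs)


# Toy stand-in for an output file; these are not real diamond results.
toy_output = io.StringIO("MKV-LA\tMKVQLA\nGGTT--\tGGTTAA\n")
pairs = list(read_gapped_pairs(toy_output))
print(all_same_length(pairs))
```

For a real run, the io.StringIO object would be replaced by an open file handle on the diamond output.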
I got it, Thanks!