duplicate and close variants of the same alignment in the output
azat-badretdin opened this issue
Azat Badretdin commented
When I use these parameters:
./miniprot -G 100 -O 10 -J 34 -F 30 --gff -ut32 nucleotide.fasta proteins.fasta
I get very close variants of the same alignment:
gpipedev21:issue-34$ grep WP_004242317 miniprot.gff | grep PAF
##PAF gi|490362554|ref|WP_004242317.1| 343 149 343 + gi|545778205|gb|U00096.3| 4641652 3221864 3222446 402 582 0 AS:i:680 ms:i:680 np:i:159 da:i:-1 do:i:0 cg:Z:194M cs:Z::2*accC*gacS*aatA*atcV:2*atcV:2*cacS*gaaD*cccR*ggcQ:1*ggtD:9*cgcY:1*agtA*aaaQ*gaaS*atcV*atcT:2*tatF:1*aacA:2*gttY*aatD:7*gaaQ:1*gagS:1*ggcA*aagA:8*gcgT:3*cgaS:1*aaaR*caaG:3*gaaG:3*tggY:2*ggtD:3*tcgA:3*gaaA:7*cggG:1*gacS:19*attL:2*cgaQ*ggcH*ctgI*aacA:2*cagE:2*tcgA:10*cgaK:2*tttI:1*ccgS:9*atgV:8*gtgL*tatF:1*aaaR*gccL:2*ggtE:1*gcgQ*ctgE:2*ttaQ*gtcI:1*gttA*cccA:1*aaaR:1*aaaI:5*cgtK
##PAF gi|490362554|ref|WP_004242317.1| 343 154 343 + gi|545778205|gb|U00096.3| 4641652 3221879 3222446 396 567 0 AS:i:675 ms:i:675 np:i:157 da:i:-1 do:i:0 cg:Z:189M cs:Z:*atcV:2*atcV:2*cacS*gaaD*cccR*ggcQ:1*ggtD:9*cgcY:1*agtA*aaaQ*gaaS*atcV*atcT:2*tatF:1*aacA:2*gttY*aatD:7*gaaQ:1*gagS:1*ggcA*aagA:8*gcgT:3*cgaS:1*aaaR*caaG:3*gaaG:3*tggY:2*ggtD:3*tcgA:3*gaaA:7*cggG:1*gacS:19*attL:2*cgaQ*ggcH*ctgI*aacA:2*cagE:2*tcgA:10*cgaK:2*tttI:1*ccgS:9*atgV:8*gtgL*tatF:1*aaaR*gccL:2*ggtE:1*gcgQ*ctgE:2*ttaQ*gtcI:1*gttA*cccA:1*aaaR:1*aaaI:5*cgtK
This may also show up as duplicated alignment records in the GFF output. For example:
gi|545778205|gb|U00096.3| miniprot CDS 729583 733323 6547 + 0 Parent=MP001848;Rank=18;Identity=0.9719;Target=gi|15829983|ref|NP_308756.1| 1 1247
gi|545778205|gb|U00096.3| miniprot mRNA 729583 733323 6547 + . ID=MP001849;Rank=19;Identity=0.9719;Positive=0.9783;Target=gi|15829983|ref|NP_308756.1| 1 1247
gi|545778205|gb|U00096.3| miniprot CDS 729583 733323 6547 + 0 Parent=MP001849;Rank=19;Identity=0.9719;Target=gi|15829983|ref|NP_308756.1| 1 1247
gi|545778205|gb|U00096.3| miniprot mRNA 729583 733323 6547 + . ID=MP001850;Rank=20;Identity=0.9719;Positive=0.9783;Target=gi|15829983|ref|NP_308756.1| 1 1247
gi|545778205|gb|U00096.3| miniprot CDS 729583 733323 6547 + 0 Parent=MP001850;Rank=20;Identity=0.9719;Target=gi|15829983|ref|NP_308756.1| 1 1247
gi|545778205|gb|U00096.3| miniprot mRNA 729583 733323 6547 + . ID=MP001851;Rank=21;Identity=0.9719;Positive=0.9783;Target=gi|15829983|ref|NP_308756.1| 1 1247
gi|545778205|gb|U00096.3| miniprot CDS 729583 733323 6547 + 0 Parent=MP001851;Rank=21;Identity=0.9719;Target=gi|15829983|ref|NP_308756.1| 1 1247
The alignments are the same, but the Rank=x value is different in each case.
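To make "the same except for Rank" concrete, here is a minimal Python sketch that collapses records differing only in their `ID`, `Parent`, and `Rank` attributes. It assumes the real `miniprot.gff` is tab-separated with the usual nine GFF columns (the excerpt above shows spaces, which may be a paste artifact) and that each `mRNA` line is followed directly by its child features. The function names `record_key` and `drop_exact_duplicates` are hypothetical helpers, not part of miniprot.

```python
import re

# Attributes that differ between otherwise identical records in the example above.
VOLATILE = re.compile(r"(?:^|;)(?:ID|Parent|Rank)=[^;]*")

def record_key(line):
    """Return the GFF columns with ID, Parent, and Rank stripped from column 9."""
    cols = line.rstrip("\n").split("\t")
    cols[8] = VOLATILE.sub("", cols[8]).lstrip(";")
    return tuple(cols)

def drop_exact_duplicates(lines):
    """Keep the first mRNA (plus its child features) for each distinct alignment."""
    blocks, current = [], None
    for line in lines:
        if line.startswith("#") or len(line.split("\t")) < 9:
            continue  # skip ##PAF lines, other comments, and malformed lines
        if line.split("\t")[2] == "mRNA":
            current = []  # a new mRNA starts a new block
            blocks.append(current)
        if current is not None:
            current.append(line)
    seen, kept = set(), []
    for block in blocks:
        signature = tuple(record_key(l) for l in block)
        if signature not in seen:
            seen.add(signature)
            kept.extend(block)
    return kept
```

Called as `drop_exact_duplicates(open("miniprot.gff"))`, this would keep only the first of the identical records shown above. It does not touch near-identical hits whose coordinates differ.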
Heng Li commented
These are two different hits. For now, you have to filter them out yourself.
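One possible way to do that filtering, as a rough Python sketch rather than anything built into miniprot: for each protein, sort its hits by `AS` score and drop any hit that overlaps a better hit on the same contig and strand. It assumes the `##PAF` lines are tab-separated and follow the standard PAF column order after the `##PAF` prefix, as the example above suggests. The 0.5 overlap threshold and the names `Hit`, `parse_paf_line`, and `filter_overlapping` are illustrative assumptions.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "query target strand tstart tend score line")

def parse_paf_line(line):
    """Parse one tab-separated '##PAF' line from the miniprot --gff output."""
    f = line.rstrip("\n").split("\t")
    # Columns after the '##PAF' prefix: qname qlen qstart qend strand
    # tname tlen tstart tend nmatch blocklen mapq, then tags such as AS:i:680.
    score = next(int(t[5:]) for t in f[13:] if t.startswith("AS:i:"))
    return Hit(f[1], f[6], f[5], int(f[8]), int(f[9]), score, line)

def filter_overlapping(hits, min_overlap=0.5):  # threshold is an assumption
    """Greedy filter: keep a hit unless it overlaps a better hit already kept."""
    kept = []
    for h in sorted(hits, key=lambda h: -h.score):
        redundant = False
        for k in kept:
            if (h.query, h.target, h.strand) != (k.query, k.target, k.strand):
                continue
            overlap = min(h.tend, k.tend) - max(h.tstart, k.tstart)
            shorter = min(h.tend - h.tstart, k.tend - k.tstart)
            if shorter > 0 and overlap / shorter >= min_overlap:
                redundant = True
                break
        if not redundant:
            kept.append(h)
    return kept
```

With the two WP_004242317 hits above, the second (3221879-3222446, AS 675) lies entirely within the first (3221864-3222446, AS 680), so only the first would be kept. Exact duplicates are removed the same way, since a duplicate overlaps the kept copy completely.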
Azat Badretdin commented
Thanks. Which example are you talking about? Or both?
Heng Li commented
Both
Azat Badretdin commented
For now
Does this suggest there is hope that hits will be reported one per region in the future?