gpertea / gffread

GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more

gffread to protein

bijendrabio opened this issue

Hello,
I tried to extract the protein FASTA for the coding sequences in my annotation file, but the output looks like the example below.
Command used: gffread -y output_protein.fasta -g genome.fasta transcripts.gff3

Output:
LYT.WSHVP.QTLQSHR.CPSRLLELCSSPLLQMTIHGA.YSFGE.HIMYDIKLDNLQYVRSW.LRKLKL
LVQLDKFRTCHP.TPC.SS.TSLDRLHQPAHACLWCGQQWWQYRGLVRQSQSSR.QRSYRTQLVGWREQ.
LPGWCL

I'm curious what these dots (.) refer to, and how I can extract proper protein FASTA sequences for the coding regions. Kindly suggest!

Regards,
B

Each dot is an internal (in-frame) stop codon; see Issue 14.
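
To see which records are affected, a small script can scan the protein FASTA and report the positions of stop symbols that are not at the very end of a sequence. This is only a minimal sketch, not part of gffread: the helper names `read_fasta` and `internal_stops` are made up for illustration, and it assumes stops are written as `.` (the default seen above) or `*` (with -S).

```python
from typing import Iterable, Iterator, List, Tuple

# Symbols treated as a translated stop codon: '.' by default, '*' with -S.
STOP_SYMBOLS = {".", "*"}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record id.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def internal_stops(protein: str) -> List[int]:
    """Return 0-based positions of stop symbols, ignoring a terminal one."""
    if protein and protein[-1] in STOP_SYMBOLS:
        protein = protein[:-1]
    return [i for i, aa in enumerate(protein) if aa in STOP_SYMBOLS]


if __name__ == "__main__":
    # Hypothetical two-record input; real input would be the -y output file.
    demo = ">tx1\nMKV.LL\n>tx2\nMKVLL.\n"
    for name, seq in read_fasta(demo.splitlines()):
        stops = internal_stops(seq)
        status = "internal stops at %s" % stops if stops else "ok"
        print(name, status)
```

Records with many internal stops, like the one posted above, usually point to a frame, strand, or annotation problem rather than to a real protein, so it is worth checking the CDS features and the genome FASTA before trusting the translation.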

Following the method mentioned in Issue 14, adding -S only changes . to *. But a stop codon in the middle of the mRNA sequence can be translated into an amino acid: for example, UGA as U (selenocysteine) and UAG as O (pyrrolysine). What should be done in this situation?
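
The thread does not record an answer to this. One possible workaround, sketched here under stated assumptions, is to translate the spliced CDS nucleotide sequences yourself (gffread's -x option writes them) and recode internal UGA and UAG codons. The `translate` function and the `RECODE` mapping below are hypothetical helpers, not gffread features. They use the standard genetic code, and recoding is only meaningful for genes known to encode selenocysteine or pyrrolysine.

```python
from itertools import product
from typing import Dict, Optional

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: internal UGA -> selenocysteine (U), internal UAG -> pyrrolysine (O).
RECODE: Dict[str, str] = {"TGA": "U", "TAG": "O"}


def translate(cds: str, recode: Optional[Dict[str, str]] = None) -> str:
    """Translate a CDS; optionally recode internal stop codons.

    The final codon is never recoded, so a genuine terminal stop stays '*'.
    Codons containing ambiguous bases are reported as 'X'.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    usable = len(seq) - len(seq) % 3     # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    last = len(codons) - 1
    protein = []
    for i, codon in enumerate(codons):
        if recode and i < last and codon in recode:
            protein.append(recode[codon])
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


if __name__ == "__main__":
    cds = "ATGTGAAAATAA"           # toy CDS: ATG TGA AAA TAA
    print(translate(cds))          # M*K*
    print(translate(cds, RECODE))  # MUK*
```

Applying the recoding to every transcript would hide real annotation errors, so a sequence should only be passed with `RECODE` when there is independent evidence that the gene is a selenoprotein or uses pyrrolysine.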