gffread to protein
bijendrabio opened this issue
Hello,
I tried to extract the coding protein sequences in FASTA format from a GFF3 annotation file, but the output looks like the following.
Command used: gffread -y output_protein.fasta -g genome.fasta transcripts.gff3
Output:
LYT.WSHVP.QTLQSHR.CPSRLLELCSSPLLQMTIHGA.YSFGE.HIMYDIKLDNLQYVRSW.LRKLKL
LVQLDKFRTCHP.TPC.SS.TSLDRLHQPAHACLWCGQQWWQYRGLVRQSQSSR.QRSYRTQLVGWREQ.
LPGWCL
What do these dots (.) refer to, and how can I extract proper coding protein sequences in FASTA format? Any suggestions are appreciated!
Regards,
B
Each dot is an internal (in-frame) stop codon; see Issue 14.
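A minimal Python sketch (not part of gffread) for finding which records in the -y output contain internal stops, and at which positions. It assumes a plain FASTA file, treats both '.' (the default) and '*' (with -S) as stop symbols, and ignores a single stop at the very end of a sequence. The helpers read_fasta, internal_stops and split_records are hypothetical names used only for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# '.' is what gffread -y writes for a stop by default; '*' is used with -S.
STOP_SYMBOLS = {".", "*"}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def internal_stops(protein: str) -> List[int]:
    """Return 1-based positions of stop symbols, ignoring one terminal stop."""
    core = protein[:-1] if protein and protein[-1] in STOP_SYMBOLS else protein
    return [i + 1 for i, aa in enumerate(core) if aa in STOP_SYMBOLS]


def split_records(
    records: Iterable[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], Dict[str, List[int]]]:
    """Separate clean records from those with internal stops (keyed by ID)."""
    clean: List[Tuple[str, str]] = []
    flagged: Dict[str, List[int]] = {}
    for header, seq in records:
        hits = internal_stops(seq)
        if hits:
            flagged[header.split()[0]] = hits
        else:
            clean.append((header, seq))
    return clean, flagged


# Small inline example; real use would pass open("output_protein.fasta") instead.
SAMPLE = """>tx1 example
MKV.LA*
>tx2 example
MKVLA.
"""
clean, flagged = split_records(read_fasta(SAMPLE.splitlines()))
print(flagged)  # {'tx1': [4]}
```

The records left in `clean` could then be written back out as a protein FASTA that contains no internal stops; whether dropping the flagged transcripts is appropriate depends on why they contain stops (annotation errors versus genuine recoding, as raised in the next comment).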
Following the approach in Issue 14, adding -S only changes . to *. However, a stop codon in the middle of an mRNA can sometimes be translated into an amino acid: for example, UGA can encode selenocysteine (U) and UAG can encode pyrrolysine (O). What should be done in this situation?
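Whether an in-frame UGA really encodes selenocysteine (or UAG pyrrolysine) depends on the gene (for selenocysteine, on a SECIS element in the mRNA), so it cannot be decided from the codon alone, and -S does not attempt it. One possible workaround, sketched below in Python and not a gffread feature, is to translate the CDS yourself and recode internal stops only for transcripts already known from other evidence to be selenoproteins or pyrrolysine-containing. The function translate_cds and its recode and stop_char parameters are hypothetical; the input is assumed to be a spliced CDS nucleotide sequence, for example one written by gffread -x.

```python
from typing import Dict, Optional

# Standard genetic code (NCBI table 1), codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(
    cds: str,
    recode: Optional[Dict[str, str]] = None,
    stop_char: str = "*",
) -> str:
    """Translate a CDS codon by codon.

    Internal stop codons listed in `recode` (e.g. {"UGA": "U"}) become the
    given residue; every other stop, including the final one, becomes
    `stop_char`. A trailing partial codon is ignored.
    """
    # Accept RNA or DNA spelling for both the sequence and the recode keys.
    table = {k.upper().replace("U", "T"): v for k, v in (recode or {}).items()}
    seq = cds.upper().replace("U", "T")
    n_codons = len(seq) // 3
    protein = []
    for idx in range(n_codons):
        codon = seq[3 * idx : 3 * idx + 3]
        aa = CODON_TABLE.get(codon, "X")  # ambiguous bases such as N give X
        if aa == "*":
            is_last = idx == n_codons - 1
            aa = stop_char if is_last else table.get(codon, stop_char)
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGTGAGCCTAA"))                          # M*A*
print(translate_cds("ATGTGAGCCTAA", recode={"UGA": "U"}))     # MUA*
```

Applying `recode` only to an explicit list of trusted transcripts, rather than to every record, avoids turning genuine annotation errors (frameshifts, wrong exon boundaries) into apparently valid proteins.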