output file contains only one amino acid
Jiayi-Zheng opened this issue · comments
Hi, I was using the command `seqtk subseq uniprot_sprot.fasta name.txt > out.fa`.
My name.txt looks like this:
sp|Q9H2Y7|ZN106_HUMAN Zinc finger protein 106 OS=Homo sapiens OX=9606 GN=ZNF106 PE=1 SV=1
sp|Q9Y5V0|ZN706_HUMAN Zinc finger protein 706 OS=Homo sapiens OX=9606 GN=ZNF706 PE=1 SV=1
sp|Q15942|ZYX_HUMAN Zyxin OS=Homo sapiens OX=9606 GN=ZYX PE=1 SV=1
The output file I got looks like this:
`$ head -5 out.fa
sp|P31946|1433B_HUMAN:14-14 14-3-3 protein beta/alpha OS=Homo sapiens OX=9606 GN=YWHAB PE=1 SV=3
L
sp|Q04917|1433F_HUMAN:14-14 14-3-3 protein eta OS=Homo sapiens OX=9606 GN=YWHAH PE=1 SV=4
A`
`$ tail -4 out.fa
sp|Q9Y5V0|ZN706_HUMAN Zinc finger protein 706 OS=Homo sapiens OX=9606 GN=ZNF706 PE=1 SV=1
MARGQQKIQSQQKNAKKQAGQKKKQGHDQKAAAKAALIYTCTVCRTQMPDPKTFKQHFESKHPKTPLPPELADVQA
sp|Q15942|ZYX_HUMAN Zyxin OS=Homo sapiens OX=9606 GN=ZYX PE=1 SV=1
MAAPRPSPAISVSVSAPAFYAPQKKFGPVVAPKPKVNPFRPGDSEPPPAPGAQRAQMGRVGEIPPPPPEDFPLPPPPLAGDGDDAEGALGGAFPPPPPPIEESFPPAPLEEEIFPSPPPPPEEEGGPEAPIPPPPQPREKVSSIDLEIDSLSSLLDDMTKNDPFKARVSSGYVPPPVATPFSSKSSTKPAAGGTAPLPPWKSPSSSQPLPQVPAPAQSQTQFHVQPQPQPKPQVQLHVQSQTQPVSLANTQPRGPPASSPAPAPKFSPVTPKFTPVASKFSPGAPGGSGSQPNQKLGHPEALSAGTGSPQPPSFTYAQQREKPRVQEKQHPVPPPAQNQNQVRSPGAPGPLTLKEVEELEQLTQQLMQDMEHPQRQNVAVNELCGRCHQPLARAQPAVRALGQLFHIACFTCHQCAQQLQGQQFYSLEGAPYCEGCYTDTLEKCNTCGEPITDRMLRATGKAYHPHCFTCVVCARPLEGTSFIVDQANRPHCVPDYHKQYAPRCSVCSEPIMPEPGRDETVRVVALDKNFHMKCYKCEDCGKPLSIEADDNGCFPLDGHVLCRKCHTARAQT`
So the last entries look normal, but the first ones are obviously truncated to a single residue, and I'm not sure why.
I tried locating the protein in the raw file and it looks fine there:
`$ grep -A 5 'GN=YWHAB' uniprot_sprot.fasta
sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens OX=9606 GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY
LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN`
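
A possible explanation, offered as an assumption rather than documented behavior: the truncated records carry a `:14-14` suffix, and their second whitespace-separated token is `14-3-3`. If seqtk reads a second column that starts with a digit as a coordinate, it would extract only position 14. That fits the output above, since `L` is the 14th residue of the YWHAB sequence. The sketch below trims each line of the name list down to the sequence ID so that no description text can be read as a coordinate. The file names `name_demo.txt` and `ids_demo.txt` are hypothetical, and the sample lines are copied from the headers shown above.

```shell
# Hypothetical sample name list, using full header lines as in the report.
cat > name_demo.txt <<'EOF'
sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens OX=9606 GN=YWHAB PE=1 SV=3
sp|Q15942|ZYX_HUMAN Zyxin OS=Homo sapiens OX=9606 GN=ZYX PE=1 SV=1
EOF

# Keep only the first whitespace-delimited field (the sequence ID),
# dropping the description that may be misread as a position.
awk '{print $1}' name_demo.txt > ids_demo.txt

# Show the cleaned list: one ID per line, no description.
cat ids_demo.txt
```

With a list cleaned this way, the original command can be rerun against it, for example `seqtk subseq uniprot_sprot.fasta ids_demo.txt > out.fa`.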