Error: no genomic sequence available (check -g option!).
raita27 opened this issue
"gffread-0.12.7.Linux_x86_64/gffread" -w "transcripts.fa" -g "genome.fa" "stringtie_merged.gtf"
Hello,
I'm trying to output an isoform nucleotide fasta file for IsoformSwitchAnalyzeR using the above gffread command. However, I run into the following error:
Warning: couldn't find fasta record for 'chr1'!
Error: no genomic sequence available (check -g option!).
I've been able to successfully build a hisat2 index for GENCODE release M29 (mouse) and create the merged GTF file using stringtie. It's weird because I do get an output file that's about 70,000 kb, but I'm not sure why the error is occurring. Any advice would be much appreciated!
genome fasta file source: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M29/gencode.vM29.transcripts.fa.gz
genome gtf file source: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M29/gencode.vM29.annotation.gtf.gz
hisat2 version: 2.2.0
stringtie version: 2.2.1
Though details are slightly different, I'm getting the same error!
My code:
ml gffread/0.11.6-GCCcore-8.3.0
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/994/315/GCA_010994315.2_ASM1099431v2/GCA_010994315.2_ASM1099431v2_genomic.fna.gz
gunzip GCA_010994315.2_ASM1099431v2_genomic.fna.gz
gffread -w pisaster_transcriptome.fa -g GCA_010994315.1_ASM1099431v1_genomic.fna genome_annotation.gff3
Note: genome_annotation.gff3 is from Dryad published Pisaster ochraceus annotation file here: https://doi.org/10.6071/M3ND50
Error message reads:
FASTA index file GCA_010994315.1_ASM1099431v1_genomic.fna.fai created.
Warning: couldn't find fasta record for 'Sc28pcJ_680'!
Error: no genomic sequence available (check -g option!).
Pulling my hair out over this!
I think I just discovered my problem and the route to solve it by reading through the closed GitHub issues until I found someone who had the same issue: #34
The first comment applies to me: my headers/sequence names DO have spaces, and that's a problem.
Hope this helps you too!
Sequence names (IDs) cannot have spaces. You did not show the content of the genome_annotation.gff3 file you used there; the first column in there should not have spaces either (the header does not matter). And clearly, after indexing that genome file (with samtools faidx, the same indexing scheme), I saw there was no such contig/chromosome in there called 'Sc28pcJ_680'.
As for @raita27, it looks like a different issue - they seemed to have tried to use a transcripts fasta file as a genome sequence.
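Both diagnoses above come down to the same check: every name in the first column of the annotation has to match a sequence ID in the file passed to -g. Below is a minimal sketch of that check using only awk. It assumes the sequence ID is the header text between '>' and the first whitespace (the samtools faidx convention mentioned above). The function name missing_seqids is hypothetical and not part of gffread.

```shell
# Hypothetical helper (not part of gffread): print every sequence name used in
# the annotation's first column that has no matching record in the FASTA.
# Assumption: a FASTA record's ID is the text after '>' up to the first whitespace.
# Usage: missing_seqids ANNOTATION.gff3 GENOME.fa
missing_seqids() {
  awk '
    # First file (the FASTA): remember the ID of each header line.
    NR == FNR {
      if (/^>/) { seen[substr($1, 2)] = 1 }
      next
    }
    # Second file (the annotation): skip comment/directive lines and blank lines.
    /^#/ || NF == 0 { next }
    {
      # Split on tabs only, so a space inside column 1 stays visible.
      split($0, f, "\t")
      if (!(f[1] in seen) && !(f[1] in reported)) {
        reported[f[1]] = 1
        print f[1]
      }
    }
  ' "$2" "$1"
}
```

Called as missing_seqids genome_annotation.gff3 genome.fna, it prints each annotation sequence name that has no matching FASTA record, once per name; empty output means every name was found. If a transcripts FASTA is passed by mistake, chromosome names such as chr1 would be expected to show up in the list.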
Thanks so much for helping me diagnose my issue, @gpertea!
"gffread-0.12.7.Linux_x86_64/gffread" -w "transcripts.fa" -g "genome.fa" "stringtie_merged.gtf"
Hello,
I'm trying to output an isoform nucleotide fasta file for IsoformSwitchAnalyzeR using the above gffread command. However, I run into the following error:
Warning: couldn't find fasta record for 'chr1'! Error: no genomic sequence available (check -g option!).
I've been able to successfully build a hisat2 gencode version m29 (mouse) index and create the merged gtf file using stringtie. It's weird because I do get a file outputted that's about 70,000 kb, but I'm not sure why the error is occurring. Any advice would be much appreciated!
genome fasta file source: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M29/gencode.vM29.transcripts.fa.gz genome gtf file source: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M29/gencode.vM29.annotation.gtf.gz
hisat2 version: 2.2.0 stringtie version: 2.2.1
Did you solve it?