gpertea / gffread

GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more

Error: no genomic sequence available (check -g option!).

raita27 opened this issue

"gffread-0.12.7.Linux_x86_64/gffread" -w "transcripts.fa" -g "genome.fa" "stringtie_merged.gtf"

Hello,

I'm trying to output an isoform nucleotide fasta file for IsoformSwitchAnalyzeR using the above gffread command. However, I run into the following error:

Warning: couldn't find fasta record for 'chr1'!
Error: no genomic sequence available (check -g option!).

I've been able to successfully build a hisat2 gencode version m29 (mouse) index and create the merged gtf file using stringtie. It's weird because I do get a file outputted that's about 70,000 kb, but I'm not sure why the error is occurring. Any advice would be much appreciated!

genome fasta file source: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M29/gencode.vM29.transcripts.fa.gz
genome gtf file source: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M29/gencode.vM29.annotation.gtf.gz

hisat2 version: 2.2.0
stringtie version: 2.2.1

Though details are slightly different, I'm getting the same error!

My code:

ml gffread/0.11.6-GCCcore-8.3.0
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/010/994/315/GCA_010994315.2_ASM1099431v2/GCA_010994315.2_ASM1099431v2_genomic.fna.gz
gunzip GCA_010994315.2_ASM1099431v2_genomic.fna.gz
gffread -w pisaster_transcriptome.fa -g GCA_010994315.1_ASM1099431v1_genomic.fna genome_annotation.gff3

Note: genome_annotation.gff3 is from Dryad published Pisaster ochraceus annotation file here: https://doi.org/10.6071/M3ND50

Error message reads:

FASTA index file GCA_010994315.1_ASM1099431v1_genomic.fna.fai created.
Warning: couldn't find fasta record for 'Sc28pcJ_680'!
Error: no genomic sequence available (check -g option!).

Pulling my hair out over this!

I think I just found my problem and the way to solve it by reading through the closed GitHub issues until I found someone who had the same issue: #34

The first comment there applies to me: my headers/sequence names DO have spaces, and that's a problem.

Hope this helps you too!

Sequence names (IDs) cannot contain spaces. You did not show the content of the genome_annotation.gff3 file you used there, but its first column should not contain spaces either (the header does not matter). Also, after indexing that genome file (with samtools faidx, the same indexing scheme), I saw that there was no contig/chromosome in there called 'Sc28pcJ_680'.

As for @raita27, it looks like a different issue: they seem to have tried to use a transcripts FASTA file as the genome sequence.
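For anyone hitting the same error, both cases above come down to the first column of the annotation naming sequences that the file given to -g does not contain. A minimal sketch of a check for that, assuming a POSIX shell with awk: the helper name missing_seqids is hypothetical and not part of gffread, and it takes the FASTA record name to be the header text up to the first whitespace, as a samtools faidx index does.

```shell
# Hypothetical helper (not part of gffread): print every sequence ID used in
# column 1 of a GFF/GTF file that has no record of that name in the FASTA file.
missing_seqids() {
    gff="$1"
    fasta="$2"
    awk -F'\t' '
        # First file (FASTA): remember each record name, i.e. the header
        # text after ">" up to the first space or tab.
        NR == FNR {
            if (/^>/) { split(substr($0, 2), a, /[ \t]/); seen[a[1]] = 1 }
            next
        }
        # Second file (annotation): skip comment and empty lines.
        /^#/ || NF == 0 { next }
        # Report each unmatched column-1 ID once.
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$gff"
}

# Example call with the file names from the first post:
#   missing_seqids stringtie_merged.gtf genome.fa
```

If this prints nothing, every sequence ID in the annotation has a FASTA record of the same name. If it prints IDs such as chr1 or Sc28pcJ_680, the -g file is either the wrong file (for example a transcripts FASTA) or names its sequences differently from the annotation.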

Thanks so much for helping me diagnose my issue @gpertea !

Did you solve it?