gpertea / gffread

GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more

Segmentation fault or corrupted output GFF when using `-C` coding only option on multiple NCBI genome assembly GFF files

hermidalc opened this issue

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/843/825/GCA_000843825.1_ViralProj14424/GCA_000843825.1_ViralProj14424_genomic.gff.gz

Running gffread with `-C` on the file above either segfaults or produces a corrupted output GFF file. I've run into the same problem with other NCBI genome assembly GFF files.

This might be related to the fact that this genome has `five_prime_UTR` and `three_prime_UTR` features. Even so, gffread should be able to handle them and produce `exon`, `CDS`, and `mRNA` output features, where the `exon` and `mRNA` ranges include the UTR regions and the `CDS` ranges do not.
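To make the expected output concrete, here is a minimal Python sketch of the coordinate logic I have in mind. It is not gffread's implementation; the function names (`merge_intervals`, `expected_features`) and the example coordinates are hypothetical, and it assumes 1-based inclusive GFF coordinates on a single transcript.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive, as in GFF3


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        # Adjacent means the next start is at most one past the previous end.
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def expected_features(
    five_utr: List[Interval],
    cds: List[Interval],
    three_utr: List[Interval],
) -> Dict[str, List[Interval]]:
    """Hypothetical helper: derive exon/mRNA ranges from UTR + CDS segments."""
    exons = merge_intervals(five_utr + cds + three_utr)
    if not exons:
        raise ValueError("transcript has no UTR or CDS segments")
    # The mRNA spans from the first exon start to the last exon end.
    mrna = (exons[0][0], exons[-1][1])
    # CDS ranges are passed through unchanged, so they exclude the UTRs.
    return {"mRNA": [mrna], "exon": exons, "CDS": sorted(cds)}


# Made-up coordinates for illustration only (not taken from the NCBI file).
result = expected_features(
    five_utr=[(100, 149)],
    cds=[(150, 300), (400, 500)],
    three_utr=[(501, 560)],
)
print(result)
```

With these made-up inputs the exons come out as 100-300 and 400-560, the mRNA as 100-560, and the CDS segments stay at 150-300 and 400-500.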