Error parsing strand (?) from GFF line

hermidalc opened this issue 2 years ago

A "?" in the strand column makes gffread stop entirely, so I can't use it to process the file at all. Here's an example:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/006/247/105/GCA_006247105.1_UU_GM_1.1/GCA_006247105.1_UU_GM_1.1_genomic.gff.gz

$ gffread -E GCA_006247105.1_UU_GM_1.1_genomic.gff
Command line was:
gffread -E GCA_006247105.1_UU_GM_1.1_genomic.gff
Error parsing strand (?) from GFF line:
CM016926.1	Genbank	mRNA	1926137	1999447	.	?	.	ID=rna-gnl|WGS:VDLU|GMRT_22684;Parent=gene-GMRT_22684;exception=trans-splicing;gbkey=mRNA;locus_tag=GMRT_22684;orig_protein_id=gnl|WGS:VDLU|GMRT_22684;orig_transcript_id=gnl|WGS:VDLU|GMRT_22684;product=putative RNA-dependent helicase p68

I'm parsing many GFF/GTF files at once, so having to pre-filter potentially offending lines somewhat defeats the purpose. Shouldn't gffread be able to ignore these lines without halting?

Leandro Hermida · Answer 1 · Tue Nov 15 2022 23:19:17 GMT+0800 (China Standard Time)

The same issue occurs with the "?" strand in this NCBI genome: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/435/GCA_000002435.2_UU_WB_2.1/GCA_000002435.2_UU_WB_2.1_genomic.gff.gz
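To see which lines of a file would trigger this error before running gffread, a minimal sketch like the following scans column 7 directly. It assumes gffread accepts only "+", "-" and "." as strand values (inferred from the error above, not checked against gffread's source). The helper names open_gff and find_unsupported_strands are hypothetical. It reads .gff.gz files directly, since NCBI distributes them compressed.

```python
import gzip
from typing import Iterator, Tuple

# Assumption: these are the strand values gffread accepts; "?" is rejected per the error above
ACCEPTED_STRANDS = {"+", "-", "."}


def open_gff(path: str):
    """Open a plain or gzip-compressed GFF file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_unsupported_strands(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, strand) for feature lines whose strand is not accepted."""
    with open_gff(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith("##FASTA"):
                break  # everything after ##FASTA is sequence, not features
            if line.startswith("#") or not line.strip():
                continue  # skip comments, directives and blank lines
            columns = line.rstrip("\r\n").split("\t")
            # Column 7 (index 6) is the strand
            if len(columns) >= 7 and columns[6] not in ACCEPTED_STRANDS:
                yield line_number, columns[6]
```

For example, iterating over find_unsupported_strands("GCA_000002435.2_UU_WB_2.1_genomic.gff.gz") and printing each pair lists the offending line numbers and their strand values.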

Deleted user · Answer 2 · Tue Oct 31 2023 11:11:30 GMT+0800 (China Standard Time)

Hi,
It seems that gffread doesn't recognize the "?" symbol in a GFF file.
Column 7 of a GFF file holds the strand of the feature; in GFF3, "?" means the strand is relevant but unknown, while "." means the feature is not stranded.
As a workaround, you can replace "?" with "." using the following Python script (note that this discards the "unknown strand" distinction):

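# Set these to the paths of your own files
input_gff = "input.gff"
output_gff = "output.gff"

with open(input_gff, "r") as input_file, open(output_gff, "w") as output_file:
    for line in input_file:
        # Drop only the line ending so empty trailing fields are preserved
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # Comments and directives are copied unchanged
            output_file.write(line + "\n")
        else:
            columns = line.split("\t")
            # Column 7 (index 6) is the strand, so at least 7 fields are needed
            if len(columns) >= 7 and columns[6] == "?":
                columns[6] = "."
            output_file.write("\t".join(columns) + "\n")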

Set input_gff and output_gff to the paths of your files and run the script; this may solve your problem.
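Since the original poster processes many files, and NCBI distributes them as .gff.gz, a variant of the script above that reads gzip input directly and reports how many lines it changed may be handier. This is a minimal sketch: open_text and normalize_strand are hypothetical names, and writing each fixed copy next to its input as *.fixed.gff is an assumed layout.

```python
import glob
import gzip


def open_text(path: str, mode: str):
    """Open a plain or gzip-compressed file in text mode ("r" or "w")."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def normalize_strand(input_gff: str, output_gff: str) -> int:
    """Copy a GFF file, replacing "?" strands with "."; return the number of lines changed."""
    changed = 0
    in_fasta = False
    with open_text(input_gff, "r") as src, open_text(output_gff, "w") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if line.startswith("##FASTA"):
                in_fasta = True  # sequence lines after this are copied untouched
            if not in_fasta and not line.startswith("#"):
                columns = line.split("\t")
                # Column 7 (index 6) is the strand
                if len(columns) >= 7 and columns[6] == "?":
                    columns[6] = "."
                    line = "\t".join(columns)
                    changed += 1
            dst.write(line + "\n")
    return changed


if __name__ == "__main__":
    # Assumed layout: each *.gff.gz in the current directory gets a plain *.fixed.gff copy
    for path in glob.glob("*.gff.gz"):
        out_path = path[: -len(".gff.gz")] + ".fixed.gff"
        print(path, normalize_strand(path, out_path))
```

The output is written uncompressed, so it can be passed straight to gffread.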
Best,
Xylon