harvardinformatics / degenotate

ValueError issue

LeebanY opened this issue

Hi,

Thanks for the tool!

I was hoping you could help me with an error that's preventing me from generating any results with degenotate. When I run degenotate, I get the following error:

  File "/mnt/shared/scratch/lyusuf/apps/conda/envs/degenotate/bin/degenotate.py", line 68, in <module>
    globs = SEQ.extractCDS(globs)
  File "/mnt/shared/scratch/lyusuf/apps/conda/envs/degenotate/lib/python3.10/site-packages/degenotate_lib/seq.py", line 182, in extractCDS
    globs['annotation'][transcript]['start-frame'] = int(exon_phase[first_exon_genome_start])
ValueError: invalid literal for int() with base 10: '.'

The tool is installed correctly; I checked it with the provided test files. Shortened versions of the offending GFF file give the same error. I suspect some sort of formatting problem, but I can't figure out what's wrong. I've attached the first 100 lines of the GFF file for reference: test.gff.txt

Thanks,
Leeban

Hi Leeban,

degenotate requires the reading frame of each translated CDS feature to be annotated in your GFF via the phase field (column 8). Without it, we cannot correctly infer which bases are the 1st, 2nd, or 3rd position of each codon, and thus cannot infer degeneracy. It looks like the CDS features in your GFF file have '.' in the phase column instead of 0, 1, or 2. degenotate tries to convert that '.' to an integer, which is why it throws the cryptic "invalid literal for int() with base 10: '.'" error.
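As a rough illustration of why the phase matters, here is a minimal sketch assuming the GFF3 definition of phase: the number of bases to remove from the 5' end of a CDS feature (in the direction of transcription) to reach the first base of the next codon. The function codon_positions is hypothetical and is not degenotate's code; it only shows that every base's codon position follows from the segment's phase, and that a '.' leaves nothing to anchor the calculation.

```python
def codon_positions(segment_length: int, phase: str) -> list[int]:
    """Codon position (1, 2, or 3) of each base in one CDS segment.

    Bases are listed 5' to 3' in the direction of transcription, so for a
    minus-strand feature the first entry corresponds to its end coordinate.
    """
    if phase not in {"0", "1", "2"}:
        # A '.' (or anything else) means the reading frame is unknown.
        raise ValueError(f"CDS phase must be 0, 1, or 2, got {phase!r}")
    p = int(phase)
    # The first p bases finish a codon started in the previous segment,
    # so the base at offset p is codon position 1.
    return [(i - p) % 3 + 1 for i in range(segment_length)]


print(codon_positions(7, "1"))  # [3, 1, 2, 3, 1, 2, 3]
```

With phase 1, the first base is the 3rd position of a codon begun in the previous segment, and the pattern repeats from there.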

To use degenotate, you will need to add phase information to your CDS annotations. Unfortunately, there appears to be an open issue in the Liftoff repository reporting that CDS features generated by Liftoff lack this information, so Liftoff itself may not be able to handle this. @gwct may know of other tools that can fix up your GFF files to include it.

We will also update degenotate to throw a more informative error message in this situation.
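Until then, a pre-check along these lines could flag the problem up front. This is a hypothetical sketch, not the check that was added to degenotate: the function names are made up, and it assumes a tab-delimited GFF3 file in which coding segments use the feature type "CDS" and the phase sits in column 8.

```python
from typing import Iterable


def find_cds_without_phase(gff_lines: Iterable[str]) -> list[int]:
    """Return 1-based line numbers of CDS features whose phase is not 0, 1, or 2."""
    bad_lines = []
    for lineno, line in enumerate(gff_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and ## directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # not a complete feature line, or not a CDS feature
        if fields[7] not in {"0", "1", "2"}:  # column 8 holds the phase
            bad_lines.append(lineno)
    return bad_lines


def check_cds_phases(gff_lines: Iterable[str]) -> None:
    """Raise a readable error instead of int()'s "invalid literal" message."""
    bad_lines = find_cds_without_phase(gff_lines)
    if bad_lines:
        raise ValueError(
            f"{len(bad_lines)} CDS feature(s) have no valid phase in column 8 "
            f"(first at line {bad_lines[0]}); phase must be 0, 1, or 2 to "
            "determine codon positions."
        )
```

Passing an open file handle (for example, for the attached test.gff) to check_cds_phases should name the first offending CDS line rather than failing deep inside extractCDS.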

Tim

I'm not aware of anything that checks or fixes the phase in GFF files, unfortunately. I think this would have to be addressed when the GFF is generated.

For future reference: agat_sp_fix_cds_phases.pl seems to help in the case of Liftoff-produced GFFs!
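For illustration only, here is what such a fix has to compute: the GFF3 rule for phase, applied to the CDS segment coordinates of a single transcript. This is a minimal sketch and not how agat_sp_fix_cds_phases.pl is implemented. recompute_phases is a hypothetical name, and the sketch assumes the segments belong to one transcript, do not overlap, use 1-based inclusive coordinates, and that the coding sequence starts on a complete codon unless first_phase says otherwise.

```python
def recompute_phases(
    segments: list[tuple[int, int]], strand: str, first_phase: int = 0
) -> list[int]:
    """Return the GFF3 phase of each CDS segment (start, end), in input order."""
    if strand not in {"+", "-"}:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Walk segments 5' to 3': ascending start on '+', descending start on '-'.
    order = sorted(
        range(len(segments)), key=lambda i: segments[i][0], reverse=(strand == "-")
    )
    phases = [0] * len(segments)
    phase = first_phase
    for i in order:
        start, end = segments[i]
        phases[i] = phase
        length = end - start + 1
        # After skipping `phase` bases, (length - phase) % 3 bases remain as a
        # partial codon; the next segment must supply the rest of it.
        phase = (3 - (length - phase) % 3) % 3
    return phases


print(recompute_phases([(100, 199), (300, 349), (400, 420)], "+"))  # [0, 2, 0]
```

Walking the segments in transcription order is the key design choice: on the minus strand, the 5'-most segment is the one with the highest coordinates, so sorting by position alone would assign phases to the wrong segments.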