widdowquinn / ncfp

Program and package that retrieves nucleotide coding sequences from NCBI that correspond to a set of input protein sequences.

Home Page: https://widdowquinn.github.io/ncfp/

Stockholm domain format doesn't work with non-UniProt FASTA sequences

widdowquinn opened this issue · comments

Summary:

CDS feature extraction uses the GN=.* regex on the sequence header. When Stockholm domains are added to NCBI FASTA files, the headers have no GN= field, so the regex matches nothing. The corresponding features are then not found, which produces false negatives.

When the GN= field is missing, we should fall back to checking the sequence ID instead of relying on GN= alone.
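
A minimal sketch of the proposed fallback, not the actual ncfp code. The function name feature_query, the GN_PATTERN constant, and the example headers are hypothetical and only illustrate the idea. The pattern here captures a single whitespace-delimited token after GN=, which is an assumption and is narrower than the GN=.* regex mentioned above.

```python
import re

# Hypothetical pattern: captures the token following a UniProt-style GN= field.
# Assumption: the gene name contains no whitespace.
GN_PATTERN = re.compile(r"GN=(\S+)")


def feature_query(seq_id: str, description: str) -> str:
    """Return the identifier to match against CDS features.

    Prefer the gene name from a GN= field; if the header has no GN= field
    (as with NCBI FASTA headers), fall back to the sequence ID.
    """
    match = GN_PATTERN.search(description)
    if match:
        return match.group(1)
    # No GN= field: use the sequence ID rather than reporting no match.
    return seq_id


# Illustrative headers only; the NCBI-style accession is made up.
uniprot_query = feature_query(
    "sp|P12345|AATM_RABIT",
    "Aspartate aminotransferase OS=Oryctolagus cuniculus GN=GOT2 PE=1 SV=2",
)
ncbi_query = feature_query(
    "XP_000000.1",
    "hypothetical protein [Example organism]",
)
```

With this shape, a UniProt header still resolves through GN=, while an NCBI header yields its sequence ID, so the feature lookup always has something to compare against.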