widdowquinn / ncfp

Program and package that retrieves nucleotide coding sequences from NCBI that correspond to a set of input protein sequences.

Home Page: https://widdowquinn.github.io/ncfp/

UniProt warning

widdowquinn opened this issue

Summary:

When using ncfp, a warning is thrown by BioServices and downloads fail.

Description:

With the command below:

ncfp -v -l 2022-07-20_th.log -c local_cache --keepcache -s helixalifil1.fasta ncfp_out noone@dev.null

using the attached input file, the following error occurs:

[...]
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: 1020 sequence records read successfully from helixalifil1.fasta
[INFO] [ncbi_cds_from_protein.sequences]: Processing sequences...
1/5 Process input sequences:   0%|                                                                                                                                     | 0/1020 [00:00<?, ?it/s]WARNING [bioservices.UniProt:596]:  status is not ok with Bad Request
1/5 Process input sequences:   0%|                                                                                                                                     | 0/1020 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/anaconda3/envs/ncfp_py310/bin/ncfp", line 33, in <module>
    sys.exit(load_entry_point('ncfp', 'console_scripts', 'ncfp')())
  File "/Users/lpritc/Documents/Development/GitHub/ncfp/ncbi_cds_from_protein/scripts/ncfp.py", line 267, in run_main
    qrecords, qskipped = process_sequences(seqrecords, cachepath, args.disabletqdm)
  File "/Users/lpritc/Documents/Development/GitHub/ncfp/ncbi_cds_from_protein/sequences.py", line 128, in process_sequences
    qstring = result.split("\n")[1].strip()[:-1]
AttributeError: 'int' object has no attribute 'split'

With the --debug option set, the additional useful output is:

[INFO] [ncbi_cds_from_protein.sequences]: Processing sequences...
[DEBUG] [ncbi_cds_from_protein.sequences]: Guessing sequence type for tr|A0A258M961|A0A258M961_9BURK/52-81...
[DEBUG] [ncbi_cds_from_protein.sequences]: ...guessed UniProt
[DEBUG] [ncbi_cds_from_protein.sequences]: Uniprot record has GN field: B7Y67_11790

helixalifil1.fasta.txt

ncfp Version:

Commit 5e7c612

Python Version:

3.10

Operating System:

macOS

The bioservices warning is:

WARNING [bioservices.UniProt:596]:  status is not ok with Bad Request

which may be related to the recent (at time of writing) changes at UniProt that affect how bioservices connects to that resource.
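The AttributeError in the traceback suggests that, when the request fails, the value passed on as `result` is an integer (presumably an HTTP status code) rather than the response text. A minimal sketch of a defensive check around the failing line is shown here; the helper name `extract_query_string` is hypothetical and is not part of ncfp, and the assumption that the second line of the response holds the wanted value is taken only from the line shown in the traceback.

```python
def extract_query_string(result):
    """Return the query string from a UniProt text response.

    Hypothetical helper, not ncfp code. It mirrors the traceback line
    `result.split("\n")[1].strip()[:-1]` but fails with a clear message
    when the request did not return text.
    """
    # Assumption: a failed request yields a non-string value (e.g. an int
    # status code), as the AttributeError in the traceback implies.
    if not isinstance(result, str):
        raise ValueError(
            f"UniProt query failed: expected response text, got {result!r}"
        )
    lines = result.split("\n")
    # The original code indexes the second line, so require it to exist.
    if len(lines) < 2:
        raise ValueError("UniProt response contains no data row")
    # Same parsing as the original line: strip whitespace, drop last character.
    return lines[1].strip()[:-1]
```

With a check like this, a failed query would stop with a message naming the unexpected value instead of an AttributeError raised deep inside the parsing step.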

These warnings arose because ncfp was still using legacy query terms, and mocking legacy return values, with UniProt and bioservices, specifically when cross-referencing the EMBL database. See links below for the new API.
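As an illustration of the kind of query involved, the sketch below builds a request URL for EMBL cross-references of a single accession. It is not the code from the ncfp fix and does not go through bioservices. The endpoint (`rest.uniprot.org/uniprotkb/search`), the `accession:` query field and the `xref_embl` return field are assumptions based on the current UniProt REST API and should be checked against the UniProt documentation. The function name `build_embl_xref_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the current UniProtKB search endpoint.
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_embl_xref_url(accession):
    """Build a search URL asking for EMBL cross-references (hypothetical helper)."""
    params = {
        "query": f"accession:{accession}",   # assumed query field name
        "fields": "accession,xref_embl",     # assumed return field names
        "format": "tsv",
    }
    # urlencode percent-encodes the colon and comma in the parameter values.
    return f"{UNIPROT_SEARCH_URL}?{urlencode(params)}"


# Accession taken from the debug output above (tr|A0A258M961|...).
url = build_embl_xref_url("A0A258M961")
```

The function only constructs the URL; sending the request and handling a non-OK status are left to the caller.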

This is now fixed with 94e85c6.