UniProt warning
widdowquinn opened this issue
Summary:
When using ncfp, a warning is raised by BioServices, downloads fail, and ncfp exits with an AttributeError.
Description:
Running the command below with the attached input file:
ncfp -v -l 2022-07-20_th.log -c local_cache --keepcache -s helixalifil1.fasta ncfp_out noone@dev.null
produces the following error:
[...]
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: 1020 sequence records read successfully from helixalifil1.fasta
[INFO] [ncbi_cds_from_protein.sequences]: Processing sequences...
1/5 Process input sequences: 0%| | 0/1020 [00:00<?, ?it/s]WARNING [bioservices.UniProt:596]: status is not ok with Bad Request
1/5 Process input sequences: 0%| | 0/1020 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/anaconda3/envs/ncfp_py310/bin/ncfp", line 33, in <module>
    sys.exit(load_entry_point('ncfp', 'console_scripts', 'ncfp')())
  File "/Users/lpritc/Documents/Development/GitHub/ncfp/ncbi_cds_from_protein/scripts/ncfp.py", line 267, in run_main
    qrecords, qskipped = process_sequences(seqrecords, cachepath, args.disabletqdm)
  File "/Users/lpritc/Documents/Development/GitHub/ncfp/ncbi_cds_from_protein/sequences.py", line 128, in process_sequences
    qstring = result.split("\n")[1].strip()[:-1]
AttributeError: 'int' object has no attribute 'split'
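The traceback shows that result was an int rather than the expected text response; bioservices appears to return the HTTP status code (here from the 400 Bad Request) instead of a string when a request fails. A minimal defensive sketch of the parsing step is given below. The function parse_first_value and the UniProtQueryError exception are hypothetical, not part of ncfp, and the sketch assumes the response is a header line followed by a data line whose last character is a separator (for example ';') that should be dropped.

```python
from typing import Union


class UniProtQueryError(RuntimeError):
    """Hypothetical error raised when a UniProt query does not return usable text."""


def parse_first_value(result: Union[str, int]) -> str:
    """Return the value on the first data line of a header-plus-data text response.

    Follows the same slicing as the failing line in sequences.py (second line,
    stripped, last character dropped) but checks the type first, so a status
    code returned in place of text gives a clear error, not an AttributeError.
    """
    if not isinstance(result, str):
        # A failed request may hand back an HTTP status code such as 400
        raise UniProtQueryError(f"UniProt query failed (got {result!r} instead of text)")
    lines = result.split("\n")
    if len(lines) < 2 or not lines[1].strip():
        raise UniProtQueryError("UniProt response contained no data line")
    # Assumption: the data line ends with a single separator character
    return lines[1].strip()[:-1]
```

Raising a named exception here lets the caller log the failing sequence and either skip it or stop, rather than crashing deep inside the string handling.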
With the --debug option set, the additional useful output is:
[INFO] [ncbi_cds_from_protein.sequences]: Processing sequences...
[DEBUG] [ncbi_cds_from_protein.sequences]: Guessing sequence type for tr|A0A258M961|A0A258M961_9BURK/52-81...
[DEBUG] [ncbi_cds_from_protein.sequences]: ...guessed UniProt
[DEBUG] [ncbi_cds_from_protein.sequences]: Uniprot record has GN field: B7Y67_11790
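The debug output suggests that ncfp recognises the record as UniProt from its identifier and then reads the gene name (GN) field from the FASTA header. A minimal sketch of that kind of header parsing follows, assuming the usual UniProtKB layout (db|Accession|EntryName, with GN=GeneName in the description) plus an optional /start-end range suffix like the one on this input's identifiers. The function parse_uniprot_header and the UniProtHeader type are hypothetical and are not ncfp's actual implementation.

```python
import re
from typing import NamedTuple, Optional

# db|accession|entry_name, optionally followed by /start-end (e.g. from an alignment)
UNIPROT_ID = re.compile(
    r"^(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry>[^/\s]+)"
    r"(?:/(?P<start>\d+)-(?P<end>\d+))?$"
)
# Gene name field in the header description, e.g. "GN=B7Y67_11790"
GN_FIELD = re.compile(r"\bGN=(\S+)")


class UniProtHeader(NamedTuple):
    db: str
    accession: str
    entry: str
    gene_name: Optional[str]


def parse_uniprot_header(seq_id: str, description: str = "") -> Optional[UniProtHeader]:
    """Return parsed fields if seq_id looks like a UniProt identifier, else None."""
    match = UNIPROT_ID.match(seq_id)
    if match is None:
        return None  # not a UniProt-style identifier
    gn = GN_FIELD.search(description)
    return UniProtHeader(
        db=match["db"],
        accession=match["accession"],
        entry=match["entry"],
        gene_name=gn.group(1) if gn else None,
    )
```

Identifiers that do not start with sp| or tr| fall through as None, so a caller could try other sequence-type guesses next.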
ncfp Version: Commit 5e7c612
Python Version: 3.10
Operating System: macOS
The bioservices warning is:
WARNING [bioservices.UniProt:596]: status is not ok with Bad Request
This may be related to the recent (at time of writing) changes at UniProt, which affect how bioservices connects to that resource.
These warnings arose because ncfp was still using legacy query terms, and mocking legacy return values, when querying UniProt through bioservices, specifically when cross-referencing the EMBL database. See links below for the new API.
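For comparison with the legacy query terms, below is a sketch of building an EMBL cross-reference query against the current UniProt REST API using only the standard library (not bioservices). It assumes the https://rest.uniprot.org/uniprotkb/search endpoint with query, fields and format parameters, the accession: query prefix, and the xref_embl return field; these should be checked against the UniProt API documentation. The helpers embl_xref_url and parse_embl_xrefs are hypothetical.

```python
from typing import List
from urllib.parse import urlencode

# Assumed search endpoint of the current UniProt REST API
UNIPROTKB_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def embl_xref_url(accession: str) -> str:
    """Build a search URL asking for the EMBL cross-references of one accession."""
    params = {
        "query": f"accession:{accession}",
        "fields": "accession,xref_embl",  # assumed return-field names
        "format": "tsv",
    }
    return f"{UNIPROTKB_SEARCH}?{urlencode(params)}"


def parse_embl_xrefs(tsv: str) -> List[str]:
    """Extract EMBL IDs from a TSV response (header line, then accession<TAB>xrefs).

    The xref column is assumed to hold semicolon-separated IDs, possibly
    with a trailing semicolon; empty entries are discarded.
    """
    ids: List[str] = []
    for row in tsv.splitlines()[1:]:  # skip the header line
        columns = row.split("\t")
        if len(columns) < 2:
            continue  # blank or malformed row
        ids.extend(x.strip() for x in columns[1].split(";") if x.strip())
    return ids
```

In use, the URL would be fetched (for example with urllib.request.urlopen) and the decoded body passed to parse_embl_xrefs. Splitting on ';' and discarding empty entries avoids relying on a fixed trailing character, unlike a [:-1] slice.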
This is now fixed with 94e85c6.