williamgilpin / pypdb

A Python API for the RCSB Protein Data Bank (PDB)

Calling search on a Query that returns no results raises an AttributeError

tux2603 opened this issue

Bug Description

Calling the search method on a Query object that matches no entries raises an error similar to the following:

/home/o-linux/anaconda3/envs/proteins/lib/python3.7/site-packages/pypdb/util/http_requests.py:61: UserWarning: Too many failures on requests. Exiting...
  warnings.warn("Too many failures on requests. Exiting...")
Traceback (most recent call last):
  File "pypdb/mwe.py", line 4, in <module>
    found_pdbs = Query('McNoman, Norman', query_type='AdvancedAuthorQuery').search()
  File "/home/o-linux/anaconda3/envs/proteins/lib/python3.7/site-packages/pypdb/pypdb.py", line 253, in search
    if response.status_code == 200:
AttributeError: 'NoneType' object has no attribute 'status_code'
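The traceback points at the failure path. The HTTP helper in pypdb/util/http_requests.py gives up after repeated failed requests, and the AttributeError shows that search then receives None and reads response.status_code from it. Below is a minimal sketch of a None guard. FakeResponse and search are hypothetical stand-ins and are not pypdb's actual code. The result_set/identifier payload keys are an assumption, modeled loosely on the RCSB Search API's JSON.

```python
from typing import Callable, List, Optional


class FakeResponse:
    """Hypothetical stand-in for requests.Response with only the fields used here."""

    def __init__(self, status_code: int, payload: Optional[dict] = None) -> None:
        self.status_code = status_code
        self._payload = payload or {}

    def json(self) -> dict:
        return self._payload


def search(send_request: Callable[[], Optional[FakeResponse]]) -> List[str]:
    """Hypothetical search step that checks for None before reading status_code."""
    response = send_request()
    # The retry helper may give up and return None; treat that as "no results"
    # instead of letting response.status_code raise AttributeError.
    if response is None:
        return []
    if response.status_code == 200:
        # Assumed payload shape: {"result_set": [{"identifier": ...}, ...]}
        return [hit["identifier"] for hit in response.json().get("result_set", [])]
    # Any other status code is also reported as an empty result in this sketch.
    return []


print(search(lambda: None))  # -> []
print(search(lambda: FakeResponse(200, {"result_set": [{"identifier": "4HHB"}]})))  # -> ['4HHB']
```

Mapping None to an empty list keeps callers simple. The trade-off is that it also hides genuine network failures, which take the same path. A stricter variant could raise a dedicated exception and reserve [] for a confirmed empty result.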

System info

  • Operating system: Linux Debian, kernel 5.9.0-5-amd64
  • Anaconda version: 4.9.2
  • Python version: 3.7.9
  • pypdb version: 2.0, from conda-forge

Reproduction

The issue can be reproduced with this Python script:

from pypdb import Query

# This author has no entries in the PDB, so search() raises the AttributeError above
found_pdbs = Query('McNoman, Norman', query_type='AdvancedAuthorQuery').search()
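Until a fix is released, a caller can contain the failure. The sketch below defines search_or_empty, a hypothetical helper that is not part of pypdb. It runs any search callable and maps only this specific AttributeError to an empty list. The error is recognized by its message mentioning status_code, and that message match is a fragile assumption tied to pypdb 2.0's behavior.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def search_or_empty(run_search: Callable[[], List[T]]) -> List[T]:
    """Hypothetical wrapper: return [] when pypdb 2.0 fails on a query with no hits."""
    try:
        return run_search()
    except AttributeError as exc:
        # Only swallow the "'NoneType' ... 'status_code'" error from the traceback;
        # any other AttributeError is a different bug and is re-raised.
        if "status_code" in str(exc):
            return []
        raise
```

With the query from the report, usage would be search_or_empty(lambda: Query('McNoman, Norman', query_type='AdvancedAuthorQuery').search()). Because this error is raised after the retries are exhausted, the wrapper also returns [] when the service is genuinely unreachable.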

Hi, thanks for this report. It looks like this only works for certain names. For example, this works:

found_pdbs = Query('Perutz, M.F.', query_type='AdvancedAuthorQuery').search()
print(found_pdbs)

but this fails:

found_pdbs = Query('Perutz', query_type='AdvancedAuthorQuery').search()
print(found_pdbs)

So the author lookup is very sensitive to the format of the name, which isn't ideal. For now, I would suggest a keyword search instead (which, I agree, is also not ideal). As far as I can tell, there are no hits for that specific query, but let me know if you find a query that returns results on the website but not from the keyword search.

I should mention that @lacoperon has developed a much more full-featured API that will eventually replace the old one. I think the way to do this search with the new API (using the latest GitHub version) looks like this:

from pypdb.clients.search.search_client import perform_search
from pypdb.clients.search.search_client import SearchService, ReturnType
from pypdb.clients.search.operators import text_operators

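# Text service: match against a structured attribute (here, the author name)
search_service = SearchService.TEXT
search_operator = text_operators.ContainsPhraseOperator(value="McNoman",
                                                        attribute="audit_author.name")
# Return assembly identifiers for the matching entries
return_type = ReturnType.ASSEMBLY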

results = perform_search(search_service, search_operator, return_type)

print(results[:5])

This still fails for the McNoman query, but it does work for Perutz. Eventually we will convert all the old functions to use the new API under the hood.

Yeah, it seems that almost all of the basic Query type searches cause the issue when there are no hits in the database, for example Query('KFC'). I was poking around in the code a bit and I think I found the cause, so I can put together a PR if you want.

The PR is in.