williamgilpin / pypdb

A Python API for the RCSB Protein Data Bank (PDB)

TypeError: 'NoneType' object is not subscriptable AND json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

kfuku52 opened this issue · comments

Hi, thank you for developing and maintaining this useful package. With the latest GitHub version of pypdb, I tried to run MMseqs2 searches for many sequences using Query as shown below, but the command returned an error in some cases. The error seems to depend on the query sequence. Here is a reproducible example with one of the problematic sequences.

Query

from pypdb import Query

aa_query = 'MGEILYFDTVLAPLSLFLPIGYHAYLWQCFKSKPSHTYIGIDALRRKGWFLDMKEDVDQKGMLAIQSVRNTLMSTIFIASIAVLVSMALAALTNNAYNASQLFRSAFFGSQIGGIVVLKYGSASLFLLVSFLCSSMAVGFLIDANFLINIGIGQFSSPAYTQTIFERGFTLALIGNRMLCMTFPLILWIFGPVSMALSSLALVWGLYELDFPGKLPSVKHG'
q = Query(aa_query, query_type='sequence', return_type='polymer_entity')
out = q.search()
/Users/kf/miniconda3/envs/pymol/lib/python3.8/site-packages/pypdb/util/http_requests.py:65: UserWarning: Too many failures on requests. Exiting...
  warnings.warn("Too many failures on requests. Exiting...")
/Users/kf/miniconda3/envs/pymol/lib/python3.8/site-packages/pypdb/pypdb.py:292: UserWarning: Retrieval failed, returning None
  warnings.warn("Retrieval failed, returning None")
Traceback (most recent call last):
  File "/Users/kf/Dropbox/repos/csubst/csubst/csubst", line 309, in <module>
    args.handler(args)
  File "/Users/kf/Dropbox/repos/csubst/csubst/csubst", line 34, in command_site
    main_site(g)
  File "/Volumes/kfssd1/Dropbox/repos/csubst/csubst/main_site.py", line 658, in main_site
    g['pdb'] = pdb_sequence_search(g)
  File "/Volumes/kfssd1/Dropbox/repos/csubst/csubst/main_site.py", line 605, in pdb_sequence_search
    best_hit = mmseqs2_out['result_set'][0]
TypeError: 'NoneType' object is not subscriptable

Process finished with exit code 1
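
The TypeError in this traceback comes from indexing the return value of `q.search()` after pypdb warned "Retrieval failed, returning None". A minimal caller-side guard could look like the sketch below. `first_hit` is a hypothetical helper name, not part of pypdb, and the `result_set` key is taken from the traceback above.

```python
from typing import Any, Optional


def first_hit(search_result: Optional[dict]) -> Optional[Any]:
    """Return the first entry of result_set, or None when there are no hits."""
    # q.search() returned None in the failing case, so check before indexing.
    if not search_result:
        return None
    hits = search_result.get("result_set") or []
    return hits[0] if hits else None
```

With this helper, `best_hit = first_hit(q.search())` yields `None` instead of raising when the search comes back empty.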

perform_search

After reading #26, I also tried perform_search, but it failed with a different error, shown below. I would appreciate any advice. Thank you.

from pypdb.clients.search.search_client import perform_search
from pypdb.clients.search.operators.sequence_operators import SequenceOperator

aa_query = 'MGEILYFDTVLAPLSLFLPIGYHAYLWQCFKSKPSHTYIGIDALRRKGWFLDMKEDVDQKGMLAIQSVRNTLMSTIFIASIAVLVSMALAALTNNAYNASQLFRSAFFGSQIGGIVVLKYGSASLFLLVSFLCSSMAVGFLIDANFLINIGIGQFSSPAYTQTIFERGFTLALIGNRMLCMTFPLILWIFGPVSMALSSLALVWGLYELDFPGKLPSVKHG'
seq_op = SequenceOperator(sequence=aa_query, identity_cutoff=0.99, evalue_cutoff=1000)
out = perform_search(search_operator=seq_op, return_with_scores=True)
Querying RCSB Search using the following parameters:
 {"query": {"type": "terminal", "service": "sequence", "parameters": {"evalue_cutoff": 1000, "identity_cutoff": 0.99, "target": "pdb_protein_sequence", "value": "MGEILYFDTVLAPLSLFLPIGYHAYLWQCFKSKPSHTYIGIDALRRKGWFLDMKEDVDQKGMLAIQSVRNTLMSTIFIASIAVLVSMALAALTNNAYNASQLFRSAFFGSQIGGIVVLKYGSASLFLLVSFLCSSMAVGFLIDANFLINIGIGQFSSPAYTQTIFERGFTLALIGNRMLCMTFPLILWIFGPVSMALSSLALVWGLYELDFPGKLPSVKHG"}}, "request_options": {"return_all_hits": true}, "return_type": "entry"} 

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kf/miniconda3/envs/pymol/lib/python3.8/site-packages/pypdb/clients/search/search_client.py", line 183, in perform_search
    return perform_search_with_graph(query_object=search_operator,
  File "/Users/kf/miniconda3/envs/pymol/lib/python3.8/site-packages/pypdb/clients/search/search_client.py", line 271, in perform_search_with_graph
    for query_hit in response.json()["result_set"]:
  File "/Users/kf/miniconda3/envs/pymol/lib/python3.8/site-packages/requests/models.py", line 910, in json
    return complexjson.loads(self.text, **kwargs)
  File "/Users/kf/miniconda3/envs/pymol/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/Users/kf/miniconda3/envs/pymol/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/kf/miniconda3/envs/pymol/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Thank you very much for reporting this. It's interesting that it depends on the query. If you run the same advanced search through the GUI on the RCSB website, does the search result have any special features?

I just tried it and got no hits in the RCSB advanced search.

Thank you! Does this mean that it is an issue with the API, in which case we would need to throw an error?

Getting no hits seems to be a valid result, and RCSB's advanced search does not treat it as an error. Since this is expected behavior in the GUI, it might be good for pypdb to handle a no-hit result without raising an error, so that it is consistent with the RCSB GUI.

I am still wondering how I should handle this error. Any suggestions are appreciated.

Hi, I think that it should probably return an empty dict and emit a warning. I have been pretty swamped and haven't found time to troubleshoot this further. If you have a workaround that you like, would you mind posting the code here (or opening a PR)? I can incorporate it into the next version.
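
A rough sketch of the "empty dict plus warning" idea, for illustration only. It assumes that a no-hit search comes back with an empty response body, which would match the "line 1 column 1 (char 0)" message in the traceback but has not been confirmed in this thread. `parse_search_body` is a hypothetical name and not an existing pypdb function.

```python
import json
import warnings


def parse_search_body(body_text: str) -> dict:
    """Decode a search response body, treating an empty body as 'no hits'."""
    # Assumption: an empty body means the search matched nothing.
    if not body_text or not body_text.strip():
        warnings.warn("Search returned an empty response; treating it as no hits.")
        return {}
    return json.loads(body_text)
```

Callers could then use `parse_search_body(response.text).get("result_set", [])` and loop over an empty list when nothing matches.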

I'm sorry I missed your reply. As a workaround, I'd like to handle it with try-except for now. If I get a chance to learn more about pypdb’s code in the future, I'll come back to this issue.
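
For reference, the try-except workaround could look roughly like this. It is only a sketch: `search_or_none` is a hypothetical helper, and it takes a zero-argument callable so that it does not depend on any particular pypdb signature.

```python
from typing import Any, Callable, Optional


def search_or_none(run_search: Callable[[], Any]) -> Optional[Any]:
    """Run a search callable; return None if its response could not be decoded."""
    try:
        return run_search()
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError. In this thread it
        # was raised for a query that has no hits in the RCSB advanced search.
        return None
```

Usage with the example above: `out = search_or_none(lambda: perform_search(search_operator=seq_op, return_with_scores=True))`. Note that this also hides decode failures that have other causes, so it is a stopgap rather than a fix.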