works: Warning for one bad id pollutes entire response
holub008 opened this issue · comments
Python version: 3.10.9
habanero version: 1.2.2
Problem
My application fetches records by user-input DOIs, which sometimes are mistyped & not a valid DOI. Per the docs, warn=True
is the solution to ignore invalid DOIs and other error conditions. As expected, an invalid DOI will return a None
result; however, every id after it in the input list will also be None
(even if valid). Is this expected behavior?
Lines 158 to 159 in 183af9e
Example
from habanero import Crossref
cr = Crossref(mailto="...", ua_string="...")
bad_doi = '10.1200/jco.2021.39.15-suppl.e12560'
good_dois = ['10.1200/jco.2022.40.6_suppl.306', '10.1016/s1368-8375(21)00377-8', '10.1177/01945998211030908d', '10.1097/ju.0000000000000857.09']
responses = cr.works(ids=good_dois, warn=True)
len([r for r in responses if r])
# 4
responses = cr.works(ids=[bad_doi] + good_dois, warn=True)
len([r for r in responses if r])
# 0
My thoughts
This behavior seems unideal for all use cases. Either a caller wants an invalid DOI to fail the entire set of requests (which warn=False
already handles), or the caller wants to salvage as much data as possible, ignoring invalid results. By returning None
s with positional dependency, neither of these use cases are satisfied.
My expectation/preference would be for warn=True
to return None
for any id resulting in a non-200 query, and the standard response for any 200 query.
Thanks @holub008 for the issue.
I'll have a look and get back to you soon
Let me know if you need a hand making the code changes. From my read, it looks like the should_warn
variable can be dropped entirely and conditionals consolidated. But I might be ignoring some intended functionality.
Here's where this thing came from #69 It doesn't have the use case described unfortunately b/c we chatted about it beforehand in another venue.
I've been experimenting and my first thought is what I've done on the warn-fix
branch here is a good fix. I updated the tests that use warn=True
to make sure using that arg works as it should have.
Thoughts?
Nice! That change is more or less what I would have. Always appreciate your fast responses.
Great. Did you test it to make sure it works for you yet?
I think this now supports use cases:
warn=FALSE
:- all good IDs: all good, no problems
- 1 or more bad IDs: command fails upon first HTTP error response
warn=TRUE
:- all good IDs: all good, no problems
- 1 or more bad IDs: command should succeed even w/ bad IDs, where results have
None
in place of adict
of results; and a warning thrown for each bad ID
Just pulled warn-fix
and ran my example:
>>> from habanero import Crossref
>>> cr = Crossref(mailto="...", ua_string="...")
>>>
>>> bad_doi = '10.1200/jco.2021.39.15-suppl.e12560'
>>> good_dois = ['10.1200/jco.2022.40.6_suppl.306', '10.1016/s1368-8375(21)00377-8', '10.1177/01945998211030908d', '10.1097/ju.0000000000000857.09']
>>>
>>> responses = cr.works(ids=good_dois, warn=True)
>>> len([r for r in responses if r])
4
>>> responses = cr.works(ids=[bad_doi] + good_dois, warn=True)
/Users/kholub/habanero/habanero/request.py:135: UserWarning: 404 on 10.1200/jco.2021.39.15-suppl.e12560: Not Found
warnings.warn(mssg)
len([r for r in responses if r])
>>> len([r for r in responses if r])
4
That's the behavior I'd expect!
Great, i'll get this merged, and pushed to pypi soon
released a new version on pypi
Confirmed on my target application. Thanks again!
great