sckott / habanero

client for Crossref search API

Home Page: https://habanero.readthedocs.io

works: Warning for one bad id pollutes entire response

holub008 opened this issue

Python version: 3.10.9
habanero version: 1.2.2

Problem

My application fetches records by user-supplied DOIs, which are sometimes mistyped and therefore invalid. Per the docs, warn=True is the way to ignore invalid DOIs and other error conditions. As expected, an invalid DOI returns a None result; however, every id after it in the input list is also None, even if it is valid. Is this expected behavior?

if warning_thrown:
    coll.append(None)

Example

from habanero import Crossref
cr = Crossref(mailto="...", ua_string="...")

bad_doi = '10.1200/jco.2021.39.15-suppl.e12560'
good_dois = ['10.1200/jco.2022.40.6_suppl.306', '10.1016/s1368-8375(21)00377-8', '10.1177/01945998211030908d', '10.1097/ju.0000000000000857.09']

responses = cr.works(ids=good_dois, warn=True)
len([r for r in responses if r])
# 4

responses = cr.works(ids=[bad_doi] + good_dois, warn=True)
len([r for r in responses if r])
# 0

My thoughts

This behavior seems less than ideal for any use case. Either a caller wants an invalid DOI to fail the entire set of requests (which warn=False already handles), or the caller wants to salvage as much data as possible, ignoring invalid results. By returning Nones with positional dependency, neither of these use cases is satisfied.

My expectation/preference would be for warn=True to return None for any id resulting in a non-200 query, and the standard response for any 200 query.

Thanks @holub008 for the issue.

I'll have a look and get back to you soon

Let me know if you need a hand making the code changes. From my read, it looks like the should_warn variable can be dropped entirely and conditionals consolidated. But I might be ignoring some intended functionality.

Here's where this came from: #69. Unfortunately, it doesn't describe the use case, because we chatted about it beforehand in another venue.

I've been experimenting, and my first thought is that what I've done on the warn-fix branch is a good fix. I updated the tests that use warn=True to make sure the argument works as it should.

Thoughts?

Nice! That change is more or less what I would have done. Always appreciate your fast responses.

Great. Did you test it to make sure it works for you yet?

I think this now supports these use cases:

  • warn=False:
    • all good IDs: all good, no problems
    • 1 or more bad IDs: the call fails upon the first HTTP error response
  • warn=True:
    • all good IDs: all good, no problems
    • 1 or more bad IDs: the call should succeed even with bad IDs; the results have None in place of a dict for each bad ID, and a warning is raised for each bad ID
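
As a rough illustration of the semantics listed above (not the actual habanero code), here is a minimal sketch in which each id is handled independently. The names fetch_each, FetchError, and fake_fetch are hypothetical stand-ins invented for this example; fake_fetch replaces the real HTTP request so the snippet runs offline.

```python
import warnings


class FetchError(Exception):
    """Hypothetical error standing in for a non-200 HTTP response."""


def fetch_each(ids, fetch_one, warn=False):
    """Fetch every id independently.

    With warn=False the first failure propagates to the caller.
    With warn=True a failure becomes a warning plus a None placeholder,
    and later ids are still fetched.
    """
    results = []
    for id_ in ids:
        try:
            results.append(fetch_one(id_))
        except FetchError as err:
            if not warn:
                raise
            warnings.warn(f"{err} on {id_}")
            results.append(None)  # placeholder only for the id that failed
    return results


def fake_fetch(doi):
    """Hypothetical fetcher: any id starting with 'bad' simulates a 404."""
    if doi.startswith("bad"):
        raise FetchError("404")
    return {"DOI": doi}


if __name__ == "__main__":
    print(fetch_each(["bad-doi", "10.1/a", "10.1/b"], fake_fetch, warn=True))
```

The point of the sketch is that the None placeholder is tied to the failing id only, so no state from one failure carries over to the ids that follow.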

Just pulled warn-fix and ran my example:

>>> from habanero import Crossref
>>> cr = Crossref(mailto="...", ua_string="...")
>>> 
>>> bad_doi = '10.1200/jco.2021.39.15-suppl.e12560'
>>> good_dois = ['10.1200/jco.2022.40.6_suppl.306', '10.1016/s1368-8375(21)00377-8', '10.1177/01945998211030908d', '10.1097/ju.0000000000000857.09']
>>> 
>>> responses = cr.works(ids=good_dois, warn=True)
>>> len([r for r in responses if r])
4
>>> responses = cr.works(ids=[bad_doi] + good_dois, warn=True)
/Users/kholub/habanero/habanero/request.py:135: UserWarning: 404 on 10.1200/jco.2021.39.15-suppl.e12560: Not Found
  warnings.warn(mssg)
>>> len([r for r in responses if r])
4

That's the behavior I'd expect!
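
For anyone consuming these results, a small caller-side sketch may help. It assumes, as in the session above, that the responses come back in the same order as the input ids with None in place of each failed lookup. split_results is a hypothetical helper, not part of habanero, and the response dicts below are stand-in data rather than real Crossref output.

```python
def split_results(ids, responses):
    """Pair each id with its response, separating hits from misses.

    Assumes responses has one entry per id, in the same order,
    with None marking an id that could not be fetched.
    """
    found = {}
    missing = []
    for id_, resp in zip(ids, responses):
        if resp is None:
            missing.append(id_)
        else:
            found[id_] = resp
    return found, missing


if __name__ == "__main__":
    # Stand-in data shaped like the warn=True output described above.
    ids = ["bad-doi", "10.1/a", "10.1/b"]
    responses = [None, {"status": "ok"}, {"status": "ok"}]
    found, missing = split_results(ids, responses)
    print(sorted(found), missing)
```

Keeping the list of missing ids makes it easy to report mistyped DOIs back to the user instead of silently dropping them.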

Great, I'll get this merged and pushed to PyPI soon.

Released a new version on PyPI.

Confirmed on my target application. Thanks again!