sckott / habanero

client for Crossref search API

Home Page: https://habanero.readthedocs.io


error message requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url even when ua and mailto are set

WolfgangFahl opened this issue

see OpenRefine/OpenRefine#1669

the test below used to work. Now I am using habanero 1.2.2 and I get the error above. Curl via command line and direct API access in my browser strangely work.

pip list | grep habanero
habanero                      1.2.2

with a wrapper:

import habanero
import skg  # the user's own pysotsog package, used here for its __version__

class Crossref:
    """
    Crossref access
    """

    def __init__(self,mailto=None,ua_string=None):
        """
        constructor
        """
        if mailto is None:
            mailto="...." # my mail address goes here
        if ua_string is None:
            ua_string=f"pysotsog/{skg.__version__} (https://pypi.org/project/pysotsog/; mailto:{mailto})"
        self.cr = habanero.Crossref(mailto=mailto,ua_string=ua_string)

    def doiMetaData(self, dois:list):
        """
        get the metadata for the given DOIs

        Args:
            dois(list): a list of DOIs
        """
        metadata = None
        response = self.cr.works(ids=dois)
        if 'status' in response and 'message' in response and response['status'] == 'ok':
            metadata = response['message']
        return metadata

    def test_crossref(self):
        """
        test crossref
        """
        dois=["10.1016/J.ARTMED.2017.07.002"]
        crossref=Crossref()
        #bib_entry=crossref.doiBibEntry(doi)
        meta_data=crossref.doiMetaData(dois)
        print(meta_data)

Thanks for the issue. I can't run this as is. Where is the skg package? That issue you link to is 4 years old. There may have been an issue with Crossref at that time, but it's unlikely to be the same problem.

This error is very strange. See https://github.com/WolfgangFahl/pysotsog/blob/main/tests/test_crossref.py for the test source code and https://github.com/WolfgangFahl/pysotsog/blob/main/skg/crossref.py for the helper package.
The CI runs fine and the code runs on most of my machines with no problems. The Python versions are 3.9 and 3.10 and the operating systems Linux and macOS. The machine that is not working uses Python 3.10.8 on macOS 11.6.2. I have tried quite a few workarounds - see below. None of the workarounds worked, so I wonder why I get a 401.

To reproduce the error:

git clone https://github.com/WolfgangFahl/pysotsog
pip install green
cd pysotsog
green

which fails with:

  File "/Users/wf/Library/Python/3.10/lib/python/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://api.crossref.org/v1/works/10.1016%2FJ.ARTMED.2017.07.002/transform

    # further workaround attempts - methods of the test class; they need "import requests"

    def test_curl_style(self):
        """
        mimic a curl command line call with a requests session
        """
        session = requests.Session()
        session.headers.update({
            'User-Agent': 'curl/7.86.0',
            'Accept': 'application/x-bibtex',
        })
        from http.cookiejar import DefaultCookiePolicy
        session.cookies.set_policy(DefaultCookiePolicy(allowed_domains=[]))
        response=session.get('https://doi.org/10.1021/acs.jpcc.0c05161')
        print(response.status_code)
        print(response.text)

    def doi2bib(self,doi):
        """
        Return a bibTeX string of metadata for a given DOI.
        """
        url = f"https://doi.org/{doi}"
        headers = {
            "accept": "application/x-bibtex"
        }
        r = requests.get(url, headers=headers)
        if r.status_code==200:
            return r.text
        else:
            return r.status_code

    def test_crossref_bib(self):
        doi="10.1016/J.ARTMED.2017.07.002"
        bib_text=self.doi2bib(doi)
        print(bib_text)

    def test_crossref_direct(self):
        """
        access the Crossref REST API directly without habanero
        """
        headers = {
            'User-Agent': 'Mozilla/5.0; mailto:@doe.com',
        }
        doi="10.1016/J.ARTMED.2017.07.002"
        url=f"https://api.crossref.org/v1/works/{doi}"
        print(url)
        response = requests.get(url,headers=headers)
        print(response.status_code)
        if response.status_code==200:
            print(response.json())

Just tried Python 3.9 and I get the same error.

It is very strange. The error is computer-dependent - not tied to the IP or the MAC address. What on earth could Crossref evaluate to create a 401 specifically for one computer?
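
To narrow that down, here is a minimal debugging sketch (not from the original report) that switches on wire-level logging, so the exact request line and headers that habanero/requests send can be compared between a working and the failing machine; the mail address is a placeholder:

import http.client
import logging

import habanero

# print every request/response line that http.client (and therefore
# requests/urllib3, which habanero uses) sends and receives
http.client.HTTPConnection.debuglevel = 1
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)

cr = habanero.Crossref(mailto="you@example.org")  # hypothetical address
cr.works(ids=["10.1016/J.ARTMED.2017.07.002"])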

Does habanero have some kind of proxy capability, e.g. to ask another computer to do the actual work?
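
Since habanero does its HTTP work through the requests library (the traceback above comes from requests/models.py), one hedged sketch of such a detour is to route the traffic through a proxy on another machine via the standard proxy environment variables that requests honours; the proxy address below is a placeholder:

import os
import habanero

# make requests (and therefore habanero) send everything through a proxy
# running on another machine; "other-machine:3128" is a hypothetical address
os.environ["HTTP_PROXY"] = "http://other-machine:3128"
os.environ["HTTPS_PROXY"] = "http://other-machine:3128"

cr = habanero.Crossref(mailto="you@example.org")  # hypothetical address
print(cr.works(ids=["10.1016/J.ARTMED.2017.07.002"]))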

Thanks for the details @WolfgangFahl, I'll take a look soon.

I'd be surprised if the problem was with habanero, but it's possible I guess.

I have opened a ticket with CrossRef in the meantime but haven't gotten a reply yet. For my daily work this is still a showstopper and I have to use a different machine. I wonder whether a simple Docker environment would change the situation and may try it out in the upcoming weeks if no other solution comes up.

I ran the code in your comment #110 (comment) and green ran without any problems. If you can find where the issue is coming from - and if it's coming from habanero - then I can help fix it.

There is now a reply from CrossRef and I explained that this happens only on a single machine and only when using habanero. I can access the service itself just fine using the class below. See the latest changes at WolfgangFahl/pysotsog@64bf3c9

test_doi.py

from unittest import IsolatedAsyncioTestCase
import json

from skg.doi import DOI  # assuming the DOI class from doi.py below is importable as skg.doi

class TestDOILookup(IsolatedAsyncioTestCase):
    """
    test DOI lookup
    """
    async def testDOILookup(self):
        """
        test DOI lookup 
        """
        debug=True
        dois=["10.1109/TBDATA.2022.3224749"]
        expected=["@article{Li_2022,","@inproceedings{Faruqui_2015,"]
        for i,doi in enumerate(dois):
            doi_obj=DOI(doi)
            result=await doi_obj.doi2bibTex()
            if debug:
                print(result)
            self.assertTrue(result.startswith(expected[i]))
            
    async def testCiteproc(self):
        """
        cite proc lookup
        """ 
        dois=["10.3115/v1/N15-1184"]
        debug=True
        for doi in dois:
            doi_obj=DOI(doi)
            json_data=await doi_obj.doi2Citeproc()
            if debug:
                print(json.dumps(json_data,indent=2))
            self.assertTrue("DOI" in json_data)
            self.assertEqual(doi.lower(),json_data["DOI"])
        
    async def testDataCiteLookup(self):
        """
        test the dataCite Lookup api
        """
        debug=True
        dois=["10.5438/0012"]
        for doi in dois:
            doi_obj=DOI(doi)
            json_data=await doi_obj.dataCiteLookup()
            if debug:
                print(json.dumps(json_data,indent=2))
            self.assertTrue("data" in json_data)
            data=json_data["data"]
            self.assertTrue("id" in data)
            self.assertEqual(doi,data["id"])

doi.py

'''
Created on 2022-11-22

@author: wf
'''
import re
import aiohttp

class DOI:
    """
    Digital Object Identifier handling
    
    see e.g. https://www.wikidata.org/wiki/Property:P356
    see https://www.doi.org/doi_handbook/2_Numbering.html#2.2
    see https://github.com/davidagraf/doi2bib2/blob/master/server/doi2bib.js
    see https://citation.crosscite.org/docs.html
    
    """
    pattern=re.compile(r"((?P<directory_indicator>10)\.(?P<registrant_code>[0-9]{4,})(?:\.[0-9]+)*(?:\/|%2F)(?:(?![\"&\'])\S)+)")
  
    def __init__(self,doi:str):
        """
        a DOI
        """
        self.doi=doi
        match=re.match(DOI.pattern,doi)
        self.ok=bool(match)
        if self.ok:
            self.registrant_code=match.group("registrant_code")
        
    @classmethod
    def isDOI(cls,doi:str):
        """
        check that the given string is a doi
        
        Args:
            doi(str): the potential DOI string
        """
        if not doi:
            return False
        if isinstance(doi,list):
            ok=len(doi)>0
            for single_doi in doi:
                ok=ok and cls.isDOI(single_doi)
            return ok
        if not isinstance(doi,str):
            return False
        doi_obj=DOI(doi)
        return doi_obj.ok
    
    async def fetch_json(self,url,headers):
        """
        fetch JSON for the given url with the given headers
        """
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as response:
                return await response.json()
    
    async def fetch_text(self,url,headers):
        """
        fetch text for the given url with the given headers
        """
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(url) as response:
                return await response.text()
    
    async def doi2bibTex(self):
        """
        get the bibtex result for my doi
        """
        url=f"https://doi.org/{self.doi}"
        headers= {
            'Accept': 'application/x-bibtex; charset=utf-8'
        }
        return await self.fetch_text(url,headers)     
    
    async def doi2Citeproc(self):
        """
        get the Citeproc JSON result for my doi
        see https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
        """
        url=f"https://doi.org/{self.doi}"
        headers= {
            'Accept': 'application/vnd.citationstyles.csl+json; charset=utf-8'
        }
        return await self.fetch_json(url, headers)
    
    async def dataCiteLookup(self):
        """
        get the dataCite json result for my doi
        """
        url=f"https://api.datacite.org/dois/{self.doi}"
        headers= {
            'Accept': 'application/vnd.api+json; charset=utf-8'
        }
        return await self.fetch_json(url, headers)
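
For completeness, a minimal standalone usage sketch of this workaround (assuming doi.py is importable as skg.doi); it just drives the async helpers with asyncio.run:

import asyncio

from skg.doi import DOI  # assumed import path for the class above

doi_obj = DOI("10.1016/J.ARTMED.2017.07.002")
# fetch BibTeX via doi.org content negotiation instead of habanero
bibtex = asyncio.run(doi_obj.doi2bibTex())
print(bibtex)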

Great, glad it works for you. Sounds like no changes are needed here.

I still can't use habanero - the above is only a workaround.

Okay, sorry it doesn't work! I closed it because it's been a while and I have no idea how to fix this for you.

The 401 Client Error: Unauthorized for url error doesn't make sense because the API does not require authentication. The mailto header is just to get into the "faster lane", where requests should be more reliable/faster.
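
To illustrate that no credentials are involved (a sketch, not from the thread): a plain unauthenticated requests call works, and the mailto can alternatively be sent as a query parameter to reach that faster lane; the address below is a placeholder:

import requests

doi = "10.1016/J.ARTMED.2017.07.002"
# no API key or credentials at all; mailto only selects the "polite" pool
response = requests.get(
    f"https://api.crossref.org/works/{doi}",
    params={"mailto": "you@example.org"},  # hypothetical address
)
print(response.status_code)  # 200 expected when the API is reachable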

The only thing I can think of is that perhaps your IP address got on their block list. Perhaps you were hitting the API pretty hard at some point? I don't know if they do that kind of thing or not.