Encoding problem reading datapackage on Windows
cpina opened this issue · comments
Overview
When trying to use datapackage to open this file (saved locally, etc.):
https://raw.githubusercontent.com/Swiss-Polar-Institute/frictionless-data-packages/2a71927057191c9e59395dc07c6159513951aa18/10.5281_zenodo.3843263/datapackage.json
It fails with:
Traceback (most recent call last):
  File "path to the script", line 2, in <module>
    json.load(open('some_path\\10.5281_zenodo.3843263\\datapackage.json'))
  File "C:\Program Files\Python38\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2628: character maps to <undefined>
Position 2628 is the closing typographic quotation mark in: instrument on the vessel “Akademik Tryoshnikov”
On Linux it works fine.
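To make the error concrete, here is a minimal sketch of why byte 0x9d trips the cp1252 codec. The closing quote ” is U+201D, which UTF-8 encodes as three bytes ending in 0x9d, and cp1252 has no character assigned to 0x9d. The helper name `decodes_as` is made up for this illustration.

```python
# U+201D is the closing typographic quote from the failing file.
closing_quote = "\u201d"
utf8_bytes = closing_quote.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x9d'


def decodes_as(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: report whether `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# cp1252 leaves 0x9d undefined, so decoding the UTF-8 bytes fails there.
print(decodes_as(utf8_bytes, "cp1252"))  # False
print(decodes_as(utf8_bytes, "utf-8"))   # True
```

This would also explain the Linux result, assuming the Linux machine's default text encoding is UTF-8, which is common but not guaranteed.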
Investigation
Opening the file on Windows with:
json.load(open('datapackage.json'))
fails with the same error.
With an explicit encoding:
json.load(open('datapackage.json', encoding='utf-8'))
it works.
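A slightly more defensive version of the working call, as a sketch: it passes the encoding explicitly and closes the file with a context manager. The helper name `load_json_utf8` and the sample data are assumptions for illustration, not part of datapackage.

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Hypothetical helper: read JSON as UTF-8 regardless of the platform default."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


# Self-contained demo: write a small file containing typographic quotes, then read it back.
sample = {"description": "instrument on the vessel \u201cAkademik Tryoshnikov\u201d"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datapackage.json")
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(sample, fp, ensure_ascii=False)
    loaded = load_json_utf8(path)

print(loaded == sample)  # True
```

Because the encoding is fixed on both the write and the read, the result should not depend on the operating system or locale.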
I think datapackage makes exactly the first call (without an explicit encoding) in datapackage/helpers.py
In the Python 3.0 changelog (https://docs.python.org/3/whatsnew/3.0.html) it says:
There is a platform-dependent default encoding, which on Unixy platforms can be set with the LANG environment variable (and sometimes also with some other platform-specific locale-related environment variables). In many cases, but not all, the system default is UTF-8; you should never count on this default
This surprised me: I thought it was UTF-8 on all platforms.
To work around the problem for now without changing datapackage, we have successfully used:
https://stackoverflow.com/a/61570285/9294284
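The linked answer is not reproduced here. As a separate, general option (an assumption on my side, not necessarily what that answer does), Python 3.7+ has a UTF-8 Mode, enabled with `-X utf8` or the `PYTHONUTF8=1` environment variable, that makes text I/O default to UTF-8 without touching library code. The sketch below only checks that the mode is active in a child interpreter.

```python
import subprocess
import sys

# Code run in the child interpreter: report the UTF-8 Mode flag and the preferred encoding.
probe = "import sys, locale; print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"

# Start a child Python with UTF-8 Mode switched on via -X utf8.
result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", probe],
    capture_output=True,
    text=True,
    check=True,
)
flag, encoding = result.stdout.split()
print(flag, encoding)
```

This changes the default for the whole process, so it is better suited to a script you control than to a library that other code imports.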
MERGED into frictionlessdata/frictionless-py#388 as we're going to provide first-class support for Mac/Windows in Frictionless.
We doubt we have enough resources to replicate that effort in the datapackage library, but if it is needed specifically for datapackage, please re-open the issue 👍