frictionlessdata / datapackage-py

A Python library for working with Data Packages.

Home Page: https://frictionlessdata.io

Encoding problem reading datapackage on Windows

cpina opened this issue

Overview

When using datapackage to open a locally saved copy of this file:
https://raw.githubusercontent.com/Swiss-Polar-Institute/frictionless-data-packages/2a71927057191c9e59395dc07c6159513951aa18/10.5281_zenodo.3843263/datapackage.json

It fails with:

Traceback (most recent call last):
  File "path to the script", line 2, in <module>
    json.load(open('some_path\\10.5281_zenodo.3843263\\datapackage.json'))
  File "C:\Program Files\Python38\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2628: character maps to <undefined>

Position 2628 is the closing typographic quotation mark in: instrument on the vessel “Akademik Tryoshnikov”

On Linux it works fine.
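To illustrate where byte 0x9d comes from, here is a minimal sketch. It assumes the file is stored as UTF-8, which the behaviour on Linux suggests; `try_decode` is a hypothetical helper written for this example and is not part of datapackage.

```python
# The closing typographic quotation mark (U+201D) from the vessel name.
CLOSING_QUOTE = "\u201d"


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# In UTF-8 the character is three bytes long; the last one is 0x9d.
utf8_bytes = CLOSING_QUOTE.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x9d'

# Decoding as UTF-8 round-trips the character.
text, error = try_decode(utf8_bytes, "utf-8")
print(text == CLOSING_QUOTE, error)

# Decoding as cp1252 fails, because Python's cp1252 table has no
# character assigned to byte 0x9d.
text, error = try_decode(utf8_bytes, "cp1252")
print(text, error)
```

The cp1252 attempt fails on the same byte value as in the traceback, which is consistent with Windows decoding a UTF-8 file using its locale code page.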

Investigation

On Windows, when opening the file with:
json.load(open('datapackage.json'))

it fails with the same error.

With an explicit encoding:
json.load(open('datapackage.json', encoding='utf-8'))

It works.
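A self-contained sketch of the difference follows. It writes a small UTF-8 JSON file to a temporary directory and reads it back twice. The helper names `load_json_utf8` and `demo` are hypothetical, and passing `encoding="cp1252"` explicitly is only a way to simulate the Windows default on any platform.

```python
import json
import tempfile
from pathlib import Path


def load_json_utf8(path):
    """Load a JSON file, always decoding it as UTF-8 (hypothetical helper)."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


def demo():
    """Return (data loaded as UTF-8, whether the cp1252 read failed)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "datapackage.json"
        # Write the file as raw UTF-8 bytes, keeping the typographic quotes.
        payload = {"title": "vessel \u201cAkademik Tryoshnikov\u201d"}
        path.write_bytes(json.dumps(payload, ensure_ascii=False).encode("utf-8"))

        data = load_json_utf8(path)

        # Simulate the Windows default by forcing cp1252.
        try:
            with open(path, encoding="cp1252") as fp:
                json.load(fp)
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True
    return data, cp1252_failed


if __name__ == "__main__":
    print(demo())
```

Using a `with` block also closes the file handle, which the one-line `open(...)` form leaves to the garbage collector.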

I think that Datapackage does exactly the first call, opening the file without an explicit encoding, in datapackage/helpers.py

In the Python 3.0 changelog (https://docs.python.org/3/whatsnew/3.0.html) it says:
There is a platform-dependent default encoding, which on Unixy platforms can be set with the LANG environment variable (and sometimes also with some other platform-specific locale-related environment variables). In many cases, but not all, the system default is UTF-8; you should never count on this default

That surprised me: I thought the default was UTF-8 on all platforms.
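To check which default a given machine uses, something like the sketch below can help. It assumes Python 3.7 or later (for `sys.flags.utf8_mode`); `describe_default_encoding` is a hypothetical helper, and the exact values it returns depend on the platform and locale settings.

```python
import codecs
import locale
import sys


def describe_default_encoding():
    """Collect the settings that influence open() when no encoding is given."""
    preferred = locale.getpreferredencoding(False)
    return {
        # Locale-derived encoding used for text files by default.
        "preferred": preferred,
        # Normalised codec name, e.g. to compare against "utf-8" or "cp1252".
        "codec": codecs.lookup(preferred).name,
        # True when Python runs in UTF-8 mode (for example with PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for key, value in describe_default_encoding().items():
        print(f"{key}: {value}")
```

On a Windows machine with a Western European locale this would typically report cp1252, matching the codec named in the traceback, while many Linux setups report UTF-8.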

To work around the problem for now without changing Datapackage, we have successfully used the approach from:
https://stackoverflow.com/a/61570285/9294284

commented

Hi @cpina,

Thanks a lot for your investigation. I'll be on this one - #266 - soon and it should prevent/resolve all similar problems

commented

MERGED into frictionlessdata/frictionless-py#388 as we're going to provide first-class support for Mac/Windows in Frictionless.

We doubt that we have enough resources to replicate the effort in the datapackage library. But if it's really needed specifically for datapackage, please re-open the issue 👍