Support UNICODE
phaedron opened this issue
I have a JSON file in UTF-8. Genson seems to choke on UNICODE encoded files.
I'll try to reproduce. Can you give me more information? It's best if you can include the file and tell me which version of Python you're using.
argparse seems to use the default open(), which relies on locale.getpreferredencoding() to determine the system's locale encoding. In my case that was cp1252, while the JSON is UTF-8 with Cyrillic characters in it.
I guess we either need to force files to be opened as UTF-8 (argparse.FileType('r', encoding='utf8')) or add an option to specify the encoding as an argument.
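To illustrate the first option, here is a minimal sketch of an argparse parser that always opens its file arguments as UTF-8. The parser and the argument name `object` are hypothetical stand-ins, not genson's actual CLI definition; only `argparse.FileType` and its `encoding` parameter (Python 3) come from the standard library.

```python
import argparse
import json
import os
import tempfile

# Hypothetical stand-in parser; this is NOT genson's real CLI definition.
parser = argparse.ArgumentParser()
parser.add_argument(
    "object",
    nargs="+",
    # Force UTF-8 instead of falling back to locale.getpreferredencoding().
    type=argparse.FileType("r", encoding="utf-8"),
)

with tempfile.TemporaryDirectory() as tmp:
    # Write the sample JSON from this thread as UTF-8 bytes on disk.
    path = os.path.join(tmp, "test.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"test": "тест"}')

    # argparse opens the file for us, using the forced encoding.
    args = parser.parse_args([path])
    with args.object[0] as f:
        loaded = json.load(f)

print(loaded)
```

Hard-coding the encoding like this is simple, but it would break for users whose files really are in another encoding, which is the argument for making it an option instead.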
You can reproduce it with the following JSON:
{"test": "тест"}
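The failure can be shown without genson at all. The sketch below decodes the UTF-8 bytes of that JSON with both encodings; the helper name `try_decode` is made up for this example. It assumes the cp1252 codec shipped with Python, which has no mapping for byte 0x81 (part of the UTF-8 encoding of "с").

```python
import json

# The sample document from this thread, as the UTF-8 bytes stored on disk.
raw = '{"test": "тест"}'.encode("utf-8")


def try_decode(data, encoding):
    """Decode bytes with the given encoding and parse them as JSON.

    Returns the parsed object, or the UnicodeDecodeError if decoding fails.
    """
    try:
        return json.loads(data.decode(encoding))
    except UnicodeDecodeError as exc:
        return exc


utf8_result = try_decode(raw, "utf-8")      # parses cleanly
cp1252_result = try_decode(raw, "cp1252")   # fails while decoding

print(utf8_result)
print(type(cp1252_result).__name__)
```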
Attached is a JSON file from an SAP HANA system, which uses a UNICODE code page.
calc_scenario.json.txt
@phaedron did you close this intentionally? It seems to still be an issue.
I don't remember closing this. If you have cycles to address it, it seems like it would be useful, since UNICODE support is needed.
For Windows users having this issue (Windows' default locale encoding is often cp1252), use the stdin fallback with a PYTHONIOENCODING override.
Example using git bash:
export PYTHONIOENCODING=UTF-8; cat myobj1.json myobj2.json | genson
It seems like the best way to solve this problem is just to add an encoding option to the executable. How does that sound?
Sounds fine. It allows the user to specify the encoding explicitly.
Will it be sufficient if the option applies to all listed files? I feel like it could make the interface much more complex to do otherwise. If you need to merge files with multiple different encodings, you can either convert the files beforehand or just write a Python script for it.
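For the "convert the files beforehand" route, a small script along these lines might be enough. It is only a sketch using the standard library: the function name `transcode_to_utf8`, the file names, and the choice of cp1251 as the source encoding are all assumptions made for the example.

```python
import json
import os
import tempfile


def transcode_to_utf8(src_path, src_encoding, dst_path):
    """Re-encode a text file from src_encoding to UTF-8 (hypothetical helper)."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)


with tempfile.TemporaryDirectory() as tmp:
    legacy = os.path.join(tmp, "legacy.json")
    converted = os.path.join(tmp, "converted.json")

    # Simulate a file saved in a legacy Cyrillic code page (assumed: cp1251).
    with open(legacy, "w", encoding="cp1251") as f:
        f.write('{"test": "тест"}')

    transcode_to_utf8(legacy, "cp1251", converted)

    # The converted file can now be read as UTF-8 alongside the others.
    with open(converted, "r", encoding="utf-8") as f:
        converted_obj = json.load(f)

print(converted_obj)
```

Once every input is UTF-8, a single encoding setting covers all files, which keeps the command-line interface simple.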
That would be my expectation @wolverdude. I can't imagine using differently-encoded JSON files for the same schema.
I've updated the code, and it should be fixed in the next release by adding an --encoding option to the CLI tool.
However, it will only be fixed for Python 3 users, as Python 2 doesn't allow setting an encoding on argparse.FileType objects. If you have installed it as a Python 2 package, you will not see the --encoding option or be able to use it unless you upgrade to Python 3.
@ololoe Are you experiencing this issue? If so, could you include some information to reproduce it?