UTF-8 in metadata and non-UTF-8 in terminal
broukema opened this issue
BUG DESCRIPTION:
If the user enters non-ASCII metadata in metadata.yaml in UTF-8,
e.g.
name: Institute of Ąćçęńted Letterś in Półiśh and Freńćh
address: Żyźszczyń
but in the terminal s/he has a non-UTF-8 locale setting such as
LANG=fr_FR.ISO-8859-1, then yaml-to-latex.py will yield an
error and 'make' will fail.
REPRODUCE:
git clone https://github.com/ReScience/template.git
cd template
git checkout 042d821
Edit metadata.yaml
while your locale (or at least the locale of your editor)
is in UTF-8 mode, and replace some existing content with
non-ASCII UTF-8 content such as
name: Institute of Ąćçęńted Letterś in Półiśh and Freńćh
address: Żyźszczyń
Set a non-UTF-8 locale that you have installed and try make:
export LANG=fr_FR.ISO-8859-1
make
ACTUAL RESULT:
Traceback (most recent call last):
  File "./yaml-to-latex.py", line 85, in <module>
    article = Article(file.read())
  File "/scratch/template/article.py", line 135, in __init__
    self.parse(data)
  File "/scratch/template/article.py", line 170, in parse
    document = yaml.load(data)
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 70, in load
    loader = Loader(stream)
  File "/usr/lib/python3/dist-packages/yaml/loader.py", line 34, in __init__
    Reader.__init__(self, stream)
  File "/usr/lib/python3/dist-packages/yaml/reader.py", line 74, in __init__
    self.check_printable(stream)
  File "/usr/lib/python3/dist-packages/yaml/reader.py", line 144, in check_printable
    'unicode', "special characters are not allowed")
yaml.reader.ReaderError: unacceptable character #x0084: special characters are not allowed
  in "<unicode string>", position 1529
Makefile:18 : la recette pour la cible « metadata.tex » a échouée
make: *** [metadata.tex] Erreur 1
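The #x0084 in the ReaderError is what one would expect if the UTF-8 bytes of metadata.yaml were decoded as ISO-8859-1, which is presumably what happens when the file is opened with the locale's default encoding. A minimal sketch of that assumption, using the first accented letter of the example name (the variable names are illustrative and not taken from the template):

```python
# Illustration only: assumes the file is decoded with the locale's
# ISO-8859-1 encoding instead of UTF-8. Not code from yaml-to-latex.py.
text = "Ą"                          # first non-ASCII letter in the example name
raw = text.encode("utf-8")          # the bytes as stored in metadata.yaml
misread = raw.decode("iso-8859-1")  # the same bytes read as Latin-1

# Format each resulting character the way the YAML error message does.
codepoints = [f"#x{ord(c):04x}" for c in misread]

print(raw)         # b'\xc4\x84'
print(codepoints)  # ['#x00c4', '#x0084']
```

The second character, U+0084, is a C1 control character, which matches the "unacceptable character #x0084" reported by the YAML reader.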
PROPOSED FIX:
Commit a94e676 in branch debug_utf8_py of my draft article sets
the locale, within the Python process, to en_US.UTF-8. For me this
solves the bug: yaml-to-latex.py runs without error, and the locale
in the terminal is unchanged afterwards.
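For readers who do not want to look up the commit, the idea can be sketched roughly as follows. This is not the code of commit a94e676; the function name use_utf8_locale, the POSIX spelling en_US.UTF-8, and the C.UTF-8 fallback are assumptions made for illustration.

```python
import locale


def use_utf8_locale(candidates=("en_US.UTF-8", "C.UTF-8")):
    """Switch this process to the first UTF-8 locale that is installed.

    Hypothetical helper: returns the locale name that was set, or None
    if none of the candidates is available on this system.
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue  # this locale is not installed; try the next one
        return name
    return None


chosen = use_utf8_locale()
print("UTF-8 locale in use:", chosen)
```

setlocale only changes the running process, so the shell that launched 'make' keeps its own LANG. Presumably the call has to happen before metadata.yaml is opened. A different approach, not the one described in this report, would be to pass encoding="utf-8" explicitly to open(), which does not depend on any particular locale being installed.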
FURTHER FIXES:
Further improvements could be to check whether the existing locale
is UTF-8 and to warn the user if it is not.
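A possible shape for such a check, as a sketch only: the helper names is_utf8 and warn_if_not_utf8 are hypothetical, and the sketch assumes that locale.getpreferredencoding(False) reflects the encoding Python would use by default for text files.

```python
import codecs
import locale
import warnings


def is_utf8(encoding_name):
    """Return True if encoding_name is UTF-8 or one of its codec aliases."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        return False  # unknown codec name: treat as not UTF-8


def warn_if_not_utf8():
    """Warn when the current locale's preferred encoding is not UTF-8."""
    encoding = locale.getpreferredencoding(False)
    if not is_utf8(encoding):
        warnings.warn(
            f"Locale encoding is {encoding!r}, not UTF-8; "
            "non-ASCII text in metadata.yaml may be misread."
        )
    return encoding
```

Calling warn_if_not_utf8() near the start of yaml-to-latex.py would give the user a readable hint instead of, or in addition to, the YAML traceback.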
COMMENT:
A blog discussion of how Python 3 handles some things in UTF-8 but not everything:
https://katcipis.github.io/blog/python3-utf8-locale/