ReScience / template

Template for article submission

UTF-8 in metadata and non-UTF-8 in terminal

broukema opened this issue

BUG DESCRIPTION:
If the user enters non-ASCII metadata in metadata.yaml, encoded in UTF-8,
e.g.

    name:    Institute of Ąćçęńted Letterś in Półiśh and Freńćh
    address: Żyźszczyń

but the terminal has a non-UTF-8 locale setting such as
LANG=fr_FR.ISO-8859-1, then yaml-to-latex.py raises an
error and 'make' fails.

REPRODUCE:

git clone https://github.com/ReScience/template.git
cd template
git checkout 042d821

Edit metadata.yaml while your locale (or at least the locale of your editor)
is in UTF-8 mode, and replace some existing content with
non-ASCII UTF-8 content such as

    name:    Institute of Ąćçęńted Letterś in Półiśh and Freńćh
    address: Żyźszczyń

Set a non-UTF-8 locale that is installed on your system and run make:

export LANG=fr_FR.ISO-8859-1
make

OBSERVED RESULT:

Traceback (most recent call last):
  File "./yaml-to-latex.py", line 85, in <module>
    article = Article(file.read())
  File "/scratch/template/article.py", line 135, in __init__
    self.parse(data)
  File "/scratch/template/article.py", line 170, in parse
    document = yaml.load(data)
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 70, in load
    loader = Loader(stream)
  File "/usr/lib/python3/dist-packages/yaml/loader.py", line 34, in __init__
    Reader.__init__(self, stream)
  File "/usr/lib/python3/dist-packages/yaml/reader.py", line 74, in __init__
    self.check_printable(stream)
  File "/usr/lib/python3/dist-packages/yaml/reader.py", line 144, in check_printable
    'unicode', "special characters are not allowed")
yaml.reader.ReaderError: unacceptable character #x0084: special characters are not allowed
  in "<unicode string>", position 1529
Makefile:18: recipe for target 'metadata.tex' failed
make: *** [metadata.tex] Error 1
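The `#x0084` in the traceback is consistent with UTF-8 bytes being decoded as ISO-8859-1 when the file is read under the French locale. A minimal standard-library sketch of what happens to "Ą", the first accented letter of the example metadata (this only illustrates the decoding; it does not run yaml-to-latex.py):

```python
# "Ą" (U+0104), the first accented letter in the example metadata,
# is stored in metadata.yaml as the two UTF-8 bytes C4 84.
raw = "\u0104".encode("utf-8")
print(raw)  # b'\xc4\x84'

# Decoding with ISO-8859-1 cannot fail: every byte maps to one character.
# Byte 0x84 becomes U+0084, a C1 control character that PyYAML rejects.
misread = raw.decode("iso-8859-1")
print([hex(ord(c)) for c in misread])  # ['0xc4', '0x84']

# Decoding with the codec the file was actually written in recovers the letter.
print(raw.decode("utf-8") == "\u0104")  # True
```

The same happens to "ń" (UTF-8 bytes C5 84), so the decoding itself succeeds silently and the failure only surfaces later, in PyYAML's printable-character check.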

PROPOSED FIX:
Commit a94e676 in branch debug_utf8_py of my draft article sets
the locale, within the Python process, to en_US.UTF-8. For me this
solves the bug: yaml-to-latex.py runs without error, and the terminal's
locale is unchanged afterwards.
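An alternative to changing the process locale, offered only as a sketch and not as the content of commit a94e676, is to name the encoding explicitly when opening the file, so that the result no longer depends on LANG. The helper name `read_metadata` is hypothetical; the traceback only shows that yaml-to-latex.py passes `file.read()` to `Article`.

```python
import os
import tempfile


def read_metadata(path):
    """Read a metadata file as UTF-8, whatever the terminal locale is.

    Hypothetical helper: yaml-to-latex.py would pass the returned
    string to Article(...) as before.
    """
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Write a small UTF-8 file like the metadata in this issue.
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".yaml",
                                     delete=False) as tmp:
        tmp.write("address: \u017by\u017aszczy\u0144\n")  # Żyźszczyń
        path = tmp.name
    try:
        text = read_metadata(path)
        # No C1 control characters (U+0080..U+009F) survive the decoding.
        print(any("\x80" <= c <= "\x9f" for c in text))  # False
    finally:
        os.remove(path)
```

The decoded text is deliberately not printed: in an ISO-8859-1 terminal, printing "Ż" would itself raise `UnicodeEncodeError`, which is a separate output-side form of the same problem.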

FURTHER FIXES:
A further improvement would be to check whether the existing locale
uses UTF-8 and to warn the user if it does not.
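The check suggested above might be sketched as follows. It assumes that `locale.getpreferredencoding(False)` reports the encoding Python 3 uses for text files opened without an explicit `encoding` argument; the function names `is_utf8` and `warn_if_not_utf8` are hypothetical and do not exist in the template repository.

```python
import codecs
import locale
import sys


def is_utf8(encoding):
    """Return True if *encoding* names UTF-8 under any alias (utf8, UTF-8, ...)."""
    try:
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        # Unknown codec names are treated as "not UTF-8".
        return False


def warn_if_not_utf8(stream=sys.stderr):
    """Print a warning to *stream* if the default text encoding is not UTF-8.

    Returns the detected encoding so callers can log or test it.
    """
    encoding = locale.getpreferredencoding(False)
    if not is_utf8(encoding):
        print(
            "warning: default encoding is %s, not UTF-8; non-ASCII text in "
            "metadata.yaml may be misread (try e.g. LANG=en_US.UTF-8)" % encoding,
            file=stream,
        )
    return encoding


if __name__ == "__main__":
    warn_if_not_utf8()
```

The sketch only warns rather than aborting, so users whose metadata is pure ASCII are not affected by their locale choice.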

COMMENT:
A blog post on how Python 3 handles some things in UTF-8 but not everything:
https://katcipis.github.io/blog/python3-utf8-locale/

Thanks @broukema for finding and fixing this bug!
Fixed by #5.