fastai / fastcore

Python supercharged for the fastai library

Home page: http://fastcore.fast.ai

UnicodeDecodeError occurs when reading `settings.ini` file containing CJK characters on Windows, due to missing encoding parameter

YIsoda opened this issue

When I tried to preview, test, or prepare an nbdev project whose settings.ini (UTF-8 encoded) contains CJK (or possibly other non-ASCII) characters, an error such as UnicodeDecodeError: 'cp932' codec can't decode byte 0x82 in position 725: illegal multibyte sequence occurred.

Example of settings and full error message

When the settings file contains a line like

description = サンプル プロジェクト (sample project)

and an nbdev_* command is executed, the output looks like this:

$ nbdev_preview.exe
Traceback (most recent call last):
  File "C:\Users\<user_home>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\<user_home>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "<path_to_venv>\Scripts\nbdev_preview.exe\__main__.py", line 7, in <module>
  File "<path_to_venv>\lib\site-packages\fastcore\script.py", line 119, in _f
    return tfunc(**merge(args, args_from_prog(func, xtra)))
  File "<path_to_venv>\lib\site-packages\nbdev\quarto.py", line 278, in preview
    nbdev_quarto.__wrapped__(path, preview=True, **kwargs)
  File "<path_to_venv>\lib\site-packages\nbdev\quarto.py", line 256, in nbdev_quarto
    nbdev.doclinks._build_modidx(skip_exists=True)
  File "<path_to_venv>\lib\site-packages\nbdev\doclinks.py", line 74, in _build_modidx
    if dest is None: dest = get_config().lib_path
  File "<path_to_venv>\lib\site-packages\nbdev\config.py", line 199, in get_config
    cfg = Config(cfg_file.parent, cfg_file.name, extra_files=extra_files, types=_types)
  File "<path_to_venv>\lib\site-packages\fastcore\foundation.py", line 258, in __init__
    found = [Path(o) for o in self._cfg.read(L(extra_files)+[self.config_file])]#, encoding='utf-8')]
  File "C:\Users\<user_home>\AppData\Local\Programs\Python\Python310\lib\configparser.py", line 698, in read
    self._read(fp, filename)
  File "C:\Users\<user_home>\AppData\Local\Programs\Python\Python310\lib\configparser.py", line 1021, in _read
    for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'cp932' codec can't decode byte 0x82 in position 725: illegal multibyte sequence
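The failure can be reproduced without nbdev using only the standard library. This is a minimal sketch under a few assumptions: the file starts with the [DEFAULT] section header that nbdev's settings.ini uses, encoding='cp932' is passed explicitly to simulate the locale default on Japanese Windows, and write_settings / read_as_cp932 are hypothetical helper names.

```python
import configparser
import tempfile
from pathlib import Path

# The description line from this report, saved as UTF-8 like the reporter's settings.ini.
SETTINGS = "[DEFAULT]\ndescription = サンプル プロジェクト (sample project)\n"

def write_settings(folder: Path) -> Path:
    """Hypothetical helper: write SETTINGS to folder/settings.ini as UTF-8."""
    path = folder / "settings.ini"
    path.write_text(SETTINGS, encoding="utf-8")
    return path

def read_as_cp932(path: Path) -> None:
    """Hypothetical helper: read the file the way an encoding-less read() would on
    Japanese Windows, where the locale encoding is cp932."""
    cfg = configparser.ConfigParser()
    cfg.read([path], encoding="cp932")

with tempfile.TemporaryDirectory() as d:
    p = write_settings(Path(d))
    try:
        read_as_cp932(p)
    except UnicodeDecodeError as e:
        # The UTF-8 bytes are not valid cp932, so decoding fails as in the traceback.
        print(e)
```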

Version info:
Operating system: Windows 11 Pro (Japanese)
Python 3.10.6
nbdev 2.1.7

This error is likely caused by the missing encoding argument here:

found = [Path(o) for o in self._cfg.read(L(extra_files)+[self.config_file])]
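Why the missing argument matters: when ConfigParser.read() receives no encoding, the file is opened with the locale's preferred encoding, which is cp932 on Japanese Windows rather than UTF-8. A small check, assuming Python 3.10 semantics; default_text_encoding is a hypothetical helper name. Running Python in UTF-8 mode (PYTHONUTF8=1 or python -X utf8) makes this report UTF-8, which may help as a stopgap until a fixed release is installed.

```python
import locale
import sys

def default_text_encoding() -> str:
    """Hypothetical helper: the encoding open() falls back to when encoding=None.

    In UTF-8 mode this returns 'UTF-8'; otherwise it is the locale encoding,
    for example 'cp932' on Japanese Windows.
    """
    return locale.getpreferredencoding(False)

print(default_text_encoding(), "utf8_mode =", sys.flags.utf8_mode)
```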

The error appears to be resolved by passing encoding='utf-8' to the ConfigParser.read() method.
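A minimal sketch of the suggested change using only the standard library: pass encoding='utf-8' to ConfigParser.read(). read_config is a hypothetical helper that mirrors the shape of the fastcore line quoted above (a list of files in, the successfully read paths out); it is not fastcore's actual code.

```python
import configparser
import tempfile
from pathlib import Path

def read_config(files, encoding="utf-8"):
    """Hypothetical helper: read settings files with an explicit encoding.

    ConfigParser.read() returns the names of the files it managed to read;
    they are wrapped in Path objects, as in the fastcore line.
    """
    cfg = configparser.ConfigParser()
    found = [Path(o) for o in cfg.read(files, encoding=encoding)]
    return cfg, found

with tempfile.TemporaryDirectory() as d:
    ini = Path(d) / "settings.ini"
    ini.write_text("[DEFAULT]\ndescription = サンプル プロジェクト (sample project)\n",
                   encoding="utf-8")
    cfg, found = read_config([ini])
    # The CJK description now parses regardless of the platform's locale encoding.
    description = cfg["DEFAULT"]["description"]
```

Pinning UTF-8 instead of relying on the locale makes the result identical on every platform and matches the reporter's UTF-8 settings.ini.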

Thanks for the great issue write-up :D You practically fixed it for us! It should work in the latest master.