Script code run without a repository is saved with corrupted UTF-8 characters on Windows
ae-ae opened this issue
Describe the bug
If you run a UTF-8-encoded script that is not part of a repository, its code is saved on the ClearML Server with corrupted characters. The cause is the default encoding of the open() function: since Python 3.11 it uses the encoding returned by locale.getencoding(), and before 3.11 the encoding returned by locale.getpreferredencoding().
Both functions return the ANSI code page (on Windows) or the LC_CTYPE encoding, which is not necessarily UTF-8. Decoding UTF-8 bytes with the ANSI code page or any other non-UTF-8 encoding garbles the non-ASCII characters. In some cases it also raises a UnicodeDecodeError from the 'charmap' codec, and no script code is saved at all.
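A minimal sketch of the mechanism described above, in plain Python and independent of ClearML: it shows which encoding open() falls back to and what happens when UTF-8 bytes are decoded with cp1252 (assumed here as a typical Western European ANSI code page). The helper names `default_text_encoding` and `misdecode` are hypothetical, and the reported default assumes Python UTF-8 Mode is off.

```python
import locale
import sys

TEXT = "äöüÄÜÖß"


def default_text_encoding() -> str:
    """Approximate the encoding open() uses when no encoding= is given.

    Assumes Python UTF-8 Mode is off; with UTF-8 Mode on, open() uses UTF-8.
    """
    if sys.version_info >= (3, 11):
        return locale.getencoding()  # added in Python 3.11
    return locale.getpreferredencoding(False)


def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8 (as it is stored on disk), then decode it with another codec."""
    return text.encode("utf-8").decode(wrong_encoding)


if __name__ == "__main__":
    print("open() default encoding:", default_text_encoding())
    # Each 2-byte UTF-8 character turns into two unrelated cp1252 characters.
    print("UTF-8 read as cp1252:", misdecode(TEXT))
    try:
        # "Á" is C3 81 in UTF-8; byte 0x81 is undefined in cp1252.
        misdecode("Á")
    except UnicodeDecodeError as exc:
        print("charmap failure:", exc)
```

The second case mirrors the situation in which no script code is saved: the decode fails outright instead of producing wrong characters.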
See #443 for the linked issue.
To reproduce
- Run the following script, which contains the UTF-8 characters äöüÄÜÖß, from outside a repository:
from clearml import Task
# UTF-8 chars here: äöüÄÜÖß.
task = Task.init(project_name="Bug report", task_name="äöüÄÜÖß", output_uri=True)
- Open the task in the ClearML Web UI.
- The UNCOMMITTED CHANGES section shows garbled characters instead of äöüÄÜÖß (the exact output depends on the LC_CTYPE encoding).
Expected behaviour
The correct UTF-8 characters should be displayed in the UNCOMMITTED CHANGES section.
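The expected behaviour implies that the script should be read with an encoding that does not depend on the locale. The sketch below is one way to do this with the standard library, not ClearML's actual code: `read_script_source` is a hypothetical helper, and `tokenize.open()` is used because it detects the encoding the same way the Python interpreter does (a UTF-8 BOM, then a PEP 263 coding cookie, then UTF-8 by default).

```python
import tempfile
import tokenize
from pathlib import Path


def read_script_source(path: str) -> str:
    """Hypothetical helper (not ClearML's API): read a Python script's source.

    tokenize.open() never consults locale.getencoding(); it honours a UTF-8
    BOM or a coding cookie and otherwise assumes UTF-8, like the interpreter.
    """
    with tokenize.open(path) as f:
        return f.read()


if __name__ == "__main__":
    source = '# UTF-8 chars here: äöüÄÜÖß.\nprint("äöüÄÜÖß")\n'
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "repro.py"
        # Write raw UTF-8 bytes so the file content does not depend on the locale.
        script.write_bytes(source.encode("utf-8"))
        assert read_script_source(str(script)) == source
        print(read_script_source(str(script)))
```

Passing `encoding="utf-8"` to `open()` would also cover the reported case; `tokenize.open()` additionally strips a UTF-8 BOM (which some Windows editors add) and respects scripts that declare a different encoding.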
Environment
Windows