allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

Home Page: https://clear.ml/docs

The UTF-8 script code without repository is saved with wrong UTF-8 characters in Windows

ae-ae opened this issue

Describe the bug

If you run a script containing UTF-8 characters outside of a repository, its source is stored on the ClearML Server with corrupted characters. The cause is the default encoding of the open() function: when no encoding is passed, open() uses the encoding returned by locale.getencoding() since Python 3.11, and by locale.getpreferredencoding() before 3.11.
On Windows, both of these functions return the ANSI code page (the LC_CTYPE encoding), not UTF-8. Decoding UTF-8 bytes with the ANSI code page, or any other non-UTF-8 encoding, garbles the non-ASCII characters. In some cases it also raises a charmap decoding exception, and no script code is saved at all.
See #443 for the linked issue.
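
The effect described above can be reproduced on any platform by passing the encoding explicitly. The following is a minimal sketch, not ClearML's actual code: it assumes cp1252 as the Windows ANSI code page, and `SOURCE`, `read_script` and the file name `script.py` are hypothetical names used only for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for a script that is not part of any repository.
SOURCE = "# UTF-8 chars here: äöüÄÜÖß.\n"


def read_script(path, encoding):
    """Read a script's text, decoding it with the given encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "script.py")

    # Save the script as UTF-8, the way most editors store Python files.
    with open(path, "w", encoding="utf-8") as f:
        f.write(SOURCE)

    # Assumption: cp1252 stands in for the ANSI code page that open()
    # would pick by default on a Western European Windows installation.
    garbled = read_script(path, "cp1252")

    # An explicit UTF-8 read does not depend on the locale.
    correct = read_script(path, "utf-8")

    # "Á" is encoded as the bytes C3 81; byte 0x81 is undefined in cp1252,
    # so decoding fails instead of producing garbled text.
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Á\n")
    try:
        read_script(path, "cp1252")
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

print(repr(garbled))
print(repr(correct))
print(decode_failed)
```

Here `garbled` differs from the original source while `correct` matches it, and the second file cannot be decoded with cp1252 at all, which corresponds to the case where no script code is saved.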

To reproduce

  1. Run the following script, which contains the UTF-8 characters äöüÄÜÖß:
from clearml import Task
# UTF-8 chars here: äöüÄÜÖß.
task = Task.init(project_name="Bug report", task_name="äöüÄÜÖß", output_uri=True)
  2. Open the task in ClearML Web.
  3. Invalid characters are displayed in the UNCOMMITTED CHANGES section (the exact result depends on the LC_CTYPE encoding).

Expected behaviour

The correct UTF-8 characters should be displayed in the UNCOMMITTED CHANGES section.

Environment

Windows