allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

Home Page:https://clear.ml/docs

huggingface trainer hook calls task.close() prematurely

nkgrush opened this issue

Describe the bug

The Hugging Face Trainer class is integrated with ClearML. When trainer.train() finishes successfully, the trainer calls task.close(), which makes the original ClearML task unavailable. I am referring to this line specifically (permalink).

To reproduce

from clearml import Task
from trl import SFTTrainer  # SFTTrainer subclasses transformers.Trainer

task = Task.init(
    project_name='project',
    task_name='task',
)
...
model = ...
dataset = ...
...
trainer_args = ...
trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    args=trainer_args,
)

print(task.status) # Running
trainer.train()
print(task.status) # Completed

# the task object is now unusable for most purposes

Expected behaviour

The main task should not be closed (and thereby made unavailable) when training finishes. This is especially important when there are multiple trainer runs, or when any custom actions are taken after training.
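To make the expected behaviour concrete, here is a minimal sketch of one way an integration could behave: it closes a task only if it created that task itself, and leaves a task opened by the user alone. This is an illustration under stated assumptions, not the actual transformers or ClearML code. `FakeTask` and `OwnershipAwareCallback` are hypothetical stand-ins, and the status strings simply mirror the ones in the reproduction above.

```python
class FakeTask:
    """Hypothetical stand-in for a ClearML task (not the real clearml.Task)."""

    _current = None  # mimics a process-wide "current task"

    def __init__(self, name):
        self.name = name
        self.status = "Running"

    @classmethod
    def init(cls, name):
        # Create a task and register it as the current one.
        cls._current = cls(name)
        return cls._current

    @classmethod
    def current_task(cls):
        return cls._current

    def close(self):
        self.status = "Completed"
        if FakeTask._current is self:
            FakeTask._current = None


class OwnershipAwareCallback:
    """Hypothetical trainer hook that only closes tasks it created itself."""

    def __init__(self):
        self.task = None
        self.owns_task = False

    def setup(self):
        existing = FakeTask.current_task()
        if existing is not None:
            # The user already opened a task: reuse it, but do not own it.
            self.task = existing
            self.owns_task = False
        else:
            # No task yet: create one and remember that we are responsible for it.
            self.task = FakeTask.init("created-by-callback")
            self.owns_task = True

    def on_train_end(self):
        # Close only what this callback opened.
        if self.owns_task and self.task is not None:
            self.task.close()


def run_training(callback):
    """Stand-in for trainer.train(): set up the hook, then signal the end."""
    callback.setup()
    callback.on_train_end()


user_task = FakeTask.init("task")
run_training(OwnershipAwareCallback())
status_after_first_run = user_task.status
run_training(OwnershipAwareCallback())  # a second trainer run reuses the same task
status_after_second_run = user_task.status
print(status_after_first_run, status_after_second_run)
```

With this pattern the user's task keeps its status across both runs, so custom actions after training can still use it, while a task created by the hook itself is still cleaned up.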

Environment

Independent

Hi @nkgrush! We have submitted a PR to Hugging Face related to this issue: huggingface/transformers#26614