allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

Home Page: https://clear.ml/docs

Cannot use offline mode

michelkok opened this issue

Describe the bug

I cannot train in offline mode: the run fails with ValueError: Unsupported keyword arguments: force. With offline mode disabled, training starts just fine.

Stacktrace
  C:\Users\user\anaconda3\envs\conda_wrapper\python.exe C:\Users\user\Documents\GitHub\projects\model_update\train.py --multirun 
  [I 2024-02-27 15:46:29,606] Using an existing study with name 'debug' instead of creating a new one.
  [2024-02-27 15:46:29,610][HYDRA] Study name: debug
  [2024-02-27 15:46:29,610][HYDRA] Storage: sqlite:///C:/Users/user/Documents/Internship_AgroCares/experiments/debug.db
  [2024-02-27 15:46:29,612][HYDRA] Sampler: TPESampler
  [2024-02-27 15:46:29,612][HYDRA] Directions: ['minimize']
  [2024-02-27 15:46:29,724][HYDRA] Launching 1 jobs locally
  [2024-02-27 15:46:29,724][HYDRA]        #0 : learning_rate=0.0009646166816485542 conv_dilation=4 conv_kernel_size=4 conv_filters_0=80 conv_filters_1=72 fc_neurons_0=512 fc_neurons_1=64 fc_neurons_2=512 fc_l2=3.3171161072480045e-05 batch_size=256 activation=relu pooling=avg pooling_size=2
  ClearML Task: created new task id=offline-07e2611c2a684673926cf42cb3a03b51
  Error executing job with overrides: ['learning_rate=0.0009646166816485542', 'conv_dilation=4', 'conv_kernel_size=4', 'conv_filters_0=80', 'conv_filters_1=72', 'fc_neurons_0=512', 'fc_neurons_1=64', 'fc_neurons_2=512', 'fc_l2=3.3171161072480045e-05', 'batch_size=256', 'activation=relu', 'pooling=avg', 'pooling_size=2']
  Error executing job with overrides: ['learning_rate=0.0009646166816485542', 'conv_dilation=4', 'conv_kernel_size=4', 'conv_filters_0=80', 'conv_filters_1=72', 'fc_neurons_0=512', 'fc_neurons_1=64', 'fc_neurons_2=512', 'fc_l2=3.3171161072480045e-05', 'batch_size=256', 'activation=relu', 'pooling=avg', 'pooling_size=2']
  Traceback (most recent call last):
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\_internal\utils.py", line 213, in run_and_report
      return func()
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\_internal\utils.py", line 461, in <lambda>
      lambda: hydra.multirun(
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\_internal\hydra.py", line 162, in multirun
      ret = sweeper.sweep(arguments=task_overrides)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\optuna_sweeper.py", line 52, in sweep
      return self.sweeper.sweep(arguments)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\_impl.py", line 391, in sweep
      raise e
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\_impl.py", line 360, in sweep
      f"Return value must be float-castable. Got '{ret.return_value}'."
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\core\utils.py", line 260, in return_value
      raise self._return_value
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\_impl.py", line 357, in sweep
      values = [float(ret.return_value)]
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\core\utils.py", line 260, in return_value
      raise self._return_value
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\core\utils.py", line 186, in run_job
      ret.return_value = task_function(task_cfg)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\binding\hydra_bind.py", line 230, in _patched_task_function
      return task_function(a_config, *a_args, **a_kwargs)
    File "C:\Users\user\Documents\GitHub\projects\model_update\train.py", line 66, in main
      task = Task.init(
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\task.py", line 765, in init
      PatchHydra.delete_overrides()
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\binding\hydra_bind.py", line 53, in delete_overrides
      cls._current_task.delete_parameter(cls._overrides_section, force=True)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\backend_interface\task\task.py", line 1365, in delete_parameter
      res = self.send(tasks.DeleteHyperParamsRequest(
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\backend_api\services\v2_9\tasks.py", line 3814, in __init__
      super(DeleteHyperParamsRequest, self).__init__(**kwargs)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\backend_api\session\request.py", line 31, in __init__
      raise ValueError('Unsupported keyword arguments: %s' % ', '.join(kwargs.keys()))
  ValueError: Unsupported keyword arguments: force
  ClearML Task: Offline session stored in C:/Users/user/.clearml/cache/offline/offline-07e2611c2a684673926cf42cb3a03b51.zip
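
Reading the last frames of the trace, the request object built for the v2_9 API appears to reject the force keyword before anything is sent. The following is a minimal sketch of that mechanism, not ClearML's actual code: the Request base class, the two request classes, their field lists and try_build are hypothetical stand-ins, and only the error message format is taken from the traceback.

```python
class Request:
    """Hypothetical stand-in for a request base class that only accepts known fields."""

    _fields = ()

    def __init__(self, **kwargs):
        # Keep the recognised fields and leave everything else in kwargs.
        self.payload = {k: kwargs.pop(k) for k in list(kwargs) if k in self._fields}
        if kwargs:
            # Same message format as the line shown in the traceback.
            raise ValueError('Unsupported keyword arguments: %s' % ', '.join(kwargs.keys()))


class DeleteHyperParamsOlder(Request):
    # Assumed older schema without a `force` field.
    _fields = ("task", "hyperparams")


class DeleteHyperParamsNewer(Request):
    # Assumed newer schema that accepts `force`.
    _fields = ("task", "hyperparams", "force")


def try_build(request_cls, **kwargs):
    """Build a request and report either 'ok' or the error message."""
    try:
        request_cls(**kwargs)
        return "ok"
    except ValueError as exc:
        return str(exc)


args = dict(task="offline-example", hyperparams=[], force=True)
print(try_build(DeleteHyperParamsNewer, **args))  # ok
print(try_build(DeleteHyperParamsOlder, **args))  # Unsupported keyword arguments: force
```

If this reading is right, the same call succeeds online because a newer request schema is in use there, while the offline session falls back to an older one.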

To reproduce

"""Demonstrate how training can be done in a simple fashion."""
from pathlib import Path
from clearml import Task
import hydra
from hydra.core.config_store import ConfigStore
import os

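# TrainConfiguration, ScriptConfiguration and Trainer are not defined in this
# snippet; they come from the reporter's project.
ConfigStore.instance().store(name="base_config", node=TrainConfiguration)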

@hydra.main(version_base=None, config_path="conf", config_name="sweep")
def main(cfg: ScriptConfiguration):
    """Run."""
    Task.set_offline(offline_mode=True)

    task = Task.init(
        project_name="Test",
        task_name="debugtask",
        tags=['debug']
    )
    trainer = Trainer(config=cfg)
    train_loss = trainer.train()

    task.close()

    # Turn offline mode back off; uploading the stored session to the server
    # is a separate step that is not shown here
    Task.set_offline(False)


if __name__ == "__main__":
    main()

Expected behaviour

It should have trained normally, like when offline mode is not on.

Environment

  • Server type: both self-hosted and app.clear.ml
  • ClearML SDK Version: 1.14.3
  • ClearML Server Version (self-hosted only): 1.14.1-451
  • Python Version: 3.10.8
  • OS: Windows (judging by the paths in the stack trace)

Hi @michelkok! Thank you for reporting. We have identified the problem and will release a fix soon.

@eugen-ajechiloae-clearml While waiting for the release, would dropping the force argument from the cls._current_task.delete_parameter call in PatchHydra.delete_overrides (hydra_bind.py) fix the issue?
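
For anyone who would rather not edit the installed package, the same idea can be tried as a wrapper that discards the keyword before the call goes through. This is only a sketch under assumptions: FakeTask, without_kwarg and the "example_section" name are hypothetical stand-ins, not ClearML code. Whether dropping force at the call site is enough also depends on how the real delete_parameter forwards the argument to the request, which the trace does not show.

```python
import functools


def without_kwarg(func, name):
    """Return a wrapper around func that silently discards the keyword `name`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs.pop(name, None)
        return func(*args, **kwargs)

    return wrapper


class FakeTask:
    """Hypothetical stand-in for a task whose backend rejects extra keywords."""

    def __init__(self):
        self.deleted = []

    def delete_parameter(self, name, **kwargs):
        if kwargs:
            raise ValueError('Unsupported keyword arguments: %s' % ', '.join(kwargs.keys()))
        self.deleted.append(name)
        return True


task = FakeTask()
# Patch the bound method on this instance only, so other tasks are unaffected.
task.delete_parameter = without_kwarg(task.delete_parameter, "force")
result = task.delete_parameter("example_section", force=True)
print(result, task.deleted)  # True ['example_section']
```

Patching a single instance keeps the change local and easy to remove once the fixed release is installed.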