allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

Home Page:https://clear.ml/docs

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Hydra overrides integration

MaximeChurin opened this issue · comments

Describe the bug

Since clearml==1.12.2 and especially this commit our hydra config is duplicated due to this line. Since that we are forced to transform any + to ++.
When reading the new commits in master I came across this where it seems you plan to fix this specific issue but due to parenthesis error it is just incorrect (like that you append + in front of any param !)

Why in the first place did you duplicate the config from hydra inside overrides since 1.12.2, is it intentional or a bug?
Then if it fixes some problems when do you plan to make a patch with the fix below?

To reproduce

assert ["+" + (o if o.startswith("+") and not o.startswith("++") else o) for o in ["iin", "+n", "++h"]] == ['iin', '++n', '++h']
assert ["+" + o if (o.startswith("+") and not o.startswith("++")) else o for o in ["iin", "+n", "++h"]] == ['iin', '++n', '++h']

Expected behaviour

clearml/binding/hydra_bind.py#L92

                overrides = ["+" + o if (o.startswith("+") and not o.startswith("++")) else o for o in overrides]

Environment

Server type: self-hosted
ClearML SDK Version: clearml==1.12.2
ClearML Server Version (Only for self hosted): 1.9.2-317
Python Version: 3.8.16
OS (Windows \ Linux \ Macos): Linux

Hi @MaximeChurin, we've just released version v1.13.1 which should address this issue - please see if it works for you

Hi @jkhenning, I just tested it both in single task and pipeline mode and I confirm the fix works, thanks.

Can you help me understand why did you decided to duplicate the config from hydra inside overrides since 1.12.2?

Hi @MaximeChurin, actually, it was not a duplication, but a literal bug...
My apologies it took so long to fix, basically it only effected ++ prefixes and I assume this is why it did not bother many users...

Hi @jkhenning,
Sorry maybe my message was not clear, there is an actual duplication of config coming from this commit.
In the first pass, overrides are filled by overrides = yaml.safe_load then we add the same again because of overrides += ['{}={}'.format(k, v) for k, v in stored_config.items()].
Then all the commits that follow are just patches around this.
I am wondering why you added this += line with the stored_config in the first place?
If in a remote configuration I modify the hydra_bind.py with print:

                try:
                    overrides = yaml.safe_load(full_parameters.get("Args/overrides", "")) or []
                except Exception:
                    overrides = []
                if overrides and not isinstance(overrides, (list, tuple)):
                    overrides = [overrides]
                print(overrides)
                overrides += ['{}={}'.format(k, v) for k, v in stored_config.items()]
                print(overrides)
                overrides = [("+" + o) if (o.startswith("+") and not o.startswith("++")) else o for o in overrides]

I get the following output and the same behavior both in pipeline and single task:

['job=pipeline', 'clearml.project_name=debug', 'clearml.task_name=Pipeline', 'clearml.queue_name=RTX2080', '+dataset=demo_template1', '+num_samples=50', 'epochs=2']
['job=pipeline', 'clearml.project_name=debug', 'clearml.task_name=Pipeline', 'clearml.queue_name=RTX2080', '+dataset=demo_template1', '+num_samples=50', 'epochs=2', 'job=pipeline', 'clearml.project_name=debug', 'clearml.task_name=Pipeline', 'clearml.queue_name=RTX2080', '+dataset=demo_template1', '+num_samples=50', 'epochs=2']

@MaximeChurin I know, but at no point there was an intention to duplicate anything, it was a bug

then instead of this patch line overrides = [("+" + o) if (o.startswith("+") and not o.startswith("++")) else o for o in overrides] can we delete overrides = yaml.safe_load(full_parameters.get("Args/overrides", "")) or [] od do if empty then use stored_config?