HPO cannot access logger.report_single_value metrics
ilouzl opened this issue
Describe the bug
HPO does not include metrics generated by logger.report_single_value
when searching for the configured objective_metric.
To reproduce
- Define a base task which logs its result metric using
task.get_logger().report_single_value(some_value, 'accuracy')
(a minimal sketch of such a base task follows these steps)
- Define an HPO task that tries to optimize that metric:
...
task = Task.init(project_name="examples", task_name="HP optimizer", task_type=Task.TaskTypes.optimizer)
task.execute_remotely(queue_name="services")
an_optimizer = HyperParameterOptimizer(
    base_task_id=...,
    hyper_parameters=...,
    objective_metric_title="Summary",
    objective_metric_series="accuracy",
)
...
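For completeness, a minimal sketch of the base task from the first step above (the project/task names and the accuracy value are illustrative placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="base task")
accuracy = 0.95  # stand-in for a real evaluation result
# Single values appear under the "Summary" title, with the metric name as the series
task.get_logger().report_single_value("accuracy", accuracy)
task.close()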
The HPO task fails with an error of this form:
Traceback (most recent call last):
  File "/home/miniconda3/envs/clearml/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/miniconda3/envs/clearml/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/.clearml/venvs-builds.2.3/3.9/lib/python3.9/site-packages/clearml/automation/optimization.py", line 1997, in report_daemon
    self.report_completed_status(completed_jobs, cur_completed_jobs, task_logger, title)
  File "/home/.clearml/venvs-builds.2.3/3.9/lib/python3.9/site-packages/clearml/automation/optimization.py", line 2052, in report_completed_status
    iteration = [it[0] if it else -1 for it in iteration_value]
TypeError: 'NoneType' object is not iterable
Expected behaviour
The metrics should be discovered and optimized like regular (i.e. TensorBoard-like) metrics.
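For reference, single values are reported under the "Summary" title with the metric name as the series, which is why the optimizer above uses objective_metric_title="Summary". A sketch for inspecting what the base task actually reported, assuming Task.get_reported_single_values() is available in this SDK version (the task ID is a placeholder):

from clearml import Task

# Placeholder ID of the base task that called report_single_value
base_task = Task.get_task(task_id="<base_task_id>")
# Expected to return a dict of reported single values, e.g. {"accuracy": 0.95}
print(base_task.get_reported_single_values())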
Environment
- Server type - app.clear.ml
- ClearML SDK Version - clearml==1.14.4
- Python Version - 3.9.16
- Dockerized worker
Related Discussion
https://clearml.slack.com/archives/CTK20V944/p1709203560313889
Thanks for reporting @ilouzl.
We'll update when a fix is available.
Hey @ilouzl, you seem to have a small issue with the way you report the single value. As per the documentation, you first need to provide the name of the single value, and then its value.
Could you please re-run your code with this change and report whether the issue still persists? I wasn't able to reproduce the problem.
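For reference, a minimal sketch of the documented call order (project/task names are illustrative):

from clearml import Task

task = Task.init(project_name="examples", task_name="report demo")
logger = task.get_logger()
logger.report_single_value("accuracy", 0.95)  # documented order: name, then value
# not: logger.report_single_value(0.95, "accuracy")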
Hi @AlexandruBurlacu, it's just a typo in the issue description.
The actual implementation is correct: first the name, then the value.
Can you please provide a full example? I couldn't reproduce it on my side using this code:
from clearml import Task
from clearml.automation import (
    DiscreteParameterRange,
    HyperParameterOptimizer,
    RandomSearch
)

aSearchStrategy = RandomSearch


def job_complete_callback(
    job_id,                 # type: str
    objective_value,        # type: float
    objective_iteration,    # type: int
    job_parameters,         # type: dict
    top_performance_job_id  # type: str
):
    print('Job completed!', job_id, objective_value, objective_iteration, job_parameters)
    if job_id == top_performance_job_id:
        print('WOOT WOOT we broke the record! Objective reached {}'.format(objective_value))


task = Task.init(project_name='Hyper-Parameter Optimization',
                 task_name='Automatic Hyper-Parameter Optimization',
                 task_type=Task.TaskTypes.optimizer,
                 reuse_last_task_id=False)


def objective_function():
    import time
    task = Task.current_task()
    epochs = task.get_parameter("General/epochs", cast=True)
    for ep in range(epochs):
        # Regular scalar, reported every iteration
        task.get_logger().report_scalar(title="epoch_accuracy", series="epoch_accuracy", iteration=ep, value=ep ** 2)
        # Single value, reported under the "Summary" title
        task.get_logger().report_single_value("Final value", ep ** 2)
        time.sleep(1)
    return ep ** 2


# experiment template to optimize in the hyper-parameter optimization
objective_task = task.create_function_task(objective_function)
print(">>>>>>>", objective_task.id)

args = {'template_task_id': objective_task.id}
args = task.connect(args)

execution_queue = 'queue-7'
an_optimizer = HyperParameterOptimizer(
    base_task_id=args['template_task_id'],
    hyper_parameters=[
        DiscreteParameterRange('General/epochs', values=list(range(10, 30))),
    ],
    objective_metric_title='Summary',
    objective_metric_series='Final value',
    objective_metric_sign='max',
    max_number_of_concurrent_tasks=3,
    optimizer_class=aSearchStrategy,
    execution_queue=execution_queue,
    spawn_project=None,
    time_limit_per_job=10.,
    pool_period_min=0.2,
    total_max_jobs=10,
    max_iteration_per_job=30,
)

an_optimizer.start(job_complete_callback=job_complete_callback)
an_optimizer.set_time_limit(in_minutes=120.0)
an_optimizer.wait()
top_exp = an_optimizer.get_top_experiments(top_k=3)
print([t.id for t in top_exp])
Well @AlexandruBurlacu, your code does work for me.
I probably had a different error, but it seems to be fine now.
Thanks!