Is the step-time graph designed to work only with GPU/TPU?
burgerkingeater opened this issue · comments
I found the step time graph is not working with my CPU-only job. It says:
No step marker observed and hence the step time is unknown. This may happen if (1) training steps are not instrumented (e.g., if you are not using Keras) or (2) the profiling duration is shorter than the step time. For (1), you need to add step instrumentation; for (2), you may try to profile longer.
I am having the same issue, @burgerkingeater did you find a way to fix this?
@Jasperhino it's fixed by tensorflow/estimator#68
Try:
from tensorflow.python.profiler.traceme import TraceMe
with TraceMe("TraceContext", graph_type="train", step_num=step):
    train_step(train_iterator)
step = step + 1
I also can't get TensorBoard to identify steps using the Trace class as described in the documentation (trace.py:73):
with tf.profiler.experimental.Trace("Train", step_num=step):
    train_fn()
But @trisolaran's method works! However, TraceMe is the same as Trace (see traceme.py:21), so you can use the normal API:
with tf.profiler.experimental.Trace("TraceContext", graph_type="train", step_num=step):
    train_fn()
The key addition, interestingly, is using "TraceContext" as the name; otherwise it does not work. This seems to be a bug in TensorFlow (tensorflow/tensorflow#50440).
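For anyone who wants to reuse this, here is a minimal sketch of the loop as a helper. The function name `run_profiled_steps` and the `trace_cls` parameter are my own (hypothetical) additions, there only so the loop can be tried without TensorFlow installed; by default it uses `tf.profiler.experimental.Trace` with the "TraceContext" name from above.

```python
def run_profiled_steps(train_fn, num_steps, trace_cls=None):
    """Call train_fn num_steps times, wrapping each call in a step marker.

    trace_cls is a hypothetical injection point, not part of TensorFlow.
    """
    if trace_cls is None:
        # Imported lazily so the helper itself does not require TensorFlow.
        import tensorflow as tf
        trace_cls = tf.profiler.experimental.Trace

    for step in range(num_steps):
        # "TraceContext" is the name reported in this thread as the one
        # the step-time graph picks up; other names were not detected.
        with trace_cls("TraceContext", graph_type="train", step_num=step):
            train_fn()
```

With TensorFlow installed, I would expect calling `tf.profiler.experimental.start(logdir)` before `run_profiled_steps(train_fn, 10)` and `tf.profiler.experimental.stop()` afterwards to produce a profile with step markers, but I have only seen this confirmed for the snippets above.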
TraceMe is used to attach a generic message to a time span, so the trace string is arbitrary and its usage is not limited to marking step boundaries.
The step logic happens to look for "TraceContext" as the root of logically connected events.
In other words, this is by design, not a bug.
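To illustrate the idea, here is a toy model of that behaviour. This is not TensorFlow's actual implementation; the event dictionaries, field names, and `step_durations` function are all hypothetical. It only shows why an arbitrarily named trace is recorded but not counted as a step.

```python
STEP_ROOT_NAME = "TraceContext"  # the name mentioned in this thread


def step_durations(events):
    """Return {step_num: duration} for events that count as step roots.

    Toy model only: events are plain dicts with hypothetical fields.
    """
    durations = {}
    for event in events:
        if event["name"] != STEP_ROOT_NAME:
            # Any other name is a valid trace, just not a step marker.
            continue
        durations[event["step_num"]] = event["end"] - event["start"]
    return durations


events = [
    {"name": "Train", "step_num": 0, "start": 0.0, "end": 1.5},
    {"name": "TraceContext", "step_num": 1, "start": 1.5, "end": 3.5},
]
print(step_durations(events))
```

In this model, a profile containing only "Train" events yields an empty result, which corresponds to the "No step marker observed" message in the original report.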