tensorflow / profiler

A profiling and performance analysis tool for TensorFlow

Is the step-time graph designed to work only with GPU/TPU?

burgerkingeater opened this issue

I found that the step-time graph does not work with my CPU-only job. It says:
No step marker observed and hence the step time is unknown. This may happen if (1) training steps are not instrumented (e.g., if you are not using Keras) or (2) the profiling duration is shorter than the step time. For (1), you need to add step instrumentation; for (2), you may try to profile longer.

I am having the same issue. @burgerkingeater, did you find a way to fix this?

Try:

from tensorflow.python.profiler.traceme import TraceMe

# "TraceContext" plus step_num marks this span as one training step
with TraceMe("TraceContext", graph_type="train", step_num=step):
    train_step(train_iterator)
step = step + 1

I also can't get TensorBoard to identify steps using the Trace class as described in the documentation (trace.py:73):

with tf.profiler.experimental.Trace("Train", step_num=step):
    train_fn()

But @trisolaran's method works! Note that TraceMe is the same as Trace (see traceme.py:21), so you can use the public API instead:

with tf.profiler.experimental.Trace("TraceContext", graph_type="train", step_num=step):
    train_fn()

Interestingly, the key is to use "TraceContext" as the name; otherwise it does not work. This seems to be a bug in TensorFlow (tensorflow/tensorflow#50440).
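For a CPU-only job like the one in the original question, a minimal end-to-end sketch might look like the following. It assumes TensorFlow 2.x with the public tf.profiler.experimental API (start, stop, Trace). The linear model, synthetic data, 20-step duration, and names such as logdir and profile_files are illustrative assumptions, not taken from this thread. The part that matters for the step-time graph is the Trace("TraceContext", graph_type="train", step_num=step) wrapper around each step.

```python
import os
import tempfile

import tensorflow as tf

# Hide any GPUs so the job runs CPU-only (must happen before any op runs).
tf.config.set_visible_devices([], "GPU")

# Illustrative linear model trained on synthetic data (assumption, not from the thread).
w = tf.Variable(tf.zeros([4, 1]))
b = tf.Variable(tf.zeros([1]))

x = tf.random.normal([256, 4])
y = tf.random.normal([256, 1])
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32).repeat()


@tf.function
def train_step(features, labels):
    """One step of plain gradient descent on mean squared error."""
    with tf.GradientTape() as tape:
        pred = tf.matmul(features, w) + b
        loss = tf.reduce_mean(tf.square(pred - labels))
    dw, db = tape.gradient(loss, [w, b])
    w.assign_sub(0.1 * dw)
    b.assign_sub(0.1 * db)
    return loss


logdir = tempfile.mkdtemp()
tf.profiler.experimental.start(logdir)
for step, (features, labels) in enumerate(dataset.take(20)):
    # Name "TraceContext" plus step_num is the combination reported in this
    # thread to make the profiler recognize step boundaries.
    with tf.profiler.experimental.Trace("TraceContext", graph_type="train", step_num=step):
        loss = train_step(features, labels)
tf.profiler.experimental.stop()

# Confirm that the profiler wrote something under logdir.
profile_files = [name for _, _, names in os.walk(logdir) for name in names]
print(f"{len(profile_files)} profile file(s) written under {logdir}")
```

Profiling for many steps rather than one also addresses the second cause named in the error message (a profiling window shorter than the step time). To inspect the result, point TensorBoard at logdir with the profiler plugin installed and open the Profile tab.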

TraceMe is used to attach a generic message to a time span, so the trace string is arbitrary and its usage is not limited to marking step boundaries. The step logic happens to look for "TraceContext" as the root of logically connected events. In other words, this is by design, not a bug.
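To make the distinction concrete, here is a toy model of the behavior described above, in plain Python rather than the profiler's actual code: any string can label a span, but only spans named "TraceContext" that carry a step_num are counted as steps. The TraceEvent class and step_times function are hypothetical, and the real step logic also groups connected events under that root, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    """Hypothetical simplified span: a name, a time range in ms, and metadata kwargs."""
    name: str
    start_ms: float
    end_ms: float
    metadata: dict = field(default_factory=dict)


def step_times(events, root_name="TraceContext"):
    """Toy step detection: only spans named root_name with a step_num count as
    step markers; every other span is treated as a generic annotation."""
    steps = {}
    for ev in events:
        if ev.name == root_name and "step_num" in ev.metadata:
            steps[ev.metadata["step_num"]] = ev.end_ms - ev.start_ms
    return dict(sorted(steps.items()))


events = [
    TraceEvent("Train", 0.0, 10.0, {"step_num": 0}),  # arbitrary name: ignored
    TraceEvent("TraceContext", 10.0, 22.0, {"graph_type": "train", "step_num": 1}),
    TraceEvent("TraceContext", 22.0, 31.0, {"graph_type": "train", "step_num": 2}),
    TraceEvent("data_loading", 31.0, 35.0),  # no step_num: ignored
]
print(step_times(events))  # {1: 12.0, 2: 9.0}
```

In this model the span named "Train" is dropped even though it carries step_num, which mirrors the symptom reported earlier in the thread.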