tensorflow / profiler

A profiling and performance analysis tool for TensorFlow

No trace viewer tab after profiling an inference request

haitong opened this issue · comments

Hi,

I am trying to profile an inference request by following the guide at https://www.tensorflow.org/tfx/serving/tensorboard

My problem is that after profiling an inference request, I only see 4 tabs: "overview_page", "input_pipeline_analyzer", "kernel_stats" and "tensorflow_stats". There is no "trace viewer" tab.

I am running a container in a k8s pod. The container runs both tfserving and tensorboard. The versions I am using for tfserving:
1: tensorflow/serving:2.2.0-gpu: https://hub.docker.com/layers/tensorflow/serving/2.2.0-gpu/images/sha256-29960df16a51f624b9f356eae801adfa303222656894ba92442ea28428272b47?context=explore

The versions I am using for tensorboard:
1: tensorflow 2.2.0
2: tensorboard 2.2.2
3: tensorboard_plugin_profile 2.2.0

Here is how I build the docker image, in case it is helpful:

FROM tensorflow/serving:2.2.0-gpu
RUN apt-get update && apt-get install -y python3 python3-pip
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install tensorflow==2.2.0
RUN python3 -m pip install tensorboard==2.2.2
RUN python3 -m pip install tensorboard_plugin_profile==2.2.0

After I started a profiling session, the server-side log looked normal and said it generated localhost.trace.json.gz:

Starting to trace for 3000 ms. Remaining attempt(s): 3
2020-06-13 01:22:26.569542: I external/org_tensorflow/tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.
2020-06-13 01:22:29.789779: I external/org_tensorflow/tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1479] CUPTI activity buffer flushed
2020-06-13 01:22:29.791411: I external/org_tensorflow/tensorflow/core/profiler/internal/gpu/device_tracer.cc:216] GpuTracer has collected 39220 callback api events and 39220 activity events.
2020-06-13 01:22:31.127150: I external/org_tensorflow/tensorflow/core/profiler/rpc/client/save_profile.cc:168] Creating directory: logdir/plugins/profile/2020_06_13_01_22_26
2020-06-13 01:22:31.873395: I external/org_tensorflow/tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to logdir/plugins/profile/2020_06_13_01_22_26/localhost.trace.json.gz
2020-06-13 01:22:32.288685: I external/org_tensorflow/tensorflow/core/profiler/utils/event_span.cc:288] Generation of step-events took 9.015 ms

However, when I looked at the folder where the profiling data is generated, I only saw these files:

$ ls
localhost.input_pipeline.pb localhost.kernel_stats.pb localhost.overview_page.pb localhost.tensorflow_stats.pb localhost.xplane.pb

There is no localhost.trace.json.gz, and thus I cannot see a trace viewer tab in my tensorboard. Any ideas on how I can get the localhost.trace.json.gz file?

commented

From this log: "external/org_tensorflow/tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to logdir/plugins/profile/2020_06_13_01_22_26/localhost.trace.json.gz", I think localhost.trace.json.gz was dumped on your serving worker. Can you check the serving worker to see if the file is at that path, and copy it to the TensorBoard logdir?

The reason for the difference is that gzip files can only be saved locally, so the trace is not sent back via gRPC like the data for the other tools.

Wow, thanks @qiuminxu! You saved my day. You are right: the file is indeed generated in the tfserving log folder. I had to manually copy it to the tensorboard log folder, and now I can use the trace viewer tab in tensorboard.
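For anyone who wants to script the manual copy, here is a minimal sketch. It assumes both log folders are reachable from the same filesystem (as in a single container) and that each uses the `plugins/profile/<run>` layout shown in the log above. The function name `copy_trace_files` and the two directory arguments are hypothetical, not part of any TensorFlow API.

```python
import shutil
from pathlib import Path


def copy_trace_files(serving_logdir, tensorboard_logdir):
    """Copy *.trace.json.gz files into the matching TensorBoard run folders.

    Both arguments are hypothetical paths: the folder where tfserving wrote
    its profile data and the folder that tensorboard reads as its logdir.
    """
    src_root = Path(serving_logdir) / "plugins" / "profile"
    dst_root = Path(tensorboard_logdir) / "plugins" / "profile"
    copied = []
    # Each profiling run lives in its own timestamped subfolder.
    for trace in sorted(src_root.glob("*/*.trace.json.gz")):
        run_dir = dst_root / trace.parent.name
        run_dir.mkdir(parents=True, exist_ok=True)
        target = run_dir / trace.name
        # Skip files that are already in place so reruns are harmless.
        if not target.exists():
            shutil.copy2(trace, target)
            copied.append(target)
    return copied
```

Calling `copy_trace_files("/path/to/serving/logdir", "/path/to/tensorboard/logdir")` (placeholder paths) returns the list of files it copied. Only the trace files are touched; the `.pb` files already sent back over gRPC are left alone.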

I suggest we add this gotcha to https://www.tensorflow.org/tfx/serving/tensorboard so other people are aware of it too.