No trace viewer tab after profiling an inference request
haitong opened this issue · comments
Hi,
I am trying to profile an inference request following a guide here: https://www.tensorflow.org/tfx/serving/tensorboard
My problem is that after profiling an inference request, I only see 4 tabs: "overview_page", "input_pipeline_analyzer", "kernel_stats" and "tensorflow_stats". There is no "trace viewer" tab.
I am running a container in a k8s pod. The container runs both tfserving and tensorboard. The versions I am using for tfserving:
1: tensorflow/serving:2.2.0-gpu: https://hub.docker.com/layers/tensorflow/serving/2.2.0-gpu/images/sha256-29960df16a51f624b9f356eae801adfa303222656894ba92442ea28428272b47?context=explore
The versions I am using for tensorboard:
1: tensorflow 2.2.0
2: tensorboard 2.2.2
3: tensorboard_plugin_profile 2.2.0
See how I build the docker image if it is helpful:
FROM tensorflow/serving:2.2.0-gpu
RUN apt-get update && apt-get install -y python3 python3-pip
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install tensorflow==2.2.0
RUN python3 -m pip install tensorboard==2.2.2
RUN python3 -m pip install tensorboard_plugin_profile==2.2.0
After I started a profiling session, the server-side log looked normal, and it says it generated localhost.trace.json.gz:
Starting to trace for 3000 ms. Remaining attempt(s): 3
2020-06-13 01:22:26.569542: I external/org_tensorflow/tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.
2020-06-13 01:22:29.789779: I external/org_tensorflow/tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1479] CUPTI activity buffer flushed
2020-06-13 01:22:29.791411: I external/org_tensorflow/tensorflow/core/profiler/internal/gpu/device_tracer.cc:216] GpuTracer has collected 39220 callback api events and 39220 activity events.
2020-06-13 01:22:31.127150: I external/org_tensorflow/tensorflow/core/profiler/rpc/client/save_profile.cc:168] Creating directory: logdir/plugins/profile/2020_06_13_01_22_26
2020-06-13 01:22:31.873395: I external/org_tensorflow/tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to logdir/plugins/profile/2020_06_13_01_22_26/localhost.trace.json.gz
2020-06-13 01:22:32.288685: I external/org_tensorflow/tensorflow/core/profiler/utils/event_span.cc:288] Generation of step-events took 9.015 ms
However, when I looked at the folder where the profiling data is generated, I only see these files:
$ ls
localhost.input_pipeline.pb localhost.kernel_stats.pb localhost.overview_page.pb localhost.tensorflow_stats.pb localhost.xplane.pb
There is no localhost.trace.json.gz, and thus I cannot see a trace viewer tab in my tensorboard. Any ideas how I can get the localhost.trace.json.gz file?
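The manual `ls` check above can be scripted when there are many profiling runs. This is only a sketch: it assumes the `logdir/plugins/profile/<run>` layout shown in the log, and the helper name `runs_missing_trace` is made up for illustration.

```python
from pathlib import Path


def runs_missing_trace(logdir):
    """Return the names of profile runs that contain no *.trace.json.gz file.

    Assumes the layout <logdir>/plugins/profile/<run>/<host>.<tool> seen in
    the log above. The function name is hypothetical.
    """
    profile_root = Path(logdir) / "plugins" / "profile"
    if not profile_root.is_dir():
        return []
    missing = []
    for run_dir in sorted(p for p in profile_root.iterdir() if p.is_dir()):
        # The trace viewer data is the only tool file with this suffix.
        if not any(run_dir.glob("*.trace.json.gz")):
            missing.append(run_dir.name)
    return missing
```

Calling `runs_missing_trace("logdir")` against the TensorBoard log directory would list the runs for which the trace file is absent.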
From this log: "external/org_tensorflow/tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to logdir/plugins/profile/2020_06_13_01_22_26/localhost.trace.json.gz", I think the localhost.trace.json.gz is dumped in your serving worker. Can you check the serving worker and see if the file is in the path, and copy that to the TensorBoard logdir?
The reason for the difference is that gzip files can only be saved locally, so the trace is not sent back via gRPC the way the other tools' data is.
Wow, thanks @qiuminxu! You saved my day. You are right: the file is indeed generated in the tfserving log folder. I need to manually copy it to the tensorboard log folder, and now I can use the trace viewer tab in tensorboard.
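For anyone hitting the same thing, the manual copy can be scripted. This is a minimal sketch, assuming both log folders are reachable from the same filesystem (as in my single-container setup) and that both use the `plugins/profile/<run>` layout from the log. The function name and the two paths in the usage line are placeholders, not anything from tfserving or tensorboard.

```python
import shutil
from pathlib import Path


def copy_trace_files(serving_logdir, tensorboard_logdir):
    """Copy every *.trace.json.gz from the serving logdir to the TensorBoard logdir.

    Keeps the same run directory name on both sides so the trace lands next
    to the other tool files of that run. Hypothetical helper.
    """
    src_root = Path(serving_logdir) / "plugins" / "profile"
    dst_root = Path(tensorboard_logdir) / "plugins" / "profile"
    copied = []
    for src in sorted(src_root.glob("*/*.trace.json.gz")):
        dst = dst_root / src.parent.name / src.name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(dst)
    return copied


# Example call with placeholder paths:
# copy_trace_files("/path/to/tfserving/logdir", "/path/to/tensorboard/logdir")
```

If the two folders live in different containers or pods, the same idea applies, but the files would have to be transferred by some other means first.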
I suggest we add this gotcha to https://www.tensorflow.org/tfx/serving/tensorboard so other people are aware of it too.