tensorflow / profiler

A profiling and performance analysis tool for TensorFlow

TensorBoard won't load Profiler generated event file on Colab

NicholasMcElroy opened this issue · comments

Hello,

I'm attempting to use Profiler on Google Colab to get the memory usage of the tensors in my graph, but after I run it and try to open the generated files with TensorBoard, it gives me an error saying it can't find the event file. I know I'm pointing it at the right folder, because when I generate the graph layout using tf.summary.FileWriter, that event file opens without any issues. I'm not sure what's going wrong; I just can't seem to get it to work. I'm using the tf.profiler.experimental start/stop method, if that matters, and I've included the output I get when running it in case it shows something missing that I can't see.

2021-06-01 21:27:12.853015: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2021-06-01 21:27:12.853069: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
2021-06-01 21:27:12.853126: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1611] Profiler found 1 GPUs
2021-06-01 21:27:12.903125: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcupti.so.11.0
2021-06-01 21:27:13.508665: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-01 21:27:16.043887: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-06-01 21:27:16.608931: I tensorflow/core/profiler/lib/profiler_session.cc:66] Profiler session collecting data.
2021-06-01 21:27:16.610770: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1743] CUPTI activity buffer flushed
2021-06-01 21:27:16.765792: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:673] GpuTracer has collected 2194 callback api events and 2307 activity events.
2021-06-01 21:27:16.802284: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session tear down.
2021-06-01 21:27:16.862308: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: logdir/plugins/profile/2021_06_01_21_27_16
2021-06-01 21:27:16.909059: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for trace.json.gz to logdir/plugins/profile/2021_06_01_21_27_16/24d5ba506286.trace.json.gz
2021-06-01 21:27:16.982160: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: logdir/plugins/profile/2021_06_01_21_27_16
2021-06-01 21:27:16.990529: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for memory_profile.json.gz to logdir/plugins/profile/2021_06_01_21_27_16/24d5ba506286.memory_profile.json.gz
2021-06-01 21:27:16.993887: I tensorflow/core/profiler/rpc/client/capture_profile.cc:251] Creating directory: logdir/plugins/profile/2021_06_01_21_27_16
Dumped tool data for xplane.pb to logdir/plugins/profile/2021_06_01_21_27_16/24d5ba506286.xplane.pb
Dumped tool data for overview_page.pb to logdir/plugins/profile/2021_06_01_21_27_16/24d5ba506286.overview_page.pb
Dumped tool data for input_pipeline.pb to logdir/plugins/profile/2021_06_01_21_27_16/24d5ba506286.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to logdir/plugins/profile/2021_06_01_21_27_16/24d5ba506286.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to logdir/plugins/profile/2021_06_01_21_27_16/24d5ba506286.kernel_stats.pb

Hey @yisitu, thanks for checking out my issue. I just wanted to see whether you need any further information from me, or whether there is anything I can do to help debug this.

commented

Since this is on Colab, are you able to share a reproducible example with me?

Here are other things to check for:

  1. This is the file that contains the profile data: logdir/plugins/profile/2021_06_01_21_27_16/24d5ba506286.xplane.pb. Check that it exists.
  2. The next step is to check that TensorBoard is able to see and load the xplane.pb file. That's the file that contains your profiling data. Was --logdir provided with the right path? For example:
/tmp/logdir/plugins/profile/2021_06_01_21_27_16/24d5ba506286.xplane.pb
^^^^^^^^^^^

# In colab:
%tensorboard --logdir=/tmp/logdir
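The two checks above can be scripted. The following is a minimal sketch that uses only the Python standard library; the helper name `find_profile_runs` is hypothetical, and the layout it assumes (`<logdir>/plugins/profile/<run>/<host>.xplane.pb`) is taken from the log output earlier in this thread. It only confirms that the files exist under the directory passed to `--logdir`; it does not confirm that TensorBoard can parse them.

```python
from pathlib import Path


def find_profile_runs(logdir):
    """Map each profile run directory name to the xplane.pb files inside it.

    Assumes the layout <logdir>/plugins/profile/<run>/<host>.xplane.pb,
    as shown in the profiler log output in this thread.
    """
    profile_root = Path(logdir) / "plugins" / "profile"
    runs = {}
    if not profile_root.is_dir():
        # Nothing under this logdir looks like profiler output.
        return runs
    for run_dir in sorted(p for p in profile_root.iterdir() if p.is_dir()):
        runs[run_dir.name] = sorted(f.name for f in run_dir.glob("*.xplane.pb"))
    return runs


# Pass the same path that is given to --logdir (here the relative "logdir").
print(find_profile_runs("logdir"))
```

An empty dictionary means the path does not contain a `plugins/profile` directory, which usually points to a wrong `--logdir`. A run that maps to an empty list means the run directory exists but holds no `xplane.pb` file.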

https://colab.research.google.com/drive/1XrC6CfjEyqTaP7vpeqBu9cGZZ8UXook_

Here's the link to the Colab notebook with the configuration I'm using. When I check the folder that holds the log data, I'm able to find the xplane.pb file. I'm also fairly certain that TensorBoard has the right path and can read from it, because when I use a setup with tf.summary.FileWriter, I'm able to load the event file that's generated in the same directory.

[Image: screenshot of the logdir directory contents]
This is what the logdir directory looks like after running. I'm not sure whether the profile-empty file is what's causing the issue, or how to resolve it if it is.

commented

I can't open the Colab for some reason. Could it have something to do with permissions?

Just wanted to check and see if there was any insight on the issue. Would it be easier to try and resolve this issue over a video call?

commented

Does this workaround from another GitHub issue improve things?

tensorflow/tensorboard#5088 (comment)

It doesn't seem to change anything; TensorBoard still gives me the error "No dashboards are active for the current data set."

Hello, I just wanted to reach out again and see if there were any updates. I still haven't been able to load data from the profiler in TensorBoard, and I'm still getting the "No dashboards are active" error.

So I ended up fixing the issue by complete accident just now. Bizarrely enough, when you run %tensorboard --logdir='etcetc' on its own, it won't load. However, if you run %tensorboard --inspect --logdir='etcetc' first and then run %tensorboard --logdir='etcetc', it will load the profile data. It seems that TensorBoard won't recognize the data as being there unless you inspect first and then run it.
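For anyone who wants to run the inspection step outside a notebook cell, here is a rough sketch that calls the `tensorboard` command-line tool with its `--inspect` flag from Python. The helper names `inspect_command` and `run_inspect` are hypothetical. Whether running the inspection this way has the same side effect as the `%tensorboard --inspect` magic described above is not established in this thread, so treat it as a diagnostic aid rather than a confirmed fix.

```python
import shutil
import subprocess


def inspect_command(logdir):
    """Build the argument list for inspecting a log directory."""
    return ["tensorboard", "--inspect", "--logdir", logdir]


def run_inspect(logdir):
    """Run the inspection and return its text output.

    Returns None if the tensorboard executable is not on PATH.
    """
    if shutil.which("tensorboard") is None:
        return None
    result = subprocess.run(
        inspect_command(logdir),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is reported via the output, not raised
    )
    return result.stdout + result.stderr
```

Calling `run_inspect("logdir")` returns whatever TensorBoard reports about the directory, which can then be printed before launching `%tensorboard --logdir=logdir` as usual.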

commented

That's odd. What about hitting refresh (top right circular arrow button IIRC) instead? Sorry about not being responsive; there are other pressing issues at the same time that are not visible on GitHub.

It's no problem at all, I understand. I had tried using the refresh button before and it didn't seem to have any effect. I'm not entirely sure why inspecting before running works, but I'll take it.

I have run into the same problem. Are there any solutions now?

I also ran into the same problem.