tensorflow / profiler

A profiling and performance analysis tool for TensorFlow

How to get multi-worker profile data

delucca opened this issue · comments

Hi

I've been trying to use the TensorFlow Profiler to fetch the step_time of my workers while using ParameterServerStrategy. I've followed both the Colab and the TensorFlow documentation, but I haven't been able to find a way to do so.

I've tried the following:

  1. I tried to use the TensorBoard callback, adding the proper profiling config to it and passing it to my Keras model.fit call. It works, but it gathers no data to profile (even after increasing the batch profiling window).
  2. I tried to start a separate profiler server for each worker, then I tried to call trace on those servers from my coordinator node.
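For reference, the two attempts above might look roughly like the following. This is a minimal sketch, not the poster's actual code: the helper names, the port 6009, and the batch window are assumptions chosen for illustration. Note that the TensorBoard callback appears to profile the process that calls fit, which under ParameterServerStrategy is the coordinator rather than the workers.

```python
def profile_batch_window(first_batch, num_batches):
    """Return the inclusive (first, last) batch range to profile (hypothetical helper)."""
    if first_batch < 1 or num_batches < 1:
        raise ValueError("first_batch and num_batches must be positive")
    return (first_batch, first_batch + num_batches - 1)


def make_profiling_callback(log_dir, first_batch=10, num_batches=10):
    """Attempt 1: a TensorBoard callback that profiles a window of batches."""
    import tensorflow as tf  # imported lazily so the pure helper above works without TF

    return tf.keras.callbacks.TensorBoard(
        log_dir=log_dir,
        profile_batch=profile_batch_window(first_batch, num_batches),
    )


def start_worker_profiler(port=6009):
    """Attempt 2, worker side: start a profiler gRPC server in this worker process.

    The port is an assumption; any free port reachable from the coordinator works.
    """
    import tensorflow as tf

    tf.profiler.experimental.server.start(port)
```

The callback would then be passed as model.fit(..., callbacks=[make_profiling_callback("/tmp/logs")]), and start_worker_profiler() would run once in each worker before training begins.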

My second attempt is the one where I think I got close to a solution. But if I call the trace method after the model.fit call, the profiler only starts once training has finished, so it captures nothing. If I call it before the model.fit call, it freezes my script while waiting for the workers' profilers to return, and it also captures nothing.
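Since trace blocks until the capture finishes, one possible workaround (an assumption, not something confirmed in this thread) is to issue the blocking call from a background thread after a short delay, so that it overlaps with training on the main thread. The helper name and the delay are hypothetical.

```python
import threading
import time


def run_in_background_after(delay_s, fn, *args, **kwargs):
    """Call fn(*args, **kwargs) on a daemon thread after delay_s seconds.

    Returns the started thread so the caller can join() it later.
    """
    def _target():
        time.sleep(delay_s)  # give training time to reach steady state
        fn(*args, **kwargs)

    thread = threading.Thread(target=_target, daemon=True)
    thread.start()
    return thread


# Hypothetical usage on the coordinator (addresses and paths are placeholders):
#   import tensorflow as tf
#   t = run_in_background_after(
#       30, tf.profiler.experimental.client.trace,
#       "grpc://worker0:6009", "/tmp/logs", 2000)
#   model.fit(...)   # training runs while the trace is captured
#   t.join()
```

The delay has to be tuned so that the capture window falls inside the training run; if training ends before the delay elapses, the trace will again be empty.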

How can I do that?

Multiple workers are only supported in sampling mode, i.e. you request the capture from the capture dialog in the TensorBoard Profile plugin and specify more than one host, separated by commas.
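The same kind of on-demand capture can also be requested from code with tf.profiler.experimental.client.trace, which accepts a comma-delimited string of gRPC addresses. The sketch below assumes each worker already runs a profiler server on the same port; the host names, port, and helper names are placeholders, not values from this thread.

```python
def build_service_addr(hosts, port=6009):
    """Join worker host names into the comma-delimited gRPC address string."""
    return ",".join(f"grpc://{host}:{port}" for host in hosts)


def sample_workers(hosts, logdir, duration_ms=2000, port=6009, delay_ms=1000):
    """Capture a profile from several workers at once.

    Must be called while training is running, otherwise there is nothing to sample.
    """
    import tensorflow as tf  # imported lazily so build_service_addr works without TF

    # delay_ms asks all hosts to start profiling at the same future timestamp.
    options = tf.profiler.experimental.ProfilerOptions(delay_ms=delay_ms)
    tf.profiler.experimental.client.trace(
        build_service_addr(hosts, port),
        logdir,
        duration_ms,
        options=options,
    )
```

For example, build_service_addr(["worker0", "worker1"]) yields "grpc://worker0:6009,grpc://worker1:6009", which is also the form the capture dialog expects in its host field.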