tensorflow / profiler

A profiling and performance analysis tool for TensorFlow

Increasing granularity of trace

mossjacob opened this issue · comments

[Images: trace, trace1]

Here is part of the trace of my model, which contains some custom MCMC samplers along with a No-U-Turn Sampler from TFP. I'm trying to diagnose why running on the GPU takes so much longer than on the CPU, and I'm wondering if there's a way to get more precise information about what is being processed. In the image, the longest blocks don't give any information about what specifically is going on that makes them take so long.

Here is the full log: 20200527-152907.zip

Furthermore, when I run the same code on the CPU only, the trace no longer shows the mcmc_sample_chain blocks:
[Image: trace2]

Why is there such a big difference between the GPU and CPU trace? Can I get more specific information for the GPU trace?
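
For anyone reading with the same problem, one way to get more named structure into a trace is to annotate the code yourself. The sketch below is only an illustration under assumptions: `sampler_step`, the scope names "propose" and "combine", and the event name "mcmc_step" are all hypothetical and are not taken from the model in this issue. It uses `tf.name_scope` to label ops inside a `tf.function` and `tf.profiler.experimental.Trace` to mark each host-side step, which should make the long blocks easier to attribute in the trace viewer.

```python
import tempfile

import tensorflow as tf


@tf.function
def sampler_step(state):
    # Hypothetical stand-in for one MCMC step. Ops created inside a
    # name scope carry that scope in their op name.
    with tf.name_scope("propose"):
        proposal = state + tf.random.normal(tf.shape(state))
    with tf.name_scope("combine"):
        return 0.5 * (state + proposal)


logdir = tempfile.mkdtemp()  # assumption: any writable directory works
state = tf.zeros([4])

tf.profiler.experimental.start(logdir)
try:
    for step in range(3):
        # Each iteration is recorded as a named host-side event;
        # step_num and _r=1 mark it as a step for the profiler.
        with tf.profiler.experimental.Trace("mcmc_step", step_num=step, _r=1):
            state = sampler_step(state)
finally:
    tf.profiler.experimental.stop()
```

Pointing TensorBoard at `logdir` should then show the "mcmc_step" events. How much detail appears for each GPU kernel still depends on the TensorFlow version and the profiler setup.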

commented

Thanks @ckluk, but is there a way to see more information for the blocks after around 25 seconds? Beyond that point the trace doesn't give any information about what's going on.

commented

@ckluk, how does one "fuse" multiple kernels together, especially when using something like Keras?
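
For readers with the same question, one common route to kernel fusion in TensorFlow is XLA compilation, which can merge a chain of elementwise ops into fewer kernels. The sketch below is a minimal illustration and not necessarily what was meant in this thread. The function `scaled_softplus` is hypothetical, and the `jit_compile` argument assumes TensorFlow 2.5 or later (earlier 2.x releases called it `experimental_compile`).

```python
import tensorflow as tf


def scaled_softplus(x):
    # Several small elementwise ops followed by a reduction. Without
    # XLA each op typically launches its own kernel.
    return tf.reduce_sum(tf.math.softplus(x) * 2.0 + 1.0)


# Same Python function, traced once without and once with XLA.
plain = tf.function(scaled_softplus)
fused = tf.function(scaled_softplus, jit_compile=True)

x = tf.linspace(-1.0, 1.0, 8)
plain_result = float(plain(x))
fused_result = float(fused(x))
```

Both versions should return the same value up to floating-point tolerance. The difference shows up in the trace, where the XLA version is expected to show fewer, larger kernels. I believe recent Keras versions also accept `jit_compile=True` in `Model.compile`, but that is worth checking against the documentation for the installed version.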