Increasing granularity of trace
mossjacob opened this issue · comments
Here I have part of the trace of my model, which contains some custom MCMC samplers along with a No-U-Turn Sampler from TFP. I'm trying to diagnose why running on the GPU takes so much longer than on the CPU. Is there a way of getting more precise information about what is being processed? In the image, the longest blocks don't give any information about what specifically is going on that makes them take so long.
Here is the full log: 20200527-152907.zip
Furthermore, when I run the same code on the CPU only, the trace no longer shows the mcmc_sample_chain blocks:
Why is there such a big difference between the GPU and CPU trace? Can I get more specific information for the GPU trace?
Thanks @ckluk, but is there a way to see more information for the blocks after around 25 seconds? Beyond that point the trace doesn't give any information about what's going on.
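
For anyone landing here with the same question, one possible way to get finer-grained blocks is to label each sampler step yourself. Below is a minimal sketch, assuming TF 2.2 or later, where `tf.profiler.experimental.Trace` is available as a context manager. The helper `labelled`, the `run_chain` loop, and the step names are hypothetical and not taken from the model in this issue. The helper also records wall-clock time per label, so the CPU and GPU runs can be compared even where the trace view is empty. As far as I know, `Trace` only labels Python code that runs eagerly; inside a `tf.function` the Python body executes only while the function is being traced, so a label placed there would not show up per call.

```python
import contextlib
import time

try:
    import tensorflow as tf
    _Trace = tf.profiler.experimental.Trace
except (ImportError, AttributeError):
    # Assumption: without TF (or on an older TF) fall back to timing only.
    _Trace = None

# Maps a label to the list of wall-clock durations recorded for it.
timings = {}


@contextlib.contextmanager
def labelled(name, **kwargs):
    """Label a block in the profiler trace and record its wall-clock time.

    Hypothetical helper. Extra kwargs (e.g. step_num) are passed to Trace
    as metadata for the event.
    """
    if _Trace is not None:
        ctx = _Trace(name, **kwargs)
    else:
        ctx = contextlib.nullcontext()
    start = time.perf_counter()
    with ctx:
        try:
            yield
        finally:
            # Note: GPU ops are dispatched asynchronously, so this measures
            # host time unless the step forces a result (e.g. via .numpy()).
            timings.setdefault(name, []).append(time.perf_counter() - start)


def run_chain(num_iterations, steps):
    """Run each (name, step_fn) pair once per iteration, one label per step."""
    for i in range(num_iterations):
        for name, step_fn in steps:
            with labelled(name, step_num=i):
                step_fn()
```

To use it, wrap each custom sampler and the NUTS step as separate `(name, step_fn)` pairs, start the profiler with `tf.profiler.experimental.start(logdir)` before calling `run_chain`, and call `tf.profiler.experimental.stop()` afterwards. Each label should then appear as its own block on the host timeline, and `timings` gives a rough per-step breakdown to compare between devices.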