100% device utilization
qo4on opened this issue · comments
You showed a screenshot with near 100% GPU utilization in your article:
Can you share the code you used to get this? I've read all your guides and tutorials but did not find an end-to-end example. The best result I managed to get was around 30% TPU load, using cache() and prefetch().
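For context, a minimal tf.data input pipeline using cache() and prefetch() as described above might look like the sketch below. This is an illustrative assumption, not the code behind the screenshot; make_pipeline and its arguments are hypothetical names, and the ordering (cache before shuffle/batch, prefetch last) follows the general tf.data performance recommendations.

```python
import tensorflow as tf

def make_pipeline(features, labels, batch_size=64):
    # Hypothetical pipeline sketch; stage ordering follows the usual
    # tf.data advice: cache decoded examples, then shuffle/batch,
    # then prefetch so input prep overlaps accelerator compute.
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    ds = ds.cache()                     # keep examples in memory after the first epoch
    ds = ds.shuffle(buffer_size=1000)   # shuffle within a bounded buffer
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.AUTOTUNE)  # let the runtime pick the prefetch depth
    return ds
```

Whether this reaches high device utilization still depends on the model and hardware, which is presumably why tuning has to be done per model.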
Hi,
Unfortunately, I don't have the code that produced this trace. And even if I did, it wouldn't be very helpful, because tuning needs to be done for your own model. Have you looked at:
https://www.tensorflow.org/guide/gpu_performance_analysis
Hi, thanks for your answer.
Unfortunately, this guide is written in an abstract way without any concrete end-to-end example.
tuning needs to be done for your own model
There are many methods for performance tuning, especially for distributed training on TPUs and GPUs, and it's not clear which of them I should use for my model. A few notebooks of well-optimized models would be much more helpful than all these "talk about"-style articles. I'm not asking about this particular trace; I'm looking for the code behind any trace with near-100% TPU utilization.