100% device utilization
qo4on opened this issue · comments
You showed a screenshot with near 100% GPU utilization in your article:
Can you share the code you used to get this? I've read all your guides and tutorials but did not find an end-to-end example. The best result I managed to get was around 30% TPU load, using cache() and prefetch().
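For context, a minimal tf.data input pipeline using cache() and prefetch() as described above might look like the sketch below. This is an illustrative assumption, not the code behind the screenshot; make_pipeline and its arguments are hypothetical names, and the ordering (cache before shuffle/batch, prefetch last) follows the general tf.data performance recommendations.

```python
import tensorflow as tf

def make_pipeline(features, labels, batch_size=64):
    # Hypothetical pipeline sketch; stage ordering follows the usual
    # tf.data advice: cache decoded examples, then shuffle/batch,
    # then prefetch so input prep overlaps accelerator compute.
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    ds = ds.cache()                     # keep examples in memory after the first epoch
    ds = ds.shuffle(buffer_size=1000)   # shuffle within a bounded buffer
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.AUTOTUNE)  # let the runtime pick the prefetch depth
    return ds
```

Whether this reaches high device utilization still depends on the model and hardware, which is presumably why tuning has to be done per model.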
Hi,
Unfortunately, I don't have the code that produced this trace. And even if I did, it wouldn't be very helpful, because tuning needs to be done for your own model. Have you looked at:
https://www.tensorflow.org/guide/gpu_performance_analysis
Hi, thanks for your answer.
Unfortunately, this guide is written in an abstract way without any concrete end-to-end example.
tuning needs to be done for your own model
There are many methods for performance tuning, especially for distributed training on TPUs and GPUs, and it's not clear which of them I should use for my model. A few notebooks of well-optimized models would be much more helpful than all these "talk about"-style articles. I'm not asking about this particular trace; I'm looking for the code behind any trace with near-100% TPU utilization.