pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)

Home Page: https://pytorch.org/xla

Checking TPU Type

mfatih7 opened this issue · comments

Hello

While a training script is starting up on a TPU in Colab, I observe the following warnings on the console:

WARNING:root:Waiting for TPU to be start up with version pytorch-1.13...
WARNING:root:Waiting for TPU to be start up with version pytorch-1.13...
WARNING:root:TPU has started up successfully with version pytorch-1.13

But the type of the TPU (v2, v3, or v4) is not shown on the console.

I tried to use

torch_xla.core.xla_model.xla_device_hw(device)

but it only reports that the device is a TPU.
It does not show the type of the device.
How can I find out the device type in PyTorch?

In TensorFlow I found this

I don't think Colab has v4, so if you use

torch_xla.core.xla_model.get_memory_info(device)

8 GB HBM means it is a v2 and 16 GB HBM means it is a v3.
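The HBM-size heuristic above can be wrapped in a small helper. This is only a sketch: `infer_tpu_version` is a hypothetical name, the thresholds are rough cut-offs around the 8 GB (v2) and 16 GB (v3) figures mentioned above, and the input is the `kb_total` value returned by `xm.get_memory_info(device)`.

```python
# Hypothetical helper: guess the Colab TPU generation from the total HBM
# reported in xm.get_memory_info(device)['kb_total'].
def infer_tpu_version(kb_total):
    gb_total = kb_total / (1024 * 1024)  # KB -> GB
    if gb_total <= 10:    # ~8 GB HBM per device -> TPU v2
        return 'v2'
    if gb_total <= 20:    # ~16 GB HBM per device -> TPU v3
        return 'v3'
    return 'unknown'

# The value Colab reports later in this thread, 8370176 KB, is ~8 GB:
print(infer_tpu_version(8370176))  # -> v2
```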

Thanks @JackCaoG

OK, here is the code to check the hardware:

import torch_xla.core.xla_model as xm

mem_info = xm.get_memory_info(device)
print('Device is ' + xm.xla_device_hw(device)
      + ' with ' + str(mem_info['kb_free']) + ' KB free memory '
      + str(mem_info['kb_total']) + ' KB total memory')

I see that Colab gives me a TPU v2.
When the model is not yet loaded onto the TPU, the output is

Device is TPU with 8370176 KB free memory 8370176 KB total memory

After training of the 1st epoch is finished, the output is

Device is TPU with 7743568 KB free memory 8370176 KB total memory

After validation of the 1st epoch is finished, the output is

Device is TPU with 7725904 KB free memory 8370176 KB total memory
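The free/total numbers above can be turned into an actual usage figure. A minimal sketch, assuming only dicts shaped like the ones `xm.get_memory_info(device)` returns (with `kb_free` and `kb_total` keys); `kb_used` is a hypothetical helper name:

```python
# Hypothetical helper: HBM currently in use, from a snapshot shaped like
# the dict returned by xm.get_memory_info(device).
def kb_used(info):
    return info['kb_total'] - info['kb_free']

# Snapshots matching the console outputs above:
before = {'kb_free': 8370176, 'kb_total': 8370176}       # model not loaded
after_train = {'kb_free': 7743568, 'kb_total': 8370176}  # after epoch 1 training

print(kb_used(before))       # -> 0
print(kb_used(after_train))  # -> 626608 (KB, ~0.6 GB still held)
```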

To which core does this information belong?
I mean, there are 8 cores on a TPU v2, and in my code I am using a single core in this experiment.

I think you should also consider adding a function that returns the TPU version.

What about TPU v4?
Should I use Google Cloud instead of Google Colab?
What do you suggest?
I need larger and faster TPUs for my studies.
That is why I am trying to run my scripts on TPU.

If you want to use TPU v4, please use it through Google Cloud. Colab is on the old 2VM architecture, where you get an underpowered host CPU VM; with the new TPU VM architecture you get a really powerful host VM. Check out https://cloud.google.com/tpu/docs/pytorch-xla-ug-tpu-vm.

For v4, check out https://cloud.google.com/tpu/docs/v4-users-guide