yitu-opensource / ConvBert

Train on GPU instead of TPU - different distribution strategies

PhilipMay opened this issue

Hi,
many thanks for this nice new model and your research.
We would like to train ConvBERT on GPUs rather than TPUs.
Do you have any experience or tips on how to do this?
We have concerns regarding the different distribution strategies
between GPUs and TPUs.
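
To make the concern concrete, here is a rough TF2 sketch of what we mean (illustrative only, not your actual training code; the toy Keras model just stands in for ConvBERT):

```python
import tensorflow as tf

# On TPU, training typically runs under a TPUStrategy, e.g.:
#   resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="...")
#   tf.config.experimental_connect_to_cluster(resolver)
#   tf.tpu.experimental.initialize_tpu_system(resolver)
#   strategy = tf.distribute.TPUStrategy(resolver)

# On one machine with one or more GPUs, MirroredStrategy instead does
# synchronous data-parallel training across all visible GPUs:
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored across replicas;
    # a toy model stands in for the actual ConvBERT network here.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(2),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```

Our question is essentially whether swapping the strategy like this is all that is needed, or whether the training code has TPU-specific pieces beyond it.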

Thanks
Philip

Well, in the README you write:

The code is tested on a V100 GPU.

This means that pretraining works on multiple GPUs, right?

Hi, thanks for your interest.
Our code has only been tested on a single V100 GPU. If you are looking for multi-GPU support rather than TPU training, you may refer to https://huggingface.co/transformers/model_doc/convbert.html, which implements our model in PyTorch.