How can I use multi-GPU training in finetune.py?
02hao09 opened this issue · comments
Hello, I copied the code from train.py into finetune.py, but the following error occurs during training. How should I modify it so that training works correctly?
Thanks!
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: TypeError: generator yielded an element that could not be converted to the expected type. The expected type was float32, but the yielded element was [array([[  101,  2349, 25480, ...,  9172, 16054,   102],
       [  101,  2335,  5088, ...,  4934, 31621,   102],
       [  101,  2349, 25480, ..., 18312,  5661,   102],
       [  101,  2349, 25480, ..., 33732, 11511,   102]]), array([[  101, 22191, 27209, 41412, 31201,  8506, 42696, 31201,  5661,
It turned out I had forgotten to replace the generator in finetune.py with the one from train.py, so the shapes were wrong when converting to a tf.data.Dataset:
class data_generator(DataGenerator):
    """Data generator"""
    def __iter__(self, random=False):
        for is_end, texts in self.sample(random):
            source, target = pseudo_summary(texts)
            source_ids, _ = tokenizer.encode(source, maxlen=s_maxlen)
            target_ids, _ = tokenizer.encode(target, maxlen=t_maxlen)
            yield source_ids, target_ids
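The InvalidArgumentError at the top of this thread comes from the dataset's declared output types not matching what the generator actually yields: float32 was expected, but integer token-id arrays were produced. A minimal sketch of declaring the correct dtypes when wrapping a generator with tf.data (the toy generator, shapes, and vocabulary range here are illustrative, not from the repo):

```python
import numpy as np
import tensorflow as tf

def toy_generator():
    # Yield (source_ids, target_ids) pairs of integer token ids,
    # mirroring what data_generator produces after tokenizer.encode.
    for _ in range(4):
        source_ids = np.random.randint(0, 100, size=(8,), dtype=np.int32)
        target_ids = np.random.randint(0, 100, size=(6,), dtype=np.int32)
        yield source_ids, target_ids

# Declare int32 outputs; declaring float32 here instead reproduces the
# "generator yielded an element that could not be converted" error.
dataset = tf.data.Dataset.from_generator(
    toy_generator,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
    ),
)

for src, tgt in dataset.take(1):
    print(src.dtype, tgt.dtype)  # both tf.int32
```

The key point is that the output_signature (or output_types in older TF) must match the dtype the generator actually emits, element by element.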
@02hao09 Hi, I'm fine-tuning on a single machine with multiple GPUs and using MirroredStrategy for multi-GPU support, but GPU utilization shows that only one GPU is being used. How did you end up getting multiple GPUs to work?
Hello. Have you solved this problem?
@KobeChe Hello,
First, use the following code to check whether multiple GPUs are available:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Then wrap the model in strategy.scope(), as in the code below. With this it ran successfully for me:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    t5 = build_transformer_model(
        config_path=config_path,
        checkpoint_path=None,
        model='mt5.1.1',
        return_keras_model=False,
        name='T5',
    )
    model = t5.model
    output = CrossEntropy(1)([model.inputs[1], model.outputs[0]])
    model = Model(model.inputs, output)
    model.compile(optimizer=Adam(1e-5))
    model.summary()
    t5.load_weights_from_checkpoint(checkpoint_path)
encoder = t5.encoder
decoder = t5.decoder
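For reference, the same pattern on a toy Keras model (not the T5 model from the repo): all variable creation, i.e. building and compiling the model, must happen inside strategy.scope() so that MirroredStrategy can mirror the variables across devices. When only a CPU is visible, it falls back to a single replica, so this sketch also runs without a GPU:

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Layers and optimizer slots created here are mirrored on every device.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

# Training itself can happen outside the scope; Keras distributes
# each batch across the replicas automatically.
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
history = model.fit(x, y, epochs=1, batch_size=8, verbose=0)
```

If only one GPU shows utilization despite multiple being visible, a common cause is that part of the model (or the optimizer) was created before entering the scope.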
Thanks 🙏 😊.