ZhuiyiTechnology / t5-pegasus

Chinese generative pre-trained model


How do I use multi-GPU training in finetune.py?

02hao09 opened this issue · comments

Hi, I followed the code in train.py and used it in finetune.py, but the following error occurs during training. What do I need to change so that training runs correctly?
Thanks

InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: TypeError: generator yielded an element that could not be converted to the expected type. The expected type was float32, but the yielded element was [array([[ 101, 2349, 25480, ..., 9172, 16054, 102],
[ 101, 2335, 5088, ..., 4934, 31621, 102],
[ 101, 2349, 25480, ..., 18312, 5661, 102],
[ 101, 2349, 25480, ..., 33732, 11511, 102]]), array([[ 101, 22191, 27209, 41412, 31201, 8506, 42696, 31201, 5661,

I had forgotten to replace the generator in finetune.py with the one from train.py, so the shapes were wrong when converting to a tf.data.Dataset.

class data_generator(DataGenerator):
    """Data generator (yields one sample at a time, not a batch)"""
    def __iter__(self, random=False):
        for is_end, texts in self.sample(random):
            source, target = pseudo_summary(texts)
            source_ids, _ = tokenizer.encode(source, maxlen=s_maxlen)
            target_ids, _ = tokenizer.encode(target, maxlen=t_maxlen)
            yield source_ids, target_ids
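For reference, train.py turns this generator into a tf.data.Dataset so that the distribution strategy can split batches across the GPUs. A minimal sketch of that conversion, assuming your bert4keras version provides DataGenerator.to_dataset and that the model's inputs are named 'Encoder-Input-Token' and 'Decoder-Input-Token' (check train.py for the exact call):

# Sketch: build a tf.data.Dataset from the sample-level generator above.
# The generator must yield single samples, not pre-batched arrays;
# yielding whole batches is what triggered the float32 type error above.
dataset = train_generator.to_dataset(
    types=('float32', 'float32'),
    shapes=([None], [None]),  # variable-length token-id sequences
    names=('Encoder-Input-Token', 'Decoder-Input-Token'),  # assumed layer names
    padded_batch=True  # pad each batch to its longest sequence
)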

@02hao09 Hi, I used MirroredStrategy for multi-GPU finetuning on a single machine, but GPU utilization shows that only one card is actually being used. How did you end up getting multiple GPUs working?

Hi, did you ever solve this problem?

@KobeChe Hi,
First, use the following code to check whether multiple GPUs are available:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Then wrap the model construction in a strategy.scope(), as in the code below. Doing it this way, it ran successfully for me:

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Build the model under the strategy scope with checkpoint_path=None;
    # the checkpoint weights are loaded after compile() below.
    t5 = build_transformer_model(
        config_path=config_path,
        checkpoint_path=None,
        model='mt5.1.1',
        return_keras_model=False,
        name='T5',
    )
    model = t5.model

    # Attach the cross-entropy loss, using the decoder input as the target.
    output = CrossEntropy(1)([model.inputs[1], model.outputs[0]])

    model = Model(model.inputs, output)
    model.compile(optimizer=Adam(1e-5))
    model.summary()
    t5.load_weights_from_checkpoint(checkpoint_path)

encoder = t5.encoder
decoder = t5.decoder
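With the dataset built from the fixed generator, training then proceeds as usual; a minimal sketch (dataset, train_generator, and epochs are placeholders from your own finetune script):

# Sketch: MirroredStrategy splits each global batch across the visible
# GPUs, so each GPU sees batch_size / num_replicas samples per step.
model.fit(
    dataset,
    steps_per_epoch=len(train_generator),
    epochs=epochs
)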


Thanks 🙏 😊