How can I use multi-GPU training in finetune.py?
02hao09 opened this issue · comments
Hello, I copied the code from train.py into finetune.py, but the following error occurs during training. How should I modify it so that training works correctly?
Thanks!
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: TypeError: generator yielded an element that could not be converted to the expected type. The expected type was float32, but the yielded element was [array([[  101,  2349, 25480, ...,  9172, 16054,   102],
       [  101,  2335,  5088, ...,  4934, 31621,   102],
       [  101,  2349, 25480, ..., 18312,  5661,   102],
       [  101,  2349, 25480, ..., 33732, 11511,   102]]), array([[  101, 22191, 27209, 41412, 31201,  8506, 42696, 31201,  5661,
It turned out I had forgotten to replace the generator in finetune.py with the one from train.py, so the shapes were wrong when converting to a tf.data.Dataset:
class data_generator(DataGenerator):
    """Data generator"""
    def __iter__(self, random=False):
        for is_end, texts in self.sample(random):
            source, target = pseudo_summary(texts)
            source_ids, _ = tokenizer.encode(source, maxlen=s_maxlen)
            target_ids, _ = tokenizer.encode(target, maxlen=t_maxlen)
            yield source_ids, target_ids
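The InvalidArgumentError at the top of this thread comes from the dataset's declared output types not matching what the generator actually yields: float32 was expected, but integer token-id arrays were produced. A minimal sketch of declaring the correct dtypes when wrapping a generator with tf.data (the toy generator, shapes, and vocabulary range here are illustrative, not from the repo):

```python
import numpy as np
import tensorflow as tf

def toy_generator():
    # Yield (source_ids, target_ids) pairs of integer token ids,
    # mirroring what data_generator produces after tokenizer.encode.
    for _ in range(4):
        source_ids = np.random.randint(0, 100, size=(8,), dtype=np.int32)
        target_ids = np.random.randint(0, 100, size=(6,), dtype=np.int32)
        yield source_ids, target_ids

# Declare int32 outputs; declaring float32 here instead reproduces the
# "generator yielded an element that could not be converted" error.
dataset = tf.data.Dataset.from_generator(
    toy_generator,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
    ),
)

for src, tgt in dataset.take(1):
    print(src.dtype, tgt.dtype)  # both tf.int32
```

The key point is that the output_signature (or output_types in older TF) must match the dtype the generator actually emits, element by element.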
@02hao09 Hi, I'm fine-tuning on a single machine with multiple GPUs and using MirroredStrategy for multi-GPU support, but GPU utilization shows that only one GPU is being used. How did you end up getting multiple GPUs to work?
Hello. Have you solved this problem?
@KobeChe Hello,
First, use the following code to check whether multiple GPUs are available:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Then wrap the model in strategy.scope(), as in the code below. With this it ran successfully for me:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    t5 = build_transformer_model(
        config_path=config_path,
        checkpoint_path=None,
        model='mt5.1.1',
        return_keras_model=False,
        name='T5',
    )
    model = t5.model
    output = CrossEntropy(1)([model.inputs[1], model.outputs[0]])
    model = Model(model.inputs, output)
    model.compile(optimizer=Adam(1e-5))
    model.summary()
    t5.load_weights_from_checkpoint(checkpoint_path)
encoder = t5.encoder
decoder = t5.decoder
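For reference, the same pattern on a toy Keras model (not the T5 model from the repo): all variable creation, i.e. building and compiling the model, must happen inside strategy.scope() so that MirroredStrategy can mirror the variables across devices. When only a CPU is visible, it falls back to a single replica, so this sketch also runs without a GPU:

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Layers and optimizer slots created here are mirrored on every device.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

# Training itself can happen outside the scope; Keras distributes
# each batch across the replicas automatically.
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
history = model.fit(x, y, epochs=1, batch_size=8, verbose=0)
```

If only one GPU shows utilization despite multiple being visible, a common cause is that part of the model (or the optimizer) was created before entering the scope.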
Thanks 🙏 😊.