Error building extension 'fused_adam' when running the trainer
J-G-Y opened this issue · comments
As the title says.
The preceding error is: Unsupported gpu architecture 'compute_80'
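For context, compute_80 is the A100-class architecture, which nvcc only supports from CUDA 11.0 onward, so this error usually means the installed CUDA toolkit is older than the GPU requires. A minimal check (not from the thread; just a quick way to compare versions) is:

```python
import torch

# CUDA version PyTorch was compiled against; the nvcc that JIT-builds
# DeepSpeed's fused_adam should match this version.
print("PyTorch built with CUDA:", torch.version.cuda)

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    # (8, 0) corresponds to compute_80, i.e. A100-class GPUs
    print(f"GPU compute capability: compute_{major}{minor}")
```

Compare the printed CUDA version against `nvcc --version`; if nvcc is older than 11.0 on an A100, the fused_adam build will fail exactly like this.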
with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16), \
        torch.backends.cuda.sdp_kernel(enable_flash=False):
    outputs = model(**batch, use_cache=False)
    loss = outputs.loss
    tr_loss += loss.item()
    model.backward(loss)  # backward through the DeepSpeed engine
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    model.step()          # optimizer step via the DeepSpeed engine
Where should this code be added? I'm hitting the same error as the OP, at:
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model, args=args, config=ds_config, dist_init_required=True)
That's where the error is raised.
It should be a CUDA version problem: the installed CUDA toolkit version needs to match the one PyTorch was built with.
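If matching the CUDA toolkit isn't feasible, a common workaround (a sketch, assuming your ds_config is a Python dict; the values here are placeholders) is to tell DeepSpeed to use plain torch.optim.Adam via torch_adam, so the fused_adam extension never needs to be JIT-compiled:

```python
# Workaround sketch: torch_adam=True makes DeepSpeed fall back to
# torch.optim.Adam instead of the fused CUDA kernel, skipping the
# fused_adam build entirely (at some optimizer-speed cost).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 1e-5,       # placeholder value
            "torch_adam": True,
        },
    },
}
```

This config is then passed to deepspeed.initialize as usual; the proper fix remains installing a CUDA toolkit new enough for compute_80 (CUDA >= 11.0) that matches torch.version.cuda.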