Xirider / finetune-gpt2xl

Guide: Finetune GPT2-XL (1.5 Billion Parameters) and finetune GPT-NEO (2.7 B) on a single GPU with Huggingface Transformers using DeepSpeed

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

remotejob opened this issue · comments

I tried to use your script (gpt2-xl), but I get an error:
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

pip list
Package Version


certifi 2021.5.30
charset-normalizer 2.0.4
click 8.0.1
configparser 5.0.2
datasets 1.8.0
deepspeed 0.4.0
dill 0.3.4
docker-pycreds 0.4.0
filelock 3.0.12
fsspec 2021.7.0
gitdb 4.0.7
GitPython 3.1.18
huggingface-hub 0.0.8
idna 3.2
importlib-metadata 4.7.0
joblib 1.0.1
multiprocess 0.70.12.2
ninja 1.10.2
numpy 1.21.2
packaging 21.0
pandas 1.3.2
pathtools 0.1.2
Pillow 8.3.1
pip 21.2.4
promise 2.3
protobuf 3.17.3
psutil 5.8.0
pyarrow 3.0.0
pyparsing 2.4.7
python-dateutil 2.8.2
pytz 2021.1
PyYAML 5.4.1
regex 2021.8.21
requests 2.26.0
sacremoses 0.0.45
sentry-sdk 1.3.1
setuptools 57.4.0
shortuuid 1.0.1
six 1.16.0
smmap 4.0.0
subprocess32 3.5.4
tensorboardX 1.8
tokenizers 0.10.3
torch 1.9.0
torchvision 0.10.0
tqdm 4.49.0
transformers 4.7.0
triton 1.0.0
typing-extensions 3.10.0.0
urllib3 1.26.6
wandb 0.12.0
wheel 0.37.0
xxhash 2.0.2
zipp 3.5.0

Without this block in ds_config.json:

"optimizer": {
    "type": "AdamW",
    "params": {
        "lr": "auto",
        "betas": "auto",
        "eps": "auto",
        "weight_decay": "auto"
    }
}

everything works, and it takes 17 min.
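To try the workaround above without hand-editing the file, the optimizer block can be dropped programmatically. This is only a sketch: the helper name without_optimizer and the extra "train_batch_size" key are made up for illustration, and the rest of your ds_config.json will look different.

```python
import copy
import json


def without_optimizer(config: dict) -> dict:
    """Return a copy of a DeepSpeed config dict with the "optimizer" block removed."""
    trimmed = copy.deepcopy(config)
    trimmed.pop("optimizer", None)  # no error if the block is already absent
    return trimmed


# Stand-in for the contents of ds_config.json (only the optimizer block is from the thread).
example = {
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
    },
    "train_batch_size": "auto",  # hypothetical extra key, shown only to prove other settings survive
}

print(json.dumps(without_optimizer(example), indent=2))
```

For a real file, load it with json.load, pass the dict through the helper, and write the result to a new file so the original config stays untouched.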

Same problem

I ran into this too. Before the AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam',
the output also showed that it could not create a directory in /tmp/torch_extensions/build for cpu_adam.
So I changed DEFAULT_TORCH_EXTENSION_PATH in the file /anaconda3/envs/XXXXX/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py
from "/tmp/torch_extensions/" to a path where I have permission to create folders.
Then it works.
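A less invasive variant of the same idea, as a sketch: PyTorch's extension loader reads the TORCH_EXTENSIONS_DIR environment variable, so pointing it at a writable folder may avoid editing builder.py. Whether the installed DeepSpeed version honors that variable is an assumption to verify. The helper first_writable_dir and the home-directory fallback path are made up for illustration.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first candidate in which a subfolder can be created, else None."""
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
            # Creating and removing a scratch subfolder mimics what the build step needs.
            scratch = tempfile.mkdtemp(dir=path)
            os.rmdir(scratch)
            return path
        except OSError:
            continue
    return None


candidates = [
    "/tmp/torch_extensions",
    os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions"),  # hypothetical fallback
]

chosen = first_writable_dir(candidates)
if chosen is not None:
    # Only affects this process and its children.
    os.environ["TORCH_EXTENSIONS_DIR"] = chosen
print("extension build dir:", chosen)
```

The variable has to be set in the process that actually launches training, so exporting it in the shell before starting the run is the more direct route.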

For me, I noticed it was exiting on a ['which', 'c++'] eval before AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

In my case, installing / updating g++ resolved the issue.
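A quick way to check for the same condition before launching a run is to look the compiler up on PATH, which is roughly what the 'which c++' call does. A minimal sketch; the helper name locate_tools is made up, and the list of tools is just the two mentioned in this comment.

```python
import shutil


def locate_tools(names=("c++", "g++")):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


for name, path in locate_tools().items():
    print(f"{name}: {path or 'NOT FOUND'}")
```

If c++ shows up as NOT FOUND, the extension build has no compiler to call, which matches the failure described above.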

In my case installing "cudatoolkit-dev" solved the issue

torch ships separate builds for CPU and CUDA devices.
I removed the CPU build and installed the CUDA build following the guidelines here:
https://pytorch.org/get-started/locally/

This is what I installed with pip:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Additionally, there was this intermediate error:
"cannot find libcurand.so.o something"
which was solved by installing:
apt-get install -y libopenblas-base

But make sure you are on Ubuntu 20.04 or higher before installing libopenblas-base.

And that's how my problem was solved!
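To see which torch build is currently installed before reinstalling anything, something like the following should work. It is a sketch that relies on torch.version.cuda being None for CPU-only wheels; the function name describe_torch_build is made up.

```python
def describe_torch_build():
    """Summarize whether the installed torch is a CPU-only or a CUDA build."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    cuda_version = torch.version.cuda  # None for CPU-only wheels
    if cuda_version is None:
        return f"torch {torch.__version__}: CPU-only build"

    status = "available" if torch.cuda.is_available() else "not available at runtime"
    return f"torch {torch.__version__}: built for CUDA {cuda_version}, GPU {status}"


print(describe_torch_build())
```

A "CPU-only build" result would point to the situation described in this comment.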

I think the problem is with the specific deepspeed version (0.4.0) pinned in the requirements. In my case, upgrading deepspeed solved it. You can upgrade with this command: pip install -U deepspeed
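To check whether an environment is still on the pinned version, a small sketch like this can help. It only tests for an exact match with 0.4.0, because that is the one version named in this comment; which other versions are affected is not known from the thread. The helper names are made up, and importlib.metadata needs Python 3.8 or newer.

```python
def release_tuple(version: str):
    """Turn '0.4.0' into (0, 4, 0); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # suffix such as '0+cu113' or '0rc1' ends the release part
            break
    return tuple(parts)


def matches_pinned(version: str, pinned: str = "0.4.0") -> bool:
    """True if the version has the same release numbers as the pinned one."""
    return release_tuple(version) == release_tuple(pinned)


def installed_version(package: str):
    """Return the installed version string, or None if it cannot be determined."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("deepspeed")
if found is None:
    print("deepspeed not found (or Python older than 3.8)")
elif matches_pinned(found):
    print(f"deepspeed {found} is the pinned version; consider: pip install -U deepspeed")
else:
    print(f"deepspeed {found} differs from the pinned 0.4.0")
```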

For me, I noticed it was exiting on a ['which', 'c++'] eval before AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

In my case, installing / updating g++ successfully resolved the issue for me

Thanks.
This worked for me.