ValueError when predicting with pretrained models

Question

iocuydi opened this issue 3 years ago

Describe the bug
When using GPT3XL to perform inference with the --predict flag as shown in examples, the following error is thrown

ValueError: Argument not a list with same length as devices arg=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255] devices=['device:GPU:0']

This is with a single GTX 1070 GPU.

Both of the following commands produced this error:
python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt
python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt --gpu_ids=['device:GPU:0']

Stella Biderman · Answer 1 · Mon Mar 22 2021 12:22:58 GMT+0800 (China Standard Time)

This code was designed for TPUs and although it should work on GPUs that's not something we officially support. We recommend using the GPT-NeoX repo instead for GPU training.

That said it seems like it's having trouble identifying your GPU. Try the command nvidia-smi and check what the device ID number of your GPU is.
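For anyone who prefers to run that check from Python, here is a minimal sketch. It assumes nvidia-smi is on the PATH and uses its --query-gpu and --format options. The helper name list_gpus is made up for illustration and is not part of this repo.

```python
import shutil
import subprocess


def list_gpus():
    """Return one "index, name" string per GPU, or None if nvidia-smi is unavailable."""
    # nvidia-smi ships with the NVIDIA driver; bail out quietly if it is missing.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # Each non-empty output line describes one GPU.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("nvidia-smi not available")
    else:
        for gpu in gpus:
            print(gpu)
```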

Also, does your machine have 255 GPUs? Otherwise I have no idea where it's getting that number from...

iocuydi · Answer 2 · Mon Mar 22 2021 12:28:26 GMT+0800 (China Standard Time)

No, my machine has only 1 GPU lol. I haven't used mesh tensorflow before but I found this issue:
google-research/text-to-text-transfer-transformer#334
in which it seemed to be an issue with the mesh shape? I notice that the mesh shape is Shape[x=128, y=2] when running the above commands, so perhaps it has to do with this?
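A rough way to see the mismatch, as a plain-Python sketch rather than anything from Mesh TensorFlow itself: the mesh shape x=128, y=2 implies 128 * 2 = 256 mesh slots, which matches the 256 entries (0 through 255) in the error's arg list, while only one device is available. The helper mesh_size is hypothetical and only parses the "x:128,y:2" string format used in config.json.

```python
from math import prod


def mesh_size(mesh_shape: str) -> int:
    """Multiply the dimension sizes in a string such as "x:128,y:2"."""
    if not mesh_shape:
        return 1  # an empty shape is treated as a single slot in this sketch
    sizes = [int(part.split(":")[1]) for part in mesh_shape.split(",")]
    return prod(sizes)


devices = ["device:GPU:0"]          # the single GPU from the error message
slots = mesh_size("x:128,y:2")      # shape reported when running the commands

print(f"mesh slots: {slots}, devices: {len(devices)}")
if slots != len(devices):
    print("mismatch: the mesh expects more devices than are available")
```

With a 1x1 mesh ("x:1,y:1") the same helper gives 1, which matches a single GPU.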

The device appears to be registered as device 0, other tensorflow models pick it up as 0, and when first loading tensorflow I see the typical "Adding visible gpu devices: 0 ... Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6709 MB memory) -> physical GPU (device: 0 ..." messages

iocuydi · Answer 3 · Mon Mar 22 2021 12:50:47 GMT+0800 (China Standard Time)

I got around this error by setting params['mesh_shape'] = [], not sure if this broke something else because now I'm getting the error:
'tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Rsqrt' used by {{node gpt2/h0/norm_1/rsqrt/parallel_0/Rsqrt}} with these attrs: [T=DT_BFLOAT16]'

although it appeared to build the model properly before displaying this

iocuydi · Answer 4 · Mon Mar 22 2021 14:00:50 GMT+0800 (China Standard Time)

The issue was that XLA devices were not enabled. Setting the mesh shape to 1x1 and adding

os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

should make this work for GPUs. (I still couldn't run it because my GPU is too small, but the above errors no longer persisted)
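Put together, the two changes might look like the sketch below. Two assumptions here are mine and not confirmed in this thread: that the environment variable should be set before TensorFlow is imported (for example near the top of main.py), and that a 1x1 mesh is written as "x:1,y:1" in the same string format as the shipped "x:128,y:2". The helper single_gpu_config is hypothetical.

```python
import json
import os

# Assumption: set this before `import tensorflow` runs anywhere in the process.
os.environ["TF_XLA_FLAGS"] = "--tf_xla_enable_xla_devices"


def single_gpu_config(config: dict) -> dict:
    """Return a copy of a model config with the mesh shape reduced to 1x1."""
    patched = dict(config)
    # Assumed spelling of a 1x1 mesh, following the "x:128,y:2" format.
    patched["mesh_shape"] = "x:1,y:1"
    return patched


if __name__ == "__main__":
    # Stand-in for the contents of a config.json file such as gpt3xl/config.json.
    example = {"mesh_shape": "x:128,y:2", "n_head": 16}
    print(json.dumps(single_gpu_config(example), indent=2))
```

The function copies the dict instead of mutating it, so the original config stays available if you want to compare the two.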

Stella Biderman · Answer 5 · Mon Mar 22 2021 20:49:56 GMT+0800 (China Standard Time)

Great to know! Thanks for chasing this down for us.

I’m going to leave this open as a reminder to make sure this is in the next update.

soneo1127 · Answer 6 · Wed Mar 24 2021 07:55:58 GMT+0800 (China Standard Time)

Thanks, that worked for me on 1GPU.

How do I set up a mesh for a multi-GPU system?
(I want to predict on 2 GPUs).

GenTxt · Answer 7 · Wed Mar 24 2021 08:01:59 GMT+0800 (China Standard Time)

Hello. I have the same error and would appreciate knowing which files to edit to add the above solutions:

Setting mesh shape to 1x1

model_fns.py

mesh_shape = mtf.convert_to_shape(params["mesh_shape"])

mesh_shape = mtf.convert_to_shape(params["1x1"]) = error

config.json (GPT3_XL_Pile)

"mesh_shape" : "x:128,y:2", (here? "1x1")

Another .py file?

os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

add to 'main.py' and/or 'model_fns.py' (???)

import os
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

As a sidebar, I managed to convert both models using your 'convert_gpt.py' script. Remember to change the Hugging Face config.json files to match the same "n_head" values, or the converted model will generate gibberish.

Able to sample from transformer-based repos

"n_head" : 16, (GPT3_XL)

"n_head" : 20, (GPT3_2.7B)
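A small sketch of that config edit, assuming the converted model's config.json is an ordinary JSON file with an "n_head" key. The helper name set_n_head and the file path in the usage comment are placeholders, not something from the conversion script.

```python
import json
from pathlib import Path

# Values quoted in this comment.
N_HEAD = {"GPT3_XL": 16, "GPT3_2.7B": 20}


def set_n_head(config_path, n_head: int) -> dict:
    """Rewrite "n_head" in a JSON config file and return the updated config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["n_head"] = n_head
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example usage (placeholder path):
# set_n_head("converted_xl/config.json", N_HEAD["GPT3_XL"])
```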

Cheers

Stella Biderman · Answer 8 · Thu Mar 25 2021 00:25:19 GMT+0800 (China Standard Time)

@GenTxt I haven't actually run the model on GPU, so I'll leave that question for @iocuydi or @soneo1127 to answer. I am intrigued by your sidebar though.

Did you test your converted model on long inputs? We are under the impression that the conversion script doesn't work as-is because our model uses both local and global attention. Specifically, we think it works fine for short contexts (less than 256 char, if I recall correctly) but not for full contexts. HF is working on what we think is the problem on their end, but it would be a big win if that work turned out to be unnecessary.

Stella Biderman · Answer 9 · Thu Mar 25 2021 00:29:49 GMT+0800 (China Standard Time)

@soneo1127 I'm going to recommend you check out the Mesh TF documentation for further info.

GenTxt · Answer 10 · Thu Mar 25 2021 21:20:16 GMT+0800 (China Standard Time)

Have tested both models with long inputs and max good output is around 400-500+ before the text turns to gibberish. For some reason it starts jamming fragments of words and letters together similar to low epoch LSTM character-based training (appears similar).

The good output from both 2.7B and XL is on par and often better than 1558M gpt2

Doesn't go beyond the default 1024 even after editing transformer files. Will wait for proper HF conversion of models which will, hopefully, solve all those issues.

In the meantime would appreciate requested info from others in this thread.

Cheers,

Stella Biderman · Answer 11 · Thu Mar 25 2021 21:42:07 GMT+0800 (China Standard Time)

I just double checked and it’s actually ~512 where performance should jump off a cliff. For prompts of length 400-512, I would expect that the initial tokens are good but as the model goes on it devolves into gibberish. Is that what you see?

It’s good to see that the model is often better than 1.5B GPT-2: that’s what our preliminary testing has shown too. The next update to the README will include the following table:

Model                  Pile BPB  Pile PPL  Lambada Acc.  Lambada PPL  Wikitext PPL
GPT-Neo XL (1.3B)      0.7527    6.159     64.73%        5.04         13.10
GPT-3 XL (1.3B)        ------    -----     63.6%         5.44         -----
GPT-2 (1.5B)           1.0468    -----     63.24%        8.63         17.48
GPT-Neo Alan (2.7B)    0.7165    5.646     68.83%        4.137        11.39
GPT-3 Ada (2.7B)       0.9631    -----     67.1%         4.60         -----
GPT-3 DaVinci (175B)   0.7177    -----     76.2%         3.00         -----

Stella Biderman · Answer 12 · Fri Mar 26 2021 12:30:09 GMT+0800 (China Standard Time)

@GenTxt FYI, I have created issue #174 to serve as the canonical reference for the conversion script problem. Please direct any future queries about the conversion script there.

jaehyunshinML · Answer 13 · Tue Apr 06 2021 13:15:19 GMT+0800 (China Standard Time)

Hi

I think the easiest way to use multiple GPUs is to change mesh_shape in the config file: set x to 1 and set y to the number of GPUs you have.
For example, if you have 4 GPUs:

"mesh_shape" : "x:1,y:4",
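
The same rule for any GPU count can be sketched as below. The helper multi_gpu_settings is hypothetical, and the device names simply follow the 'device:GPU:0' form that appears in the error message above. Whether the script needs them passed explicitly through --gpu_ids is not something this thread confirms.

```python
def multi_gpu_settings(num_gpus: int):
    """Build a mesh_shape string and a device-name list for num_gpus GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # x stays at 1; y is set to the number of GPUs, as suggested above.
    mesh_shape = f"x:1,y:{num_gpus}"
    # Device names in the form seen in the original error message.
    gpu_ids = [f"device:GPU:{i}" for i in range(num_gpus)]
    return mesh_shape, gpu_ids


if __name__ == "__main__":
    shape, ids = multi_gpu_settings(2)
    print(shape)
    print(ids)
```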