ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for πŸ€— Hugging Face transformer models πŸš€

Home Page: https://els-rd.github.io/transformer-deploy/


GPT-J 6B model

timofeev1995 opened this issue Β· comments

Hello! Thank you for your framework!
I have a question about converting and serving very large (6B+) models with your framework.
I tried converting with the tips for large models (the --fast option, etc.), but I get a CUDA OOM even on an NVIDIA A100 40GB card.
Is this expected behaviour? Are there any tips for converting models of that size?
Thank you in advance.
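(Not from the thread, but a rough back-of-the-envelope estimate of why a 40 GB card can plausibly OOM here: the parameter count below is approximate, and the overhead factors vary by export path.)

```python
# Back-of-the-envelope memory estimate for exporting GPT-J 6B.
# The parameter count is approximate; exact overhead depends on the
# export path and runtime.
N_PARAMS = 6_050_000_000   # roughly 6B parameters
BYTES_FP32 = 4
BYTES_FP16 = 2

weights_fp32_gb = N_PARAMS * BYTES_FP32 / 1024**3
weights_fp16_gb = N_PARAMS * BYTES_FP16 / 1024**3

# torch.onnx.export traces the model, so on top of the weights you pay
# for activations and, depending on the path, additional copies of the
# graph -- which is why even a 40 GB A100 can OOM on an fp32 export.
print(f"fp32 weights alone: {weights_fp32_gb:.1f} GB")
print(f"fp16 weights alone: {weights_fp16_gb:.1f} GB")
```

So the fp32 weights alone eat more than half the card before tracing overhead is counted, which makes an OOM during export unsurprising.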

Sorry for the delay. Do you use ONNX Runtime or TensorRT?

When converting the model to ONNX, this happens:

β”‚ /home/james/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1 β”‚
β”‚ 182 in _slow_forward                                                         β”‚
β”‚                                                                              β”‚
β”‚   1179 β”‚   β”‚   β”‚   else:                                                     β”‚
β”‚   1180 β”‚   β”‚   β”‚   β”‚   recording_scopes = False                              β”‚
β”‚   1181 β”‚   β”‚   try:                                                          β”‚
β”‚ ❱ 1182 β”‚   β”‚   β”‚   result = self.forward(*input, **kwargs)                   β”‚
β”‚   1183 β”‚   β”‚   finally:                                                      β”‚
β”‚   1184 β”‚   β”‚   β”‚   if recording_scopes:                                      β”‚
β”‚   1185 β”‚   β”‚   β”‚   β”‚   tracing_state.pop_scope()                             β”‚
β”‚                                                                              β”‚
β”‚ /home/james/.local/lib/python3.10/site-packages/transformers/models/gptj/mod β”‚
β”‚ eling_gptj.py:589 in forward                                                 β”‚
β”‚                                                                              β”‚
β”‚    586 β”‚   β”‚   β”‚   past_length = 0                                           β”‚
β”‚    587 β”‚   β”‚   β”‚   past_key_values = tuple([None] * len(self.h))             β”‚
β”‚    588 β”‚   β”‚   else:                                                         β”‚
β”‚ ❱  589 β”‚   β”‚   β”‚   past_length = past_key_values[0][0].size(-2)              β”‚
β”‚    590 β”‚   β”‚                                                                 β”‚
β”‚    591 β”‚   β”‚   if position_ids is None:                                      β”‚
β”‚    592 β”‚   β”‚   β”‚   position_ids = torch.arange(past_length, input_shape[-1]  β”‚
╰──────────────────────────────────────────────────────────────────────────────╯
IndexError: Dimension specified as -2 but tensor has no dimensions
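(The trace above shows `past_key_values[0][0].size(-2)` failing, which means the first cache entry reaching GPT-J's forward is a 0-dimensional tensor rather than the expected `[batch, num_heads, past_seq_len, head_dim]` key tensor. A minimal sketch of the failure mode, with illustrative shapes:)

```python
import torch

# GPT-J reads the cached sequence length with past_key_values[0][0].size(-2),
# so the first cache entry must have at least two dimensions.

scalar = torch.tensor(0.0)  # 0-dim tensor, e.g. a malformed dummy input
try:
    scalar.size(-2)
except IndexError as e:
    print(e)  # same IndexError as in the traceback above

# A cache entry with the expected layout works. Shapes are illustrative:
# GPT-J 6B has 16 heads with head_dim 256.
good = torch.zeros(1, 16, 128, 256)  # [batch, heads, past_len, head_dim]
print(good.size(-2))  # 128 -> the past_length GPT-J reads
```

This suggests the dummy `past_key_values` built for tracing did not match the nested tuple-of-4D-tensors structure the model expects.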

When converting the model to TensorRT, I get the same error.

I tried --seq-len 1 128 128, 1 128 2047, 1 2048 2048, and 1 2047 2047 with both ONNX and TensorRT; always the same error. I tested on an A100 and on a CPU machine with 128 GB RAM.