GPT-J 6B model
timofeev1995 opened this issue Β· comments
Hello! Thank you for your framework!
I have a question about converting and serving very large models (6B+ parameters) with your framework.
I tried the tips for large models (the --fast option etc.), but I get CUDA OOM even on an NVIDIA A100 40GB card.
Is this expected behaviour? Are there any tips for converting models of that size?
Thank you in advance.
Sorry for the latency. Do you use ONNX Runtime or TensorRT?
When converting the model to ONNX, this happens:
```
/home/james/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1182 in _slow_forward

  1179                 else:
  1180                     recording_scopes = False
  1181             try:
❱ 1182                 result = self.forward(*input, **kwargs)
  1183             finally:
  1184                 if recording_scopes:
  1185                     tracing_state.pop_scope()

/home/james/.local/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py:589 in forward

   586                 past_length = 0
   587                 past_key_values = tuple([None] * len(self.h))
   588             else:
❱  589                 past_length = past_key_values[0][0].size(-2)
   590
   591             if position_ids is None:
   592                 position_ids = torch.arange(past_length, input_shape[-1]

IndexError: Dimension specified as -2 but tensor has no dimensions
```
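The error itself is easy to reproduce in isolation: during tracing, the `past_key_values` entries apparently arrive as empty 0-dimensional tensors instead of `None`, so GPT-J's `past_key_values[0][0].size(-2)` has no dimension to index. A minimal sketch of just that failure mode (not the framework's conversion code):

```python
import torch

# A 0-dimensional tensor, as past_key_values[0][0] seems to be during tracing.
t = torch.tensor(0.0)

try:
    # This is the same call as modeling_gptj.py line 589.
    t.size(-2)
except IndexError as err:
    # Matches the traceback: "Dimension specified as -2 but tensor has no dimensions"
    print(f"IndexError: {err}")
```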
When converting the model with TensorRT, I get the same error.
I tried --seq-len 1 128 128, 1 128 2047, 1 2048 2048, and 1 2047 2047 for both ONNX and TensorRT, always with the same error. I tested on an A100 and on a CPU machine with 128 GB RAM.
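For what it's worth, in a standalone trace the crash at line 589 goes away once the decoder is fed non-empty dummy key/value tensors. This is only a sketch under assumed GPT-J 6B dimensions (28 layers, 16 heads, head_dim 256 from the default config), not the framework's converter logic:

```python
import torch

# Assumed GPT-J 6B shapes: (batch, num_heads, past_len, head_dim) per layer,
# one (key, value) pair per layer.
num_layers, num_heads, head_dim = 28, 16, 256
batch_size, past_len = 1, 8

# Non-empty dummy past so past_key_values[0][0].size(-2) is well defined.
past_key_values = tuple(
    (
        torch.zeros(batch_size, num_heads, past_len, head_dim),
        torch.zeros(batch_size, num_heads, past_len, head_dim),
    )
    for _ in range(num_layers)
)

# The expression that crashed in modeling_gptj.py now resolves cleanly:
past_length = past_key_values[0][0].size(-2)
print(past_length)  # 8
```

So the question is probably why the conversion path ends up passing empty tensors instead of `None` (or properly shaped pasts) into `forward` during export.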