pinecone-io / examples

Jupyter Notebooks to help you get hands-on with Pinecone vector databases


Notebook Crashes when loading model to CUDA

souvik0306 opened this issue · comments

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Hi, I was running your notebook, but my kernel crashes every time I try to load the model. I am using an NVIDIA GeForce RTX 4090 GPU. It crashes at this point:


from torch import cuda, bfloat16
import transformers

# Use the first CUDA device if one is available, otherwise fall back to CPU
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-instruct',
    trust_remote_code=True,   # MPT ships custom modelling code on the Hub
    torch_dtype=bfloat16,
    max_seq_len=2048          # forwarded to the MPT model config
)
model.eval()
model.to(device)
print(f"Model loaded on {device}")

I am using the latest version of the transformers library and the model checkpoint. I have also ensured that all the dependencies are up to date.

Any assistance would be greatly appreciated. Thank you!
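For context, here is my own back-of-envelope estimate of the memory this load needs (my assumption: MPT-7B has roughly 6.7 billion parameters; the exact count may differ slightly):

```python
# Rough memory estimate for loading MPT-7B in bfloat16.
# Assumption: ~6.7e9 parameters (approximate, not from the notebook).
num_params = 6.7e9
bytes_per_param = 2  # bfloat16 is 2 bytes per parameter
weights_gb = num_params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GB")

# The default from_pretrained path materialises both the checkpoint
# state dict and the model in CPU RAM before moving to the GPU, so
# peak host memory during load can approach twice the weight size.
print(f"peak CPU RAM during load: ~{2 * weights_gb:.0f} GB")
```

So even though the 4090's 24 GB of VRAM can hold the weights, the load can still fail if the host machine has too little free RAM.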

Expected Behavior

The model should load on CUDA.

Steps To Reproduce

I ran it the same way as in your code.

If possible, can you confirm whether you were using Colab Pro?

Relevant log output

No response

Environment

- **OS**: Ubuntu
- **Language version**: Python 3

My system specs:

- **GPU**: NVIDIA GeForce RTX 4090
- **PyTorch**: 2.0.1
- **CUDA**: 11.7

Additional Context

No response

The same is happening to me.

Hello, it appears that your instance lacks sufficient RAM capacity to load the entire model into memory. To resolve this, you can execute our notebooks using Google Colab with a modified Runtime Environment. I recommend setting the Hardware accelerator to GPU, selecting T4 as the GPU type, and configuring the Runtime Shape to "High RAM."

If the issue persists, consider running the notebook on a larger GPU instance.
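If Colab is not an option, a lower-RAM load path may also help. Below is a sketch, not a tested fix for this checkpoint: it assumes the `accelerate` package is installed, and uses the standard `from_pretrained` arguments `low_cpu_mem_usage` (stream weights instead of building a full CPU copy of the state dict first) and `device_map` (let Accelerate place layers on the GPU directly):

```python
from torch import bfloat16
import transformers

# Sketch: reduce peak host RAM during loading.
# Assumes the `accelerate` package is installed (pip install accelerate).
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-instruct',
    trust_remote_code=True,
    torch_dtype=bfloat16,
    max_seq_len=2048,
    low_cpu_mem_usage=True,  # avoid materialising a second full copy in RAM
    device_map='auto',       # place weights on the available GPU(s) directly
)
model.eval()
```

With `device_map='auto'` there is no need for a separate `model.to(device)` call; moving a dispatched model manually can in fact raise an error.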