google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Home Page: https://ai.google.dev/gemma

RuntimeError: at::cuda::blas::gemm: not implemented for struct c10::BFloat16

dhchenx opened this issue

Hi,

I am running the example from Gemma in PyTorch and encountered the following issue:

Part of the example code:

[screenshot of the example code]

The error is: RuntimeError: at::cuda::blas::gemm: not implemented for struct c10::BFloat16

[screenshot of the error]

My environment:
Windows 11
PyTorch 1.8.1, CUDA 10.2, Python 3.9

Any idea how to solve the issue?

What's your GPU type? It might be that you are running on a GPU that does not support the bfloat16 type.

As a workaround, you can try changing the dtype in the config to

dtype: str = 'float16' # or 'float32'

and test to see if that works.
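
For context, here is a minimal sketch of choosing a dtype based on what the local setup supports. Note that `torch.cuda.is_bf16_supported()` only exists in PyTorch releases newer than the 1.8.1 reported above, hence the guard:

```python
import torch

def pick_dtype() -> str:
    """Fall back from bfloat16 to float16/float32 on unsupported setups."""
    if not torch.cuda.is_available():
        return 'float32'  # CPU path: stick to float32
    bf16_check = getattr(torch.cuda, 'is_bf16_supported', None)
    if bf16_check is not None and bf16_check():
        return 'bfloat16'  # Ampere (compute capability >= 8.0) or newer
    return 'float16'       # older CUDA GPUs: half precision as a fallback

print(pick_dtype())
```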

Thanks, I used an NVIDIA Quadro P2200 (5 GB).
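
For reference, the card's compute capability can be checked from Python; CUDA bfloat16 kernels need an Ampere-class GPU (compute capability 8.0 or higher), and a Quadro P2200 is a Pascal-generation part at 6.1. A minimal sketch:

```python
import torch

# Print the GPU name and compute capability; CUDA bfloat16 kernels
# require compute capability (8, 0) or newer (Ampere).
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f'compute capability {major}.{minor}')
print('bfloat16 supported:', (major, minor) >= (8, 0))
```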

I first tried changing to dtype: str = 'float16'; the error is below:

[screenshot: error with float16]

Then I tried dtype: str = 'float32', and another error is shown below:

[screenshot: error with float32]

I wonder whether the example code can run on machines with limited compute. Any solution? Thanks!

Which model variant are you trying? 2B, 7B, or 7B-quant? In any case, 5 GB might be insufficient.

I used 2B-it

Yeah, 2B in fp16 may still be too large to fit on a 5 GB card. Maybe try CPU first.

Thanks. I tried CPU (32 GB RAM) and it works well.
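
For reference, a minimal sketch of the CPU path. The import paths and `load_weights` call follow the repo's layout but are assumptions here; verify them against your checkout, and `ckpt_path` is a placeholder:

```python
import torch
from gemma.config import get_config_for_2b  # assumed module layout
from gemma.model import GemmaForCausalLM    # assumed module layout

# Run the 2B model entirely on the CPU in float32.
config = get_config_for_2b()
config.dtype = 'float32'          # set before construction so layers use fp32
model = GemmaForCausalLM(config)
# model.load_weights(ckpt_path)   # repo-specific weight loading (assumed API)
model = model.to('cpu').eval()
```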

OK, yeah, to fit a 5 GB card, we would probably need an int8-quantized 2B variant, which is not included in the release.
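
Rough arithmetic behind that, assuming roughly 2.5B parameters for the 2B variant (the exact count differs slightly):

```python
# Back-of-the-envelope weight memory, ignoring the KV cache and
# activations, which add more on top of the raw weights.
params = 2.5e9
for name, bytes_per_param in [('float32', 4), ('float16/bfloat16', 2), ('int8', 1)]:
    print(f'{name}: {params * bytes_per_param / 2**30:.1f} GiB')
# float16/bfloat16 comes out near 4.7 GiB, which already crowds a 5 GB card.
```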

You can try the free Colab T4 runtime using this Colab, and it should work: https://ai.google.dev/gemma/docs/pytorch_gemma

Thanks!

Closing the issue for now.