Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.

(documentation) How do I know if generate.py is running on GPU / GPU configuration

maathieu opened this issue

Hi, I have an NVIDIA Quadro P5200 with 32GB of VRAM, yet when I run the code for a test it performs extremely slowly, and in Task Manager the GPU's used RAM stays near 0. I think the code is not using my GPU. Is there any special configuration needed beyond pip install -r requirements.txt to get this running on the GPU?

In general, if you start a new Python session, does

import torch
print(torch.cuda.is_available())

show True?
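
If that prints False, PyTorch cannot see the GPU at all. As a slightly fuller diagnostic sketch (assuming a standard PyTorch install; the exact version strings will differ on your machine), the following also shows whether the installed wheel was built with CUDA support:

import torch

print(torch.__version__)          # a "+cpu" suffix indicates a CPU-only build
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # must be True for GPU execution
if torch.cuda.is_available():
    # these should report the Quadro P5200 and its memory
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.get_device_properties(0).total_memory / 1e9, "GB")

If torch.version.cuda prints None, the installed wheel is CPU-only. On Windows (which Task Manager suggests you are using), the default pip wheel is often the CPU build, so reinstalling PyTorch with a CUDA-enabled wheel using the install command from pytorch.org typically resolves this.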