haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Home Page: https://llava.hliu.cc


Unable to determine the device handle for GPU0000:4F:00.0: Unknown Error

liuhui0401 opened this issue

Question

While fine-tuning LLaVA, I ran into this error: "Unable to determine the device handle for GPU0000:4F:00.0: Unknown Error". I was fine-tuning on eight A100 GPUs, and the error usually occurred after three hours of fine-tuning. I tried two different servers, and it occurred on both. Can anyone tell me what causes this?
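
To pin down which GPU fails and when, one option is to scan saved training logs or captured `nvidia-smi` output for this message. The snippet below is a minimal sketch, assuming the message format quoted above. The function name `find_failed_gpu` is hypothetical, and the pattern may need adjusting for other driver versions.

```python
import re
from typing import Optional

# Assumption: the error line looks like the one quoted in this issue.
# The optional whitespace after "GPU" tolerates both "GPU0000:..." and "GPU 0000:...".
_HANDLE_ERROR = re.compile(
    r"Unable to determine the device handle for GPU\s*"
    r"([0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])"
)


def find_failed_gpu(text: str) -> Optional[str]:
    """Return the PCI bus ID from the first matching error line, or None."""
    match = _HANDLE_ERROR.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "Unable to determine the device handle for "
        "GPU0000:4F:00.0: Unknown Error"
    )
    print(find_failed_gpu(sample))
```

If the same bus ID shows up on every failure, that points at one specific card or slot. This only helps locate the failure and does not explain the cause.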