Error help please

Question

ratom opened this issue 8 months ago · comments

When I tried to run the code, I got this error. The same code works fine when I run it on Google Colab; the error only occurs on my local machine.

2023-10-14 16:29:56.639183: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-10-14 16:29:56.700051: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-14 16:29:56.700092: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-14 16:29:56.700128: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-14 16:29:56.709129: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-14 16:29:57.734715: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: (30 second timeout)
wandb: W&B disabled due to login timeout.
train: weights=, cfg=configs/model_resnet.yaml, data=../oxfordpets/data.yaml, hyp=configs/hyp.scratch-low.yaml, epochs=200, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=0, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
fatal: ambiguous argument 'v2..origin/master': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git [...] -- [...]'
Command 'git rev-list v2..origin/master --count' returned non-zero exit status 128.
/home/user/anaconda3/envs/trakka/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11060). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "/home/user/Desktop/test/flexible-yolov5/scripts/train.py", line 666, in <module>
main(opt)
File "/home/user/Desktop/test/flexible-yolov5/scripts/train.py", line 548, in main
device = select_device(opt.device, batch_size=opt.batch_size)
File "/home/user/Desktop/test/flexible-yolov5/./utils/torch_utils.py", line 62, in select_device
assert torch.cuda.is_available() and torch.cuda.device_count() >= len(device.replace(',', '')),
AssertionError: Invalid CUDA '--device 0' requested, use '--device cpu' or pass valid CUDA device(s)

Bobo~ · Answer 1 · Sun Oct 15 2023 23:38:55 GMT+0800 (China Standard Time)

Judging by this error, PyTorch cannot find a GPU. Do you have one? If not, you should use the CPU version (--device cpu).
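
One way to check whether PyTorch can actually see a GPU is a small script like the one below. This is only a sketch: the helper names describe_cuda and decode_driver_cuda_version are made up for illustration and are not part of flexible-yolov5. Note that the UserWarning in the log reports "found version 11060", which in CUDA's version encoding (1000 * major + 10 * minor) corresponds to 11.6, so a GPU may be present but hidden from PyTorch because the installed build targets a newer CUDA than the driver supports.

```python
import importlib.util


def decode_driver_cuda_version(raw: int) -> str:
    """Decode CUDA's integer version encoding (1000 * major + 10 * minor)."""
    return f"{raw // 1000}.{(raw % 1000) // 10}"


def describe_cuda():
    """Report what the installed PyTorch can see, or None if torch is missing.

    Hypothetical helper for illustration; not part of flexible-yolov5.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds)
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    # 11060 is the value reported in the warning from the log above.
    print("Driver supports CUDA up to:", decode_driver_cuda_version(11060))
    print("PyTorch view:", describe_cuda())
```

If built_for_cuda is higher than the version the driver supports, cuda_available will likely be False even though the hardware is there. In that case the options named in the warning itself apply: update the NVIDIA driver, or install a PyTorch build compiled for the driver's CUDA version.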

Robin Atom Dulal · Answer 2 · Mon Oct 16 2023 07:31:13 GMT+0800 (China Standard Time)

I have a GPU, but I still get this error.
Another thing: my GPU system is split into four devices of 10 GB each, named cuda:0, cuda:1, cuda:2, and cuda:3, but I can only use one of them, cuda:0 (10 GB). How can I use all four devices (40 GB in total) at once, so that the code runs faster and can handle larger datasets?
Thank you

Bobo~ · Answer 3 · Mon Oct 16 2023 14:49:17 GMT+0800 (China Standard Time)

Use --device 0,1,2,3
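
The traceback above shows that select_device compares the number of requested devices against torch.cuda.device_count(). A rough sketch of that kind of check is below. It is an illustration only: parse_device_arg is a hypothetical name, and the batch-size divisibility rule is an assumption borrowed from upstream YOLOv5 that may not hold for this fork.

```python
def parse_device_arg(device: str, available: int, batch_size: int = 0) -> list:
    """Parse a comma-separated device string such as '0,1,2,3' and sanity-check it.

    Hypothetical helper; it mirrors the spirit of the assertion in the
    traceback, not the exact code in utils/torch_utils.py.
    """
    ids = [int(part) for part in device.split(",") if part.strip()]
    if not ids:
        raise ValueError("no CUDA device indices given")
    if available < len(ids):
        raise ValueError(
            f"requested {len(ids)} device(s) but only {available} are visible"
        )
    # Assumption (from upstream YOLOv5): the batch is split evenly across GPUs.
    if len(ids) > 1 and batch_size > 0 and batch_size % len(ids) != 0:
        raise ValueError(
            f"batch size {batch_size} is not a multiple of {len(ids)} GPUs"
        )
    return ids


if __name__ == "__main__":
    # batch_size=16 is the value shown in the training log above.
    print(parse_device_arg("0,1,2,3", available=4, batch_size=16))
```

With data-parallel training of this kind, each GPU typically holds its own copy of the model and processes a share of the batch (here 16 / 4 = 4 images per device). The four 10 GB devices therefore do not behave like a single 40 GB device, although the overall batch size can usually be increased.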