Training and testing both fail with RuntimeError: CUDA error: invalid device function
Heroism502 opened this issue · comments
Hi, both training and testing fail with the error below. Could this be caused by my environment configuration?
EMD-CD: 0%| | 0/19 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train_ae.py", line 203, in
cd_loss = validate_loss(it)
File "train_ae.py", line 169, in validate_loss
metrics = EMD_CD(all_recons, all_refs, batch_size=args.val_batch_size, accelerated_cd=True)
File "/userhome/point_cloud/diffusion-point-cloud-main/evaluation/evaluation_metrics.py", line 58, in EMD_CD
cd_lst.append(dl.mean(dim=1) + dr.mean(dim=1))
RuntimeError: CUDA error: invalid device function
Segmentation fault (core dumped)
What's your GPU model?
RTX2080
Can confirm getting the same error running on RTX2080
I will release a version that doesn't require compiling CUDA extensions soon.
Does anyone know how to solve this problem?
Hi, I have a quick question: were you able to build StructuralLossesBackend? If so, could you please let me know how? I am getting a g++ error (I have already opened an issue, but there has been no response yet).
Hi all,
The EMD_CD functions are used for validation only. Training does not rely on them, so you can remove that part of the code from the training script. I will also release a version without them later.
Does anyone know how to solve this problem?
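This error occurs because the library under the evaluation directory that has to be compiled was not built correctly. The Makefile has a parameter, CUDA_ARCH, which sets the compute capability of the GPU. Different GPUs have different compute capabilities; you can look yours up on the NVIDIA site: https://developer.nvidia.com/zh-cn/cuda-gpus#compute and then set the value that matches your GPU. You can also target several architectures for compatibility, which makes the build less likely to crash. For the specific settings, see: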
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
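If it helps, here is a minimal sketch for finding the value to put in CUDA_ARCH without looking it up by hand. It asks PyTorch for the compute capability of GPU 0 and formats it as an nvcc `-gencode` flag. The helper names `gencode_flag` and `detect_capability` are made up for this example, and the exact format that this repository's Makefile expects for CUDA_ARCH is an assumption, so check it against the existing line in the Makefile before pasting.

```python
def gencode_flag(major: int, minor: int) -> str:
    """Build one nvcc -gencode flag for a compute capability such as (7, 5).

    Hypothetical helper; the flag syntax itself is standard nvcc.
    """
    arch = f"{major}{minor}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


def detect_capability():
    """Return (major, minor) for GPU 0, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch; look up the capability manually.")
    else:
        # Assumption: CUDA_ARCH in the Makefile takes nvcc -gencode flags.
        print(gencode_flag(*cap))
```

For the RTX 2080 mentioned above, the compute capability is 7.5, so the flag would be `-gencode arch=compute_75,code=sm_75`. After changing CUDA_ARCH, clean and rebuild the extension so the old binaries are not reused.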
Hi all,
I have uploaded a version that doesn't require CUDA extensions. It depends ONLY on native PyTorch operations.
The version also fixes the multi-processing bug in Dataloader.
You may try this new version.
Sorry for the late update.
Thanks!
Hi,
My compilation with CUDA 10.1 succeeded and the invalid device function error disappeared. I guess the error is caused by a mismatch between CUDA 10.0 and 10.1, so you can try compiling the metrics with CUDA 10.1.
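To check whether you have this kind of mismatch, a rough sketch like the one below compares the CUDA version PyTorch was built with (`torch.version.cuda`) against the release reported by `nvcc --version`. The function names are hypothetical, and I am only assuming that a version mismatch is the cause; a match here does not rule out a wrong CUDA_ARCH setting.

```python
import re
import subprocess


def parse_major_minor(text):
    """Extract the first 'X.Y' version as (X, Y), or None if absent."""
    if not text:
        return None
    m = re.search(r"(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_nvcc_release(nvcc_output):
    """Parse the 'release X.Y' part of `nvcc --version` output."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output or "")
    return parse_major_minor(m.group(1)) if m else None


def versions_match(torch_cuda, nvcc_output):
    """True only if both versions are known and have the same major.minor."""
    built = parse_major_minor(torch_cuda)
    toolkit = parse_nvcc_release(nvcc_output)
    return built is not None and built == toolkit


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        torch_cuda = None
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
        nvcc_output = result.stdout
    except FileNotFoundError:
        nvcc_output = ""
    print("PyTorch CUDA:", torch_cuda)
    print("nvcc release:", parse_nvcc_release(nvcc_output))
    print("match:", versions_match(torch_cuda, nvcc_output))
```

If the two versions differ, either install a PyTorch build that matches your toolkit or compile the metrics with the toolkit version PyTorch reports.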