The testing results of the whole dataset is empty

Question

blowhen opened this issue 2 years ago

After integrating NA into mmdetection, training runs, but it keeps reporting the error "The testing results of the whole dataset is empty". Following the solution suggested for mmdetection, I modified the learning rate, but there is still no validation set result.

blowhen · Answer 1 · Fri May 13 2022 10:31:50 GMT+0800 (China Standard Time)

Also, the dataset itself is fine.

Ali Hassani · Answer 2 · Fri May 13 2022 10:56:47 GMT+0800 (China Standard Time)

Hello and thank you for your interest.
This error typically occurs when there's a gradient explosion, so the model starts producing results that can't be correctly validated. It's usually not a dataset issue, but it could be an environment issue. Can you share your environment details (i.e. Python version, torch, torchvision, mmcv, mmdetection, ninja versions specifically)?
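A minimal sketch for collecting the requested versions in one paste-ready report. It assumes Python 3.8 or newer (for `importlib.metadata`) and that the packages were installed under the PyPI distribution names listed in `PACKAGES`; the helper names are illustrative and not part of any of these libraries.

```python
from importlib import metadata  # standard library, Python 3.8+

# Assumed PyPI distribution names; adjust if your install differs.
PACKAGES = ["torch", "torchvision", "mmcv-full", "mmdet", "ninja"]


def collect_versions(names):
    """Return {distribution name: installed version or 'not installed'}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def format_report(versions):
    """Render one 'name: version' line per package, ready to paste."""
    return "\n".join(f"{name}: {version}" for name, version in versions.items())
```

Calling `print(format_report(collect_versions(PACKAGES)))` inside the training environment prints the report; the Python version itself is available from `sys.version`.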

blowhen · Answer 3 · Fri May 13 2022 11:12:07 GMT+0800 (China Standard Time)

python3.7
torch1.7.1
torchvision0.8.2
mmcv-full 1.5.0
mmdetection v2.24.1
ninja 1.10.2.3

Adjusting the warmup did not help. The computer supports CUDA 11.2.

blowhen · Answer 4 · Fri May 13 2022 11:15:32 GMT+0800 (China Standard Time)

I mean, I'm using CUDA 11.0; this computer supports up to 11.2.

Ali Hassani · Answer 5 · Fri May 13 2022 11:35:24 GMT+0800 (China Standard Time)

Have you tried using the recommended environment?
Using exactly the same versions matters for reproducibility, especially since we haven't verified that the kernel operates as expected in torch versions below 1.8.

blowhen · Answer 6 · Fri May 13 2022 12:03:23 GMT+0800 (China Standard Time)

Running the base version on an A6000, I can get validation set results, although occasionally there are none. Running the mini and tiny versions, however, gives no validation set results.

blowhen · Answer 7 · Fri May 13 2022 12:05:50 GMT+0800 (China Standard Time)

Moreover, the behavior of the mini and tiny versions is a little strange: there are validation set results in the first three evaluation rounds, and none in the rounds after that.

Ali Hassani · Answer 8 · Fri May 13 2022 12:17:48 GMT+0800 (China Standard Time)

Again, the "no results" warning in mmcv generally points to training collapsing. Based on the versions you shared, I wouldn't be too surprised if those were the root of the issue, because different torch/mmcv versions tend to behave differently, and we trained all of our models with torch 1.11.
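One way to check whether training has collapsed is to look for the first non-finite loss in the training log. The sketch below assumes a JSON-lines log in which each training record carries a numeric `loss` field; that format is an assumption, since no log appears in this thread, and the function name is hypothetical.

```python
import json
import math


def first_non_finite_loss(lines, key="loss"):
    """Return the first parsed record whose `key` is NaN or infinite, else None."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)  # Python's json accepts NaN / Infinity
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        value = record.get(key)
        if isinstance(value, (int, float)) and not math.isfinite(value):
            return record
    return None
```

Passing an open log file (which iterates line by line) returns the record where the loss first became NaN or infinite, which shows roughly when the collapse began.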

blowhen · Answer 9 · Fri May 13 2022 12:22:12 GMT+0800 (China Standard Time)

OK, I'll try another computer and report back with the results as soon as I have them. Thank you for your answer!

Ali Hassani · Answer 10 · Fri May 13 2022 12:59:42 GMT+0800 (China Standard Time)

You don't have to try another machine; you can simply set up a virtual environment and use the requirements.txt file provided to install the recommended versions of torch, mmcv and the like.

blowhen · Answer 11 · Fri May 13 2022 15:04:01 GMT+0800 (China Standard Time)

I see what you mean. I've tried again with the recommended environment:
cuda11.3
torch1.11
mmcv-full 1.4.8
mmdetection v2.24.1
ninja 1.10.2.3
The first three rounds still have results, but the later ones have none.
This problem has been bothering me for two or three days. How can I solve it?

Ali Hassani · Answer 12 · Fri May 13 2022 15:13:14 GMT+0800 (China Standard Time)

Please note that this is still not the recommended environment; you're still on the wrong mmdet version. This is the correct setup:

torch==1.11.0+cu113
torchvision==0.12.0+cu113
mmcv-full==1.4.8
mmdet==2.19.0
ninja==1.10.2.3

It would also be more helpful if you could provide a log and the command you're trying to run if it occurs with these settings.
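A small sketch for spotting this kind of mismatch automatically: it parses `name==version` pins and compares them with a mapping of installed versions. The function names are illustrative, and the comparison is a plain string match, so a local suffix such as `+cu113` must match exactly.

```python
def parse_pins(text):
    """Parse 'name==version' lines into a dict, ignoring blanks and comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Return {name: (pinned, installed or None)} for every package that differs."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# The recommended setup listed above.
RECOMMENDED = """
torch==1.11.0+cu113
torchvision==0.12.0+cu113
mmcv-full==1.4.8
mmdet==2.19.0
ninja==1.10.2.3
"""
```

Here `installed` is any dict of package name to version string, for example one built with `importlib.metadata.version`. An empty result from `find_mismatches(parse_pins(RECOMMENDED), installed)` means the environment matches the pins.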

Ali Hassani · Answer 13 · Fri May 13 2022 16:06:25 GMT+0800 (China Standard Time)

I'm not sure why you're trying to build mmdet from scratch. All you need to do after setting up and activating the Python environment is run pip3 install -r requirements.txt, and then run with the scripts provided. You don't have to build mmdet from scratch, and that is not recommended.
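As a rough sketch of those two steps (create a virtual environment, then install from the requirements file), the helper below builds the commands without running them. The environment directory name and the function names are hypothetical, and the requirements file is assumed to sit in the current directory.

```python
import os
import sys
from pathlib import Path


def venv_python(env_dir):
    """Path to the interpreter inside a virtual environment directory."""
    env_dir = Path(env_dir)
    if os.name == "nt":  # Windows layout
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def setup_commands(env_dir, requirements="requirements.txt"):
    """Commands that create the environment, then install the pinned packages."""
    return [
        [sys.executable, "-m", "venv", str(env_dir)],
        [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)],
    ]
```

Each command can then be executed in order with `subprocess.run(cmd, check=True)`, for instance over `setup_commands("nat-env")`. Invoking pip through the environment's own interpreter ensures the packages land in that environment rather than in the system Python.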

Ali Hassani · Answer 14 · Thu May 26 2022 00:29:11 GMT+0800 (China Standard Time)

Closing this due to inactivity. If you still have questions feel free to open it back up.