Reproducing PTv3 results on ScanNet

Question

Reproducing PTv3 results on ScanNet

SlapDrone opened this issue 2 months ago · comments

Hey folks,

I'm getting acquainted with your codebase and to get my feet wet I thought I'd try to reproduce the results of PTv3 on the ScanNet test dataset. I'm using the v1.5.1 release as suggested in another issue.

I've followed the suggestions in the README and got the test script running inference on a 4090 locally.

After pulling the weights from huggingface to the local repo, along with the scannet dataset (your preprocessed version), I set up as follows using the best weights and the PTV3+PPT config that ships with the codebase:

# in my .env file...
PTV3_CONFIG_PATH=./configs/scannet/semseg-pt-v3m1-1-ppt-extreme.py
PTV3_WEIGHTS_PATH=./models/PointTransformerV3/scannet-semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth
PTV3_SAVE_PATH=./exp/scannet/semseg-pt-v3m1-1-ppt-extreme

I didn't make any changes to the config, and simply ran:

python tools/test.py --config-file ${PTV3_CONFIG_PATH} --options save_path=${PTV3_SA
VE_PATH} weight=${PTV3_WEIGHTS_PATH}

This runs without complaint, but the metrics I get out at the end are unexpected; all but the first two classes (the most frequent two?) segmented with ~0% acc/IoU. See the tail of my test.log below:

[2024-05-10 16:33:40,434 INFO test.py line 289 3025] Syncing ...
[2024-05-10 16:33:40,435 INFO test.py line 317 3025] Val result: mIoU/mAcc/allAcc 0.0408/0.0818/0.4419
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_0 - wall Result: iou/accuracy 0.4337/0.6361
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_1 - floor Result: iou/accuracy 0.3729/0.9889
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_2 - cabinet Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_3 - bed Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_4 - chair Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_5 - sofa Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_6 - table Result: iou/accuracy 0.0097/0.0099
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_7 - door Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_8 - window Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_9 - bookshelf Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_10 - picture Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,435 INFO test.py line 323 3025] Class_11 - counter Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,436 INFO test.py line 323 3025] Class_12 - desk Result: iou/accuracy 0.0003/0.0003
[2024-05-10 16:33:40,436 INFO test.py line 323 3025] Class_13 - curtain Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,436 INFO test.py line 323 3025] Class_14 - refridgerator Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,436 INFO test.py line 323 3025] Class_15 - shower curtain Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,436 INFO test.py line 323 3025] Class_16 - toilet Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,436 INFO test.py line 323 3025] Class_17 - sink Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,436 INFO test.py line 323 3025] Class_18 - bathtub Result: iou/accuracy 0.0000/0.0000
[2024-05-10 16:33:40,436 INFO test.py line 323 3025] Class_19 - otherfurniture Result: iou/accuracy 0.0001/0.0001
[2024-05-10 16:33:40,436 INFO test.py line 331 3025] <<<<<<<<<<<<<<<<< End Evaluation <<<<<<<<<<<<<<<<<

This obviously doesn't line up with the 78.6/79.4 mIoU reported in the paper. Any ideas what the issue may be? Might I be neglecting to configure something correctly?

Thanks in advance for any advice you may be able to give! :D

Xiaoyang Wu · Answer 1 · Mon May 13 2024 01:12:49 GMT+0800 (China Standard Time)

Hi, released weight is trained with code tag with v1.5.1. Current code modified the model structure.

SlapDrone · Answer 2 · Tue May 21 2024 03:07:33 GMT+0800 (China Standard Time)

Hey @Gofinge, I rolled back to 1.5.1 and I get the same issue.

I'm using my fork here, where I've added some commits on top to streamline the install into one command (Python 3.11 and CUDA 12.1+ with a Makefile to build the CUDA wheels, and poetry instead of conda). Aside, if you find any of that quality of life stuff desirable for main, would be happy to clean it up and open a PR to contribute.

On your huggingface repo, I see the test.log has the following for scannet:

[2024-02-05 17:02:00,913 INFO test.py line 41 42795] => Loading config ...
[2024-02-05 17:02:00,913 INFO test.py line 48 42795] => Building model ...
[2024-02-05 17:02:07,028 INFO test.py line 61 42795] Num params: 97447088
[2024-02-05 17:02:08,774 INFO test.py line 68 42795] Loading weight at: exp/scannet/semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth
[2024-02-05 17:02:12,412 INFO test.py line 80 42795] => Loaded weight 'exp/scannet/semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth' (epoch 94)
[2024-02-05 17:02:12,417 INFO test.py line 53 42795] => Building test dataset & dataloader ...
[2024-02-05 17:02:12,423 INFO scannet.py line 72 42795] Totally 312 x 1 samples in val set.
[2024-02-05 17:02:12,424 INFO test.py line 119 42795] >>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>
[2024-02-05 17:02:53,119 INFO test.py line 196 42795] Test: 1/78-scene0598_02, Batch: 0/127
...
[2024-02-05 17:03:11,970 INFO test.py line 230 42795] Test: scene0598_02 [1/78]-176139 Batch 22.255 (22.255) Accuracy 0.9617 (0.1453) mIoU 0.7039 (0.1408)

Whereas I get the output (I committed to my fork here:

[2024-05-20 11:54:49,320 INFO test.py line 41 33124] => Loading config ...
[2024-05-20 11:54:49,320 INFO test.py line 48 33124] => Building model ...
[2024-05-20 11:54:53,240 INFO test.py line 61 33124] Num params: 97447088
[2024-05-20 11:54:53,460 INFO test.py line 68 33124] Loading weight at: ./models/PointTransformerV3/scannet-semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth
[2024-05-20 11:54:54,784 INFO test.py line 80 33124] => Loaded weight './models/PointTransformerV3/scannet-semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth' (epoch 94)
[2024-05-20 11:54:54,788 INFO test.py line 53 33124] => Building test dataset & dataloader ...
[2024-05-20 11:54:54,790 INFO scannet.py line 72 33124] Totally 312 x 1 samples in val set.
[2024-05-20 11:54:54,791 INFO test.py line 119 33124] >>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>
[2024-05-20 11:55:01,331 INFO test.py line 168 33124] 1/312: scene0131_00, loaded pred and label.
[2024-05-20 11:55:01,350 INFO test.py line 230 33124] Test: scene0131_00 [1/312]-177091 Batch 0.020 (0.020) Accuracy 0.5289 (0.0844) mIoU 0.1445 (0.0506)

I notice that the number of test samples are different (78 in the huggingface log, 312 when I run from the local config). This suggests that maybe the testing/evaluation datasets are different in each case?

Thanks in advance for any advice you can give.

SlapDrone · Answer 3 · Tue May 21 2024 04:52:56 GMT+0800 (China Standard Time)

Fixed, by swapping out the standard PTv3+PPT scannet config with the specific one in the huggingface repo of the weights.