tsunghan-wu / RandLA-Net-pytorch

:four_leaf_clover: Pytorch Implementation of RandLA-Net (https://arxiv.org/abs/1911.11236)

About the test code

yuan-zm opened this issue

Hello, thanks for your amazing work.

After training the model, I used test_SemanticKITTI.py for inference. However, I found that self.test_dataset.min_possibility is not updating during test time. Could you please give me some suggestions?

Hi,

According to the source code here, we pick the points with the lowest possibility while testing. As you can see, the minimum possibility increases (+= delta) each time. In fact, it takes a long time to reach the termination condition (i.e. min_possibility > threshold), but I think the implementation is correct.
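For intuition, here is a minimal sketch of the possibility-based sampling loop; the names and numbers are illustrative, not this repo's exact code:

```python
import numpy as np

# Illustrative sketch of possibility-based test sampling (not this repo's exact code).
# Every point starts with a tiny random "possibility"; each crop is centred on the
# point with the lowest value, and the sampled points' possibilities are increased,
# so successive crops sweep the whole cloud until min_possibility > threshold.

def pick_patch(points, possibility, num_points=4096):
    center_idx = int(np.argmin(possibility))
    dists = np.sum((points - points[center_idx]) ** 2, axis=1)
    selected = np.argsort(dists)[:num_points]   # brute-force k-NN; the real code uses a KD-tree
    # nearer points receive a larger increment
    delta = np.square(1 - dists[selected] / np.max(dists[selected]))
    possibility[selected] += delta
    return selected, float(np.min(possibility))

points = np.random.rand(10000, 3).astype(np.float32)
possibility = np.random.rand(len(points)) * 1e-3
threshold = 0.5

while True:
    selected, min_possibility = pick_patch(points, possibility)
    # ... run the network on points[selected] and merge the prediction here ...
    if min_possibility > threshold:
        break
```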

If you have further questions, feel free to discuss them with me. (Maybe I made a mistake and just haven't found it XD)

Thank you for your helpful reply!

Yes, the minimum possibility increases (+= delta) each time. As mentioned in the README, I use PyTorch 1.4 at inference time. It's strange that id(self.test_data.min_possibility) in test_SemanticKITTI.py (line 107) and id(self.test_data.min_possibility) in semkitti_testset.py (line 58) show the same id, yet self.test_data.min_possibility does not increase in test_SemanticKITTI.py (line 107) during inference. So I used the collate_fn (line 138 in semkitti_testset.py) to return test_data.min_possibility, and it works.
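A hedged sketch of that workaround (make_collate_fn is my own illustrative helper, not the function at line 138 of semkitti_testset.py): the collate_fn appends the dataset's current min_possibility to every batch, so the test loop reads a value that is fresh at batch-creation time instead of a possibly stale attribute reference.

```python
import numpy as np
from torch.utils.data.dataloader import default_collate

def make_collate_fn(test_dataset):
    def collate_fn(batch):
        # collate the samples as usual, then attach the live min_possibility table
        data = default_collate(batch)
        min_possibility = np.asarray(test_dataset.min_possibility, dtype=np.float32)
        return data, min_possibility
    return collate_fn

# usage (assuming the dataset yields tensors/arrays that default_collate can handle):
# loader = DataLoader(test_dataset, batch_size=..., collate_fn=make_collate_fn(test_dataset))
# for data, min_possibility in loader:
#     if float(np.min(min_possibility)) > threshold:
#         break
```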

After solving the above question, I met another problem. When I watch nvidia-smi during inference, the Volatile GPU-Util is quite low (about 23%). Increasing num_workers or batch_size brings no gain in GPU utilization. Are you facing the same problem?

Sorry, I am not good at English. Thanks for your helpful answer.

Hi, thanks for your kind reply.

First, the min_possibility issue you pointed out above might be a bug. Maybe I made a mistake in the implementation; sorry for the confusion. However, I've been busy recently, so I may only be able to verify and fix the bug in a few days. Really glad to hear you found a solution! (You can raise a PR if you want.)

Second, about the low GPU-utilization issue: I've suffered from it too. In the official implementation, at inference time RandLA-Net always sequentially selects the points with the lowest possibility, predicts patch by patch, and finally merges/ensembles all the predictions. The commonly used Dataset class cannot support this, because multiple workers would see the same min_possibility table and all pick the same lowest-possibility point (rather than the last-1, last-2, ..., last-N points), which causes duplicated samples in a batch. That's why I use PyTorch's IterableDataset instead of the Dataset class to implement the sequential selection. In this implementation, I set the batch_size when declaring the dataset class instead of when calling the DataLoader; a sketch of this idea follows below. As for num_workers, I face the same problem: whenever I increase the number of workers above 0, the speed drops dramatically. So far, I haven't found a better solution to this strange phenomenon.
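A minimal sketch of that IterableDataset idea, under my own simplified assumptions (the class name, fields, and details differ from semkitti_testset.py):

```python
import numpy as np
import torch
from torch.utils.data import IterableDataset, DataLoader

class SequentialTestDataset(IterableDataset):
    """Illustrative sketch: yields whole batches sequentially so every crop is
    chosen against the already-updated possibility table (no duplicated picks)."""

    def __init__(self, clouds, batch_size=8, num_points=4096, threshold=0.5):
        self.clouds = clouds                  # list of (N_i, 3) point arrays
        self.batch_size = batch_size          # batch size lives in the dataset, not the DataLoader
        self.num_points = num_points
        self.threshold = threshold
        self.possibility = [np.random.rand(len(c)) * 1e-3 for c in clouds]
        self.min_possibility = [float(np.min(p)) for p in self.possibility]

    def __iter__(self):
        while min(self.min_possibility) < self.threshold:
            batch_pts, batch_idx, batch_cloud = [], [], []
            for _ in range(self.batch_size):
                ci = int(np.argmin(self.min_possibility))
                pts = self.clouds[ci]
                center = pts[int(np.argmin(self.possibility[ci]))]
                dists = np.sum((pts - center) ** 2, axis=1)
                sel = np.argsort(dists)[:self.num_points]
                # update the possibility table *before* picking the next crop
                self.possibility[ci][sel] += np.square(1 - dists[sel] / np.max(dists[sel]))
                self.min_possibility[ci] = float(np.min(self.possibility[ci]))
                batch_pts.append(pts[sel])
                batch_idx.append(sel)
                batch_cloud.append(ci)
            yield (torch.from_numpy(np.stack(batch_pts)),
                   torch.from_numpy(np.stack(batch_idx)),
                   torch.tensor(batch_cloud))

# batch_size=None disables the DataLoader's automatic batching, since the dataset
# already yields complete batches:
# loader = DataLoader(SequentialTestDataset(clouds), batch_size=None, num_workers=0)
```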

Lastly, I am not a native English speaker (I was born in Taiwan), so my English is not proficient.
If you find further problems, feel free to ask me questions or report anything.
Thank you.

Thank you for your helpful suggestions despite your busy schedule.

I've never made a PR on GitHub before, but I'm going to give it a try.

Em, when I use your code for training or inference, I also meet some other problems. For example, I have to use PyTorch 1.1 for training and PyTorch 1.4 for inference. When I use PyTorch 1.1 to train RandLA-Net, I meet an error at line 119 in train_SemanticKITTI.py: the loss is not contiguous. And when I use PyTorch 1.1 for inference, I cannot use IterableDataset. Sadly, I didn't find a good solution for this.
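I can't be sure what line 119 contains, but in my experience this kind of "not contiguous" error in older PyTorch versions is usually worked around by making the tensor contiguous (or using .reshape(), which copies when needed) before the reshaping op. A purely hypothetical illustration:

```python
# Hypothetical illustration only (the actual tensors at train_SemanticKITTI.py:119 may differ):
# call .contiguous() before .view(), or use .reshape(), which copies when necessary.
logits = logits.transpose(1, 2).contiguous().view(-1, num_classes)
labels = labels.contiguous().view(-1)
loss = criterion(logits, labels)
```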

Also, you are missing two functions in data_process.py:
```python
@staticmethod
def load_pc_kitti(pc_path):
    scan = np.fromfile(pc_path, dtype=np.float32)
    scan = scan.reshape((-1, 4))
    points = scan[:, 0:3]  # get xyz
    return points

@staticmethod
def load_label_kitti(label_path, remap_lut):
    label = np.fromfile(label_path, dtype=np.uint32)
    label = label.reshape((-1))
    sem_label = label & 0xFFFF  # semantic label in lower half
    inst_label = label >> 16  # instance id in upper half
    assert ((sem_label + (inst_label << 16) == label).all())
    sem_label = remap_lut[sem_label]
    return sem_label.astype(np.int32)
```
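In case it helps others, here is a hedged usage sketch for these two functions. The yaml path, the DP class name, and the file paths are my own assumptions; the remap table itself is built from the standard learning_map in semantic-kitti.yaml.

```python
import yaml
import numpy as np
# from data_process import DP   # adjust to wherever the class holding the staticmethods lives

# Build the remap LUT from SemanticKITTI's config (learning_map: raw ids -> training class ids).
DATA = yaml.safe_load(open('utils/semantic-kitti.yaml', 'r'))   # assumed path
remap_dict = DATA['learning_map']
max_key = max(remap_dict.keys())
remap_lut = np.zeros((max_key + 100,), dtype=np.int32)
remap_lut[list(remap_dict.keys())] = list(remap_dict.values())

points = DP.load_pc_kitti('sequences/08/velodyne/000000.bin')                 # (N, 3) xyz
labels = DP.load_label_kitti('sequences/08/labels/000000.label', remap_lut)   # (N,) class ids
```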

Your work helps me a lot. Many thanks for sharing!

Closing since the conversation is finished.

@tsunghan-mama @dream-toy Hello,
I have a question: how long does it take to run test_SemanticKITTI.py?

About 40 minutes. I think the bottleneck is the self.update_predict function, because the network's output has to be saved into self.test_probs, so there is a lot of I/O between the GPU and host memory during prediction. So far, I don't have a good solution for this.
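For readers hitting the same bottleneck, here is a rough sketch of what that vote accumulation looks like; the names are illustrative and not necessarily the repo's exact update_predict:

```python
import numpy as np
import torch

# Illustrative vote accumulation: every patch prediction is moved from GPU to CPU
# and smoothed into a per-cloud probability buffer; the .cpu() copy inside the test
# loop is the main source of the GPU<->host traffic mentioned above.
def update_predict(test_probs, logits, point_idx, cloud_idx, smooth=0.98):
    # logits: (B, num_points, num_classes) tensor still on the GPU
    # point_idx: (B, num_points) numpy indices into each cloud; cloud_idx: (B,) cloud ids
    probs = torch.softmax(logits, dim=-1).detach().cpu().numpy()   # GPU -> CPU transfer
    for b in range(probs.shape[0]):
        inds = point_idx[b]
        c = int(cloud_idx[b])
        test_probs[c][inds] = smooth * test_probs[c][inds] + (1 - smooth) * probs[b]
    return test_probs
```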

Is it similar to the original TensorFlow code?

@dream-toy @caoyifeng001 I agree that voting over several predictions is not efficient. I've tried removing the voting procedure but got really bad results. Maybe you can use other methods that do not need this step, such as MinkowskiNet or SPVCNN.

Hi, thanks a lot for the great work.
Just want to know how long it takes to finish one epoch of training.
I'm using one V100 GPU, and the training time is over 2 hours per epoch. Not sure if this is normal.

@13952522076 A single 2080 Ti needs about 42 min per epoch in my experiment.

@huixiancheng Thanks a lot. Just want to know if you made any modifications?

@13952522076 I just modified num_workers and reduced val_batch_size to 15 to fit my device, plus some unrelated log-output modifications.
From the logs, it seems to be working correctly.
(screenshot of the training log, 2021-07-07)

Sorry, please allow me to refuse

@huixiancheng No problem, thanks a lot for your kind help.

Hi, I would like to know how much memory you need for testing on SemanticKITTI. With batch_size=1, I need almost 32 GB of RAM (not GPU memory). Is this normal? Or is there any way to reduce that demand? @dream-toy