Excessive memory requirements

Question

huixiancheng opened this issue 3 years ago · comments

Hi, I would like to know how much memory you need for testing on SemanticKITTI. With batch=1, I need almost 32 GB of memory (RAM, not GPU memory). Is this normal? Is there any way to reduce this requirement?

fenfenglitech · Answer 1 · Sat Dec 11 2021 11:52:44 GMT+0800 (China Standard Time)

Hi @huixiancheng, did your test run succeed? I ran into problems during testing: the process fails on sequences 13, 19, and 21, while all other sequences succeed. Could you give me some advice?

Huixian Cheng · Answer 2 · Sat Dec 11 2021 12:20:29 GMT+0800 (China Standard Time)

This may be caused by running out of memory. A simple fix is to slice the data list here:
https://github.com/tsunghan-mama/RandLA-Net-pytorch/blob/913837e846176e4247a7e21783bf8f2f38576257/dataset/semkitti_testset.py#L26

For example, sequence 08 has 4071 scans. Just run inference twice. It is rough but effective, and it had no impact on accuracy in my tests.
The first run uses
self.data_list = sorted(self.data_list)[0:3000]
Then infer again with
self.data_list = sorted(self.data_list)[3000:]
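
The two manual slices above can be generalized into a loop over fixed-size chunks. This is only a minimal sketch of that idea, not code from the repo: `split_ranges` is a hypothetical helper, the file names are placeholders, and the chunk size of 3000 is simply the value used above.

```python
def split_ranges(n_items, chunk_size):
    """Return (start, stop) index pairs that cover range(n_items) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_items))
            for start in range(0, n_items, chunk_size)]


# Placeholder file names standing in for the 4071 scans of sequence 08.
data_list = sorted(f"{i:06d}.npy" for i in range(4071))

for start, stop in split_ranges(len(data_list), 3000):
    subset = data_list[start:stop]
    # Run one inference pass on `subset` here, then release it before the next pass.
    print(start, stop, len(subset))
```

Each pass only holds one chunk of the sequence, which is what keeps peak memory down; a smaller chunk size trades more passes for less memory.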

fenfenglitech · Answer 3 · Sun Dec 19 2021 16:01:46 GMT+0800 (China Standard Time)

Thank you for your help; I have solved this problem. However, my submission to the competition failed when I tested the score, like this: What can I do?

fenfenglitech · Answer 4 · Sun Dec 19 2021 16:11:42 GMT+0800 (China Standard Time)

Thank you for your help; I have solved this problem. However, my submission to the competition failed when I tested the score, like this: I hope you can give me some suggestions. Thank you very much! To add: my work is based on the original RandLA-Net code, and I appended my label files at the end.

Huixian Cheng · Answer 5 · Sun Dec 19 2021 17:28:37 GMT+0800 (China Standard Time)

I haven't used the original code, so I can't give advice on it.
The main thing to pay attention to is the error log given by CodaLab.
Maybe you can try to get help here.

fenfenglitech · Answer 6 · Sun Dec 19 2021 17:46:15 GMT+0800 (China Standard Time)

What is your environment? I want to try running your code.

Huixian Cheng · Answer 7 · Sun Dec 19 2021 18:44:06 GMT+0800 (China Standard Time)

Just this repo, running inference with the "all" type.

I did not submit to the test server. I think that if verification with this API shows no problems on the validation set, the test set should be fine too.

fenfenglitech · Answer 8 · Wed Dec 29 2021 23:01:32 GMT+0800 (China Standard Time)

Hi, I want to thank you for your suggestions. My problems were solved with your help, and my work now gets a great score. Thank you very much!

sugar · Answer 9 · Wed Jan 12 2022 10:44:24 GMT+0800 (China Standard Time)

Hi @huixiancheng, I ran data_prepare_semantickitti.py successfully, but training the model fails with this error: RuntimeError: weight tensor should be defined either for all 19 classes or no classes but got weight tensor of shape: [1, 19]. What should I do?

Huixian Cheng · Answer 10 · Wed Jan 12 2022 15:20:33 GMT+0800 (China Standard Time)

Hi, I have not run into this problem. Maybe you should check the number of classes and the class weights.

Here are the weights I calculated and used:

class_weights = torch.tensor([[17.1775, 49.4507, 49.0822, 45.9186, 44.9319, 49.0655, 49.6848, 49.8644,
5.3651, 31.3473, 7.2697, 41.0090, 5.5935, 11.1401, 2.8727, 37.3551,
9.1705, 43.3172, 48.0677]]).cuda()

It really is a tensor of shape torch.Size([1, 19]).

sugar · Answer 11 · Wed Jan 12 2022 20:52:29 GMT+0800 (China Standard Time)

@huixiancheng, thank you very much for the data and advice. I tried it, but it still does not work. Could this problem be related to checkpoint.rar? I can't get it from the link in your readme.md; it was empty.

Huixian Cheng · Answer 12 · Fri Jan 14 2022 12:38:55 GMT+0800 (China Standard Time)

No, I don't think that has any effect.

sugar · Answer 13 · Mon Jan 17 2022 08:56:05 GMT+0800 (China Standard Time)

@huixiancheng I am very grateful for your advice. I will try again. Thank you very much.

Huixian Cheng · Answer 14 · Wed Feb 09 2022 15:32:56 GMT+0800 (China Standard Time)

@xlr-project Maybe you are using torch=1.10? I just reproduced your error with that setup (torch=1.10 with cuda=11.3). After changing to torch=1.8.1 and cuda=11.1, it works well.
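
If downgrading is not an option, one possible workaround (not tested in this thread, so treat it as an assumption) is to flatten the weight tensor to 1-D, since the `weight` argument of PyTorch's cross-entropy loss is documented as a 1-D tensor with one entry per class. A minimal sketch with placeholder values instead of the real weights:

```python
import torch
import torch.nn.functional as F

# Placeholder standing in for the [1, 19] class-weight tensor (values are not the real weights).
class_weights_2d = torch.ones(1, 19)

# Flatten to shape [19], i.e. one weight per class.
class_weights_1d = class_weights_2d.flatten()

# Dummy batch: 8 points, 19 classes, one class index per point.
logits = torch.randn(8, 19)
labels = torch.randint(0, 19, (8,))

loss = F.cross_entropy(logits, labels, weight=class_weights_1d)
print(class_weights_1d.shape, loss.item())
```

Whether the repo's loss code needs further changes after flattening is not confirmed here; the version change above is the fix that was actually verified.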

sugar · Answer 15 · Sun Mar 20 2022 16:16:48 GMT+0800 (China Standard Time)

Thank you very much, I have already got it working.