Some confusion about normalize_data.py

Question

shenhai911 opened this issue 3 years ago

Dear author, there are three versions of normalize_data.py that can be used when generating the training and validation sets:

In the TensorFlow version of OverlapNet: OverlapNet/src/utils/normalize_data.py

In the PyTorch version of OverlapNet: OverlapNet/tools/utils/normalize_data.py

In overlap_localization: overlap_localization/src/prepare_training/normalize_data.py

I would like to know which version you used in the paper to keep the number of samples equal across the different bins.

It would be great if you could explain more. Thank you very much.

Xieyuanli Chen · Answer 1 · Mon Jan 17 2022 15:40:55 GMT+0800 (China Standard Time)

Hi @shenhai911, thanks for using our code!

The original OverlapNet paper used the first version.

The PyTorch version was implemented by other users. As far as I know, they found it also works well.

The amount of data influences the training time, and the overlap distribution varies between datasets, so the normalization can be adjusted accordingly.
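For readers unfamiliar with the idea, here is a minimal sketch of what "keeping the same number of samples in each bin" can look like. It is not the code from any of the three normalize_data.py files. The function name `balance_bins`, the `(idx_a, idx_b, overlap)` tuple layout, the ten equal-width bins, and the choice to downsample every bin to the size of the smallest non-empty one are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_bins(samples, num_bins=10, seed=0):
    """Downsample (idx_a, idx_b, overlap) tuples so all non-empty bins have equal size.

    Hypothetical helper for illustration; overlap is assumed to lie in [0, 1].
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for sample in samples:
        overlap = sample[2]
        # Clamp so that overlap == 1.0 falls into the last bin.
        bin_id = min(int(overlap * num_bins), num_bins - 1)
        bins[bin_id].append(sample)
    if not bins:
        return []
    # Every non-empty bin is reduced to the size of the smallest one.
    target = min(len(members) for members in bins.values())
    balanced = []
    for bin_id in sorted(bins):
        balanced.extend(rng.sample(bins[bin_id], target))
    return balanced


# Toy data: many low-overlap pairs and only a few high-overlap pairs.
toy = [(i, i + 1, 0.05) for i in range(50)] + [(i, i + 2, 0.95) for i in range(5)]
balanced = balance_bins(toy)
print(len(balanced))
```

With this particular strategy, a single sparsely populated bin limits the size of the whole balanced set, which is one reason the normalization may need tuning per dataset.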

shenhai911 · Answer 2 · Thu Jan 20 2022 09:08:39 GMT+0800 (China Standard Time)

Dear author,
Thank you for your quick reply. Now I have two main questions to ask.

When I use the first version of the code to generate the ground truth for sequence 08, the following error occurs when splitting the data into the training set and the validation set:

Traceback (most recent call last):
  File "src/generate_training_data/generate_ground_truth_single_sequence.py", line 110, in <module>
    train_data, validation_data = split_train_val(dist_norm_data)
  File "src/generate_training_data/../../src/utils/split_train_val.py", line 21, in split_train_val
    train_set, test_set = train_test_split(ground_truth_mapping, test_size=test_size)
  File "/home/xxx/softwares/anaconda3/envs/OverlapNet_env/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 2423, in train_test_split
    n_samples, test_size, train_size, default_test_size=0.25
  File "/home/xxx/softwares/anaconda3/envs/OverlapNet_env/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 2046, in _validate_shuffle_split
    "(0, 1) range".format(test_size, n_samples)
ValueError: test_size=0 should be either positive and smaller than the number of samples 2 or a float in the (0, 1) range

I have pasted the related code below:

I also printed the histogram statistics and the normalized results, and have pasted them below:

I don't know whether this is caused by a mistake on my side or by a bug in the code. Have you encountered this problem?
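The error message itself reports test_size=0 and 2 samples, which suggests that only two pairs remained after normalization and that an integer validation size was rounded down to zero. The sketch below shows a guard that makes this condition explicit before the split is attempted. `checked_test_size` and the 10% validation fraction are assumptions for illustration, not the contents of split_train_val.py.

```python
def checked_test_size(n_samples, val_fraction=0.1):
    """Return an integer validation size, or raise a descriptive error.

    Hypothetical helper, not part of the OverlapNet repository.
    """
    test_size = int(n_samples * val_fraction)
    if test_size < 1:
        # An empty validation split would make the downstream split call fail.
        raise ValueError(
            f"Only {n_samples} samples survived normalization; "
            f"a {val_fraction:.0%} validation split would be empty."
        )
    return test_size


print(checked_test_size(100))
try:
    checked_test_size(2)
except ValueError as err:
    print(err)
```

A check like this points at the real problem (too few samples after normalization) instead of at the split call.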

In addition, in your paper you use sequences 03-10 as the training set and sequence 11 as the validation set.

Therefore, if I take sequence 11 as the validation set, do sequences 03-10 still need to be split into a training set and a validation set when generating the ground truth?

It would be great if you could explain more. Thank you very much.

Xieyuanli Chen · Answer 3 · Sun Jan 23 2022 17:47:35 GMT+0800 (China Standard Time)

Hey @shenhai911, for the OverlapNet training in the final version, we use the validation splits of all sequences instead of only one sequence, to obtain a more general distribution. However, we still use only sequence 02 for tuning the parameters of our SLAM algorithm, and therefore still call it the validation sequence. We didn't use any data from the test set, seq 11-21.
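As a rough illustration of pooling validation splits from all sequences, the sketch below splits each sequence separately and concatenates the results. The function `split_per_sequence`, the 10% validation fraction, and the toy sequence names and pairs are assumptions made for this example, not the authors' training pipeline.

```python
import random


def split_per_sequence(pairs_by_sequence, val_fraction=0.1, seed=0):
    """Split every sequence on its own, then pool the train and validation parts.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    train, val = [], []
    for seq in sorted(pairs_by_sequence):
        pairs = list(pairs_by_sequence[seq])
        rng.shuffle(pairs)
        n_val = int(len(pairs) * val_fraction)
        # Tag each pair with its sequence so the pooled sets stay traceable.
        val.extend((seq, pair) for pair in pairs[:n_val])
        train.extend((seq, pair) for pair in pairs[n_val:])
    return train, val


# Toy input: 8 made-up sequences with 20 scan pairs each.
toy = {f"{s:02d}": [(i, i + 1) for i in range(20)] for s in range(3, 11)}
train, val = split_per_sequence(toy)
print(len(train), len(val))
```

Because every sequence contributes to the validation pool, the validation set reflects the overlap distribution of all sequences rather than that of a single one.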

As for the distribution issue with seq 08, it seems the overlap values are wrong, which could be caused by inaccurate poses. I guess you are using the ground-truth poses provided by KITTI, which are actually quite noisy. You may use the poses we provide in SemanticKITTI instead, which we refined with our SLAM algorithm, and you may get better results. download link