dahiyaaneesh / peclr

This is the pretraining code for PeCLR, an equivariant contrastive learning framework for 3D hand pose estimation. The paper was presented at ICCV 2021.

Home Page: https://ait.ethz.ch/projects/2021/PeCLR/

Reproducing Numbers

Seleucia opened this issue

Hello,

Thanks a lot for releasing the code. I'm having trouble reproducing the numbers reported in the paper. I'm using your pre-trained models, but I can't get the same numbers that you report:

ResNet-50 + PeCLR
Evaluation 3D KP results:
auc=0.357, mean_kp3d_avg=4.71 cm
Evaluation 3D KP ALIGNED results:
auc=0.860, mean_kp3d_avg=0.71 cm

As you describe, I'm loading your model as follows:


import torch
import torchvision.models as models
# For ResNet-50
rn50 = models.resnet50()
peclr_weights = torch.load('peclr_rn50_yt3dh_fh.pth')
rn50.load_state_dict(peclr_weights['state_dict'])
# For ResNet-152
rn152 = models.resnet152()
peclr_weights = torch.load('peclr_rn152_yt3dh_fh.pth')
rn152.load_state_dict(peclr_weights['state_dict'])

And then I'm calling the "evaluate" function in evaluation_utils.py, evaluating on the FH "test" set. Do you have any other code snippet for evaluation besides evaluation_utils.py? There have been some bugs in this file.

These numbers are from the official FH test set and were acquired from the CodaLab site: https://competitions.codalab.org/competitions/21238

The evaluate function in evaluation_utils.py is used to evaluate on data we have ground-truth for, which is not the case for the test set.

Thanks for the reply. FH has released the GT for the test set as well; I'm using the GT from the official website. Do you have any other code which might help generate the same results?
I downloaded the GT from here: https://lmb.informatik.uni-freiburg.de/resources/datasets/FreihandDataset.en.html

I did not know that, thanks for updating me on this matter. I need to check how we produced the results for CodaLab, which may take some time as I currently do not have access to the computer. In the meantime, I would double-check on your end that you are evaluating in exactly the same manner as FH does on CodaLab.
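
For reference, the "ALIGNED" numbers on the leaderboard are computed after registering each prediction to the ground truth with a similarity (Procrustes) transform. The snippet below is only a minimal NumPy sketch of that idea for sanity checking; it is not the official FH evaluation code:

import numpy as np

def procrustes_align(pred, gt):
    # Align pred (N, 3) to gt (N, 3) with the best similarity transform
    # (rotation, scale, translation) in the least-squares sense.
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)   # SVD of the 3x3 covariance
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T
    scale = S.sum() / (p ** 2).sum()
    return scale * (p @ R.T) + mu_g

def mean_epe(pred, gt, aligned=False):
    # Mean Euclidean distance per keypoint, optionally after Procrustes alignment.
    if aligned:
        pred = procrustes_align(pred, gt)
    return np.linalg.norm(pred - gt, axis=-1).mean()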

How do you obtain the 3D pose predictions for evaluation on the FH dataset?
I'm following this strategy:

  • I'm ensuring that I'm not applying any data augmentation.
  • I'm using the "prepare_supervised_sample" function in the data_set file to get samples (images and GT).

After the forward pass through the model, I'm calling the following function:
predictions_3d = convert_2_5D_to_3D(predictions, scale, camera_param, True)
convert_2_5D_to_3D is in src.data_loader.utils. The input parameters are:

I also tried other options:

  • I used kp3d from src.models.rn_25D_wMLPref directly, but this did not produce any meaningful result.
  • I applied the scale value to the "kp3d" output of src.models.rn_25D_wMLPref, but this did not produce any meaningful result either.
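
Regarding the convert_2_5D_to_3D step above, this is a minimal sketch of how I understand 2.5D back-projection in general; the function name, argument layout, and the explicit root depth here are my own assumptions, not necessarily what the repo's convert_2_5D_to_3D does:

import torch

def backproject_25d_to_3d(joints_25d, scale, cam_intrinsics, z_root):
    # joints_25d: (21, 3) tensor of (u, v, z_rel) with pixel coordinates and
    #             scale-normalized, root-relative depth.
    # scale: scalar that restores metric depth (e.g. a reference bone length).
    # cam_intrinsics: (3, 3) camera matrix K.
    # z_root: absolute depth of the root joint.
    u, v, z_rel = joints_25d[:, 0], joints_25d[:, 1], joints_25d[:, 2]
    z = z_rel * scale + z_root                                   # absolute depth per joint
    x = (u - cam_intrinsics[0, 2]) * z / cam_intrinsics[0, 0]    # X = (u - cx) * Z / fx
    y = (v - cam_intrinsics[1, 2]) * z / cam_intrinsics[1, 1]    # Y = (v - cy) * Z / fy
    return torch.stack([x, y, z], dim=-1)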

Do you set use_palm to True or False? I see the default value is False. I tried both options; when I set it to True and run the code, I get better results. Here are my best results:

{'Mean_EPE_2D': tensor(13.1927), 'Median_EPE_2D': tensor(9.0325), 'Mean_EPE_3D': tensor(0.4832), 'Median_EPE_3D': tensor(0.3617), 'Median_EPE_3D_R_V_3D': tensor(1.2209e-07), 'AUC': 0.3777584816151982, 'Mean_EPE_3D_procrustes': tensor(0.0230), 'Median_EPE_3D_procrustes': tensor(0.0194), 'auc_procrustes': 0.9536341087054506}

Here is the full code:
Loading the model:

import os
import torch

from src.models.rn_25D_wMLPref import RN_25D_wMLPref

# For RN50
model_type = 'rn50'
model = RN_25D_wMLPref(backend_model=model_type)
model_path = f'{model_type}_peclr_yt3d-fh_pt_fh_ft.pth'
# BASE_DIR points to the repository root, under which the fine-tuned checkpoints are stored.
full_path = os.path.join(BASE_DIR, 'data', 'models', model_path)
checkpoint = torch.load(full_path)
model.load_state_dict(checkpoint['state_dict'])
model.eval()
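
As a sanity check (plain PyTorch, nothing repo-specific), one can also inspect whether the checkpoint keys match the model definition; with strict=False, load_state_dict returns the unmatched keys instead of raising:

# Reports any keys that could not be matched between the checkpoint and the model.
result = model.load_state_dict(checkpoint['state_dict'], strict=False)
print('missing keys:', result.missing_keys)
print('unexpected keys:', result.unexpected_keys)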

Obtaining the data loaders:

experiment_type = "supervised"
# args.sources: only FreiHAND
# train_param = edict(read_json(TRAINING_CONFIG_PATH))  # read from the training config JSON
data_test = get_data(
    Data_Set, train_param, sources=args.sources, experiment_type=experiment_type, split='test'
)
test_data_loader, _ = get_train_val_split(
    data_test, batch_size=train_param.batch_size, num_workers=train_param.num_workers
)

I'm running the following code:

import src.experiments.evaluation_utils as eup
output = eup.evaluate(model, test_data_loader, use_procrustes=True)

I changed the following functions in order to run the code properly:

  • get_labels: I ensure that the test split loads the "evaluation_xyz.json" file.
  • get_predictions_and_ground_truth: RN_25D_wMLPref returns a dictionary, but the code assumes it returns the pose directly; I make sure we access the required predictions from the model's return value (see the sketch after this list).
  • __getitem__ in the freihand_loader file: the older version of this function returned dummy values for the test split. I modified it so that the test split also returns correct values. Basically, I changed the "else" branch; the final version looks like this:
camera_param = torch.tensor(self.camera_param[idx_]).float()
joints3D = self.joints.freihand_to_ait(
    torch.tensor(self.labels[idx_]).float()
)
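
For the get_predictions_and_ground_truth change, the modification is essentially of this form; the output key name is a placeholder on my side, so check which keys RN_25D_wMLPref actually returns:

# Before (roughly): the evaluation code used the model output as the prediction tensor directly.
# predictions = model(batch_images)
# After: RN_25D_wMLPref returns a dictionary, so pick the prediction out explicitly.
output = model(batch_images)       # batch_images: a batch from the test data loader
predictions = output["kp25d"]      # hypothetical key; use the actual key from rn_25D_wMLPref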

Hello @spurra,
Did you have time to take a look at the code above? I would really appreciate it if you could check.

Dear @spurra @dahiyaaneesh @einer @xucong-zhang

I stumbled upon similar problems. Could you please help me understand how to properly evaluate the models and obtain the quantitative results you published?

These model weights achieve the following performance on the FreiHAND leaderboard:

ResNet-50 + PeCLR
Evaluation 3D KP results:
auc=0.357, mean_kp3d_avg=4.71 cm
Evaluation 3D KP ALIGNED results:
auc=0.860, mean_kp3d_avg=0.71 cm

ResNet-152 + PeCLR
Evaluation 3D KP results:
auc=0.360, mean_kp3d_avg=4.56 cm
Evaluation 3D KP ALIGNED results:
auc=0.868, mean_kp3d_avg=0.66 cm

I would greatly appreciate it if you could update the repository with the evaluation steps needed to obtain the declared metrics.

Thank you and looking forward to your reply.

I apologize for the delay in responding to this. We plan on releasing the code which produces the predictions for CodaLab this week.

Hi all, thank you for your patience in this matter. It's been an intense week at my internship, which is why I only got to this task this weekend. I went over the prediction code and it reproduces the numbers we originally reported. It is committed and ready to be pushed to the repo. As the code base is a heavily modified version of the FH GitHub code base, I am awaiting permission from the respective authors to upload the prediction code. Once I receive that, I'll push the code.

I have received permission. The code has been pushed. Please let me know if you can reproduce the numbers on the leaderboard.

So, how do we reproduce the numbers? Thank you for your help!