hongfz16 / HCMoCo

[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Question about the setting

zhihou7 opened this issue

Hi, thanks for your interesting work.
I am confused about the setting. Do the downstream tasks use the same training data as HCMoCo pre-training? In other words, is there any difference between the pre-training modalities and the downstream modalities? Maybe I missed something in the paper, but I could not find an explicit explanation, and I may be misreading the following description:

To evaluate HCMoCo, we transfer our pre-train model to four human-centric downstream tasks using different modalities,

Thank you for your interest in our work.

HCMoCo uses RGB, depth, and 2D keypoints for pre-training. We transfer the pre-trained RGB backbone to DensePose estimation and RGB human parsing, and the pre-trained depth backbone to depth human parsing and depth 3D skeleton prediction.
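
For readers unfamiliar with this kind of transfer, here is a minimal PyTorch sketch of the protocol: load one modality's pre-trained backbone weights, attach a freshly initialized task head, and fine-tune on the downstream dataset. This is not the repository's actual code; the checkpoint path, the `rgb_backbone.` key prefix, and the toy head are all assumptions for illustration.

```python
# Hypothetical sketch of transferring an HCMoCo pre-trained backbone to a
# downstream task. The checkpoint path and the "rgb_backbone." key prefix
# are assumptions; the real checkpoint layout may differ.
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet50(weights=None)
backbone.fc = nn.Identity()  # expose 2048-d features instead of classification logits

ckpt = torch.load("hcmoco_pretrain.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)  # unwrap if the weights are nested
# Keep only the RGB-branch weights and strip the branch prefix.
rgb_state = {k[len("rgb_backbone."):]: v
             for k, v in state.items() if k.startswith("rgb_backbone.")}
backbone.load_state_dict(rgb_state, strict=False)

# A toy linear head stands in for the real dense-prediction head
# (e.g. a parsing decoder); everything is fine-tuned end to end.
model = nn.Sequential(backbone, nn.Linear(2048, 20))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```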

The modalities are the same, but for some downstream tasks, such as DensePose estimation and RGB human parsing, we use different datasets (MPII and NTURGBD for pre-training versus COCO or Human3.6M for downstream evaluation), which introduces a domain gap.

I hope the above explanation clarifies the setting.

Thanks for your reply. I get it.