microsoft / AI2BMD

AI-powered ab initio biomolecular dynamics simulation

The performance of Pretrained-3D-ViSNet

PierreHao opened this issue · comments

Hi, thank you for your great work on ViSNet.
I ran your code with the checkpoints Pretrained_3D_ViSNet_ckpt_0.ckpt and Pretrained_3D_ViSNet_ckpt_1.ckpt on the validation set, but the performance is only about 0.091. Does this mean the distillation works poorly, or is something wrong with this result?

Hi PierreHao,

Thanks for your interest. It is true that Pretrained-3D-ViSNet performs worse here, even though it performs well on the valid/test split drawn from the train set (equilibrium structures). The reason is that ViSNet is sensitive to the input structures, and the 3D structures generated by RDKit are not accurate enough. For instance, we generated 3D structures with RDKit for the train set and trained a ViSNet with the same settings and splitting; it achieved only 0.083 (RDKit) compared with 0.0216 (equilibrium) as reported in the technical report. Due to time and resource limitations we could not try other strategies, but we thought the model might still help the final ensemble. If you have any insights for improving the pretraining methods, please contact us and we would be glad to help. Thanks!
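For readers unfamiliar with the gap being discussed: RDKit conformers come from a distance-geometry embedding plus a force-field relaxation, not from quantum-chemical optimization, so they can sit noticeably off the equilibrium geometry. A minimal sketch of generating such a conformer (the SMILES, seed, and force field here are illustrative, not the exact pipeline used for the train set):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Build a molecule with explicit hydrogens (ethanol as a toy example)
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))

# Embed a 3D conformer with the ETKDG distance-geometry method
params = AllChem.ETKDGv3()
params.randomSeed = 0  # fixed seed for reproducibility
AllChem.EmbedMolecule(mol, params)

# Relax the geometry with the MMFF94 force field (not DFT-quality)
AllChem.MMFFOptimizeMolecule(mol)

# Extract atomic coordinates as an (n_atoms, 3) array
coords = mol.GetConformer().GetPositions()
```

Feeding force-field-level geometries like these to a model trained on (or evaluated against) equilibrium structures is exactly the mismatch that drives the 0.083 vs. 0.0216 gap mentioned above.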

Best,
Shaoning

Got it, thanks for your reply. I have done the distillation with Graphormer and got a result of 0.0806 on the validation set. All the details are in the paper here: https://arxiv.org/abs/2211.16712

Great work! The Graphormer architecture is better suited for distillation because it can be applied to both 2D and 3D graphs. Although current EGNNs achieve better results on 3D structures, they cannot be directly applied to 2D graphs and are therefore hard to use for distillation. You could also try the pre-training methods from 3DInfomax, since I think a unified architecture (the graph transformer in your paper) works better.

As 3DInfomax mentioned, "Our pre-training can also be understood from a contrastive distillation perspective where the student 2D network learns from the teacher 3D network to produce 3D information". And the pre-trained model can be fine-tuned for various downstream datasets/tasks (GEOM-DRUG or QM9).
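To make the "contrastive distillation" idea concrete, here is a small self-contained sketch (numpy only, with an NT-Xent-style loss; this is an illustration of the general principle, not 3DInfomax's actual training code): each molecule's 2D-network embedding is pulled toward its own 3D-network embedding and pushed away from the embeddings of other molecules in the batch.

```python
import numpy as np

def contrastive_distill_loss(student_2d, teacher_3d, temperature=0.1):
    """NT-Xent-style loss: the i-th 2D embedding should match the
    i-th 3D embedding (positive pair) and repel all others in the batch."""
    # L2-normalize so dot products become cosine similarities
    s = student_2d / np.linalg.norm(student_2d, axis=1, keepdims=True)
    t = teacher_3d / np.linalg.norm(teacher_3d, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # (B, B); positives on the diagonal
    # numerically stable row-wise log-softmax
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy with the diagonal as the target class
    return -np.mean(np.diag(log_prob))

# Toy demo: a student that already matches the teacher scores a much
# lower loss than a randomly initialized one.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))   # frozen 3D embeddings for a batch of 8
aligned_loss = contrastive_distill_loss(teacher, teacher)
random_loss = contrastive_distill_loss(rng.normal(size=(8, 16)), teacher)
```

In an actual pre-training run, `student_2d` would come from the 2D graph network and `teacher_3d` from the 3D network, and only the student (or both, depending on the setup) would be updated by gradients of this loss.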

Thank you for the suggestion. 3DInfomax, GraphMVP, and UnifiedMol are also great works. I have run some experiments with these methods; it may take further techniques to improve the result.
Anyway, thanks for your reply and for ViSNet.