xmed-lab / AdaCon

IEEE TMI 2021: AdaCon: Adaptive Contrast for Image Regression in Computer-Aided Disease Assessment

Parameter tuning (scale_s)

tengfeixue-victor opened this issue

Thanks for your interesting work!

I have a few questions about parameter tuning. In my understanding, "scale_s" can be taken as the reciprocal of the temperature (tau) in the supervised contrastive learning paper (Khosla et al.), right?
In your code, you set scale_s to 150, which seems very large. Any suggestions for selecting appropriate values of scale_s? You said it is "usually determined empirically" in your paper.
Also, should people still tune the temperature in your code? I feel it is better to set it to 1 and only tune scale_s.

Looking forward to your suggestion. Thanks!

Hi Victor,

Thanks for your comment and your interest in this work!

I have a few questions about parameter tuning. In my understanding, "scale_s" can be taken as the reciprocal of the temperature (tau) in the supervised contrastive learning paper (Khosla et al.), right?

You are right. And yes, it is a bit tricky to determine. If I remember correctly, Khosla et al. or another work tried to provide some analysis of the tau parameter, but there was no clean analytical relationship. The general idea, I think, is that it contributes to both the rate of learning and the sensitivity to differentiating positive and negative samples.
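To make the correspondence concrete, here is a minimal sketch (assuming unit-normalized embeddings; illustrative only, not the repository's exact code) of how scale_s enters a supervised-contrastive-style loss as an inverse temperature:

```python
import torch
import torch.nn.functional as F

def scaled_similarities(features: torch.Tensor, scale_s: float = 150.0) -> torch.Tensor:
    """Pairwise logits for a contrastive loss; illustrative sketch.

    With unit-norm embeddings, exp(z_i . z_j / tau) from Khosla et al.
    equals exp(scale_s * z_i . z_j) when scale_s = 1 / tau.
    """
    z = F.normalize(features, dim=1)  # project embeddings onto the unit sphere
    sim = z @ z.t()                   # cosine similarities in [-1, 1]
    return scale_s * sim              # same as dividing by tau = 1 / scale_s
```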

Any suggestions for selecting appropriate values of scale_s? You said it is "usually determined empirically" in your paper.

If I remember correctly, I used high values of scale_s because higher sensitivity to contrastive differentiation tends to help more for regression tasks.

As an example, if you have 10k training samples and the labels range between 1 and 100, you want your contrastive learning model to differentiate samples with labels 50 and 50.1 very well; otherwise the contrastive signal will not be very useful to model training. However, setting scale_s too high will lead to numerical instability. If you only have 100 training samples with labels between 1 and 100, the sensitivity probably does not need to be as high, since your training labels tend to be more spread out.
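To put rough numbers on that sensitivity argument, here is a toy calculation with made-up cosine similarities (the similarity values are purely illustrative, not from the paper):

```python
import math

# Hypothetical similarities: anchor vs. a near-identical label (e.g. 50 vs 50.1)
# and anchor vs. a clearly different label (e.g. 50 vs 60).
sim_close, sim_far = 0.99, 0.95

for s in (10, 50, 150):
    ratio = math.exp(s * sim_close) / math.exp(s * sim_far)
    print(f"scale_s={s:>3}: close/far weight ratio = {ratio:.2f}")

# scale_s= 10: close/far weight ratio = 1.49
# scale_s= 50: close/far weight ratio = 7.39
# scale_s=150: close/far weight ratio = 403.43
# Note: exp(150 * 0.99) already overflows float32 on its own, which is why
# implementations typically subtract the max logit (the log-sum-exp trick).
```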

For our experiments, the LVEF dataset had around 7k training samples with labels between 0 and 100%, and we used scale_s = 150. For the BMD dataset, there were around 600 crops with labels between 0 and 1, and we used a scale_s of 50.

Also, should people still tune the temperature (do you mean weight?) in your code? I feel it is better to set it to 1 and only tune scale_s.

As mentioned above, scale_s tends to affect both the training weight and the sensitivity, so the two effects are not exactly the same. My suggestion would be to choose a scale_s value depending on your dataset size: with a large training set you may want a larger scale_s, and with a smaller one, a smaller scale_s. Hopefully the values I used can guide you a bit (150 for a 7k training set, 50 for a 600 training set). Then, you can experiment with different weight parameters and fine-tune scale_s if you have time, as sketched below.
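For instance, a tuning loop along those lines might look like the following sketch. Here train_and_eval and n_train are hypothetical placeholders, and the starting values are extrapolated from the settings above, not prescribed by the paper:

```python
from itertools import product

def train_and_eval(weight: float, scale_s: float) -> float:
    """Hypothetical placeholder: train with these settings and return a
    validation score (higher is better). Replace with your own pipeline."""
    return 0.0

n_train = 7000                                  # your training-set size (placeholder)
base_s = 150.0 if n_train > 2000 else 50.0      # rule of thumb from the values above

weights = [0.1, 0.5, 1.0]                       # contrastive-loss weight, swept first
scales = [0.5 * base_s, base_s, 2.0 * base_s]   # then refine scale_s around the start

best = max(product(weights, scales),
           key=lambda cfg: train_and_eval(weight=cfg[0], scale_s=cfg[1]))
print("best (weight, scale_s):", best)
```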

Unfortunately I did not have time to explore the temperature parameter in detail in my work but hopefully this helps somewhat.

Hi,

Wow, thanks so much for your detailed reply! That's very helpful. Appreciate it!

Scale_s (explained above) and the weight of the contrastive loss (explained in your paper after Eq. 8) are the two most important hyper-parameters for this method, right? The temperature can be tuned as well to see if it improves the results. In addition, learning rate, batch size, etc. also matter for deep learning in general.

Looking forward to your suggestions. Thanks again!

Regards,
Tengfei

Scale_s (explained above) and the weight of the contrastive loss (explained in your paper after Eq. 8) are the two most important hyper-parameters for this method, right? The temperature can be tuned as well to see if it improves the results.

I'm not sure what you are referring to by the temperature parameter. The scale_s parameter is basically the inverse of the temperature parameter typically used in cross entropy. Otherwise, yes, scale_s and the weight of the contrastive loss are the most important.

In addition, learning rate, batch size, etc. also matter for deep learning in general.

Yes, but this is true for any learning algorithm. You can follow the settings of whatever implementation you are trying to improve upon to avoid spending time fine-tuning them.

Got it! Thanks so much for your explanation!