"The learnable temperature parameter was clipped to prevent scaling the logits by more than 100 which we found necessary to prevent training instability."
About
From-scratch PyTorch implementation of 'CLIP' (Radford et al., 2021), trained on Flickr8k and Flickr30k
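
The temperature clipping described in the quoted sentence can be sketched roughly as follows. This is a minimal, framework-free illustration of the arithmetic only, not necessarily how this repository implements it. It assumes the temperature is stored as a log-scale value, which is the parameterization used in the CLIP paper, and that the initial value corresponds to a temperature of 0.07 as reported there. The names `clip_log_scale`, `scale_logits`, `INIT_LOG_SCALE`, and `MAX_SCALE` are hypothetical.

```python
import math

# Assumption: the learnable parameter is log(1 / temperature), following the CLIP paper,
# which initializes it to the equivalent of a temperature of 0.07.
INIT_LOG_SCALE = math.log(1.0 / 0.07)

# Cap from the quoted sentence: logits are never scaled by more than 100.
MAX_SCALE = 100.0


def clip_log_scale(log_scale: float, max_scale: float = MAX_SCALE) -> float:
    """Clamp the log-scale so that exp(log_scale) cannot exceed max_scale."""
    return min(log_scale, math.log(max_scale))


def scale_logits(similarities, log_scale: float):
    """Multiply a matrix of cosine similarities by the clipped scale."""
    scale = math.exp(clip_log_scale(log_scale))
    return [[scale * s for s in row] for row in similarities]


if __name__ == "__main__":
    # A log-scale that has drifted too high during training is capped at log(100).
    print(math.exp(clip_log_scale(10.0)))
    # The initial value is below the cap, so it passes through unchanged.
    print(math.exp(clip_log_scale(INIT_LOG_SCALE)))
```

In a PyTorch training loop the same cap would typically be applied to the learnable parameter itself after each optimizer step, for example with `torch.clamp`, so that the stored value stays within bounds rather than only the value used in the forward pass.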