Why mlp_only_train=True during unsupervised training?

Question

lankuohsing opened this issue 7 months ago · comments

I noticed this suggestion in the README:
--mlp_only_train: We have found that for unsupervised SimCSE, it works better to train the model with MLP layer but test the model without it. You should use this argument when training unsupervised SimCSE models.

Given pooler_type == 'cls' and mlp_only_train == True, the embedding used for testing during unsupervised training does not include the MLP transformation, as the code in models.py (lines 262-263) shows:

if cls.pooler_type == "cls" and not cls.model_args.mlp_only_train:
    pooler_output = cls.mlp(pooler_output)

However, if I test my model (saved after unsupervised training and converted to a Hugging Face checkpoint with simcse_to_huggingface.py) using evaluation.py, the embedding does include the MLP transformation (given pooler_type == 'cls'), as the code in evaluation.py (lines 119-122) shows:

# Apply different poolers
if args.pooler == 'cls':
    # There is a linear+activation layer after CLS representation
    return pooler_output.cpu()

Here pooler_output includes the MLP transformation because 'mlp' is renamed to 'pooler' in simcse_to_huggingface.py:

if "mlp" in key:
    key = key.replace("mlp", "pooler")
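
To make the effect of that renaming concrete, here is a minimal sketch of the same substring replacement applied to a toy dictionary. The function name rename_mlp_to_pooler and the key names in checkpoint are assumptions chosen for illustration; they are not copied from simcse_to_huggingface.py or from a real checkpoint.

```python
def rename_mlp_to_pooler(state_dict):
    """Return a copy of state_dict with 'mlp' replaced by 'pooler' in every key."""
    renamed = {}
    for key, value in state_dict.items():
        if "mlp" in key:
            key = key.replace("mlp", "pooler")
        renamed[key] = value
    return renamed


# Hypothetical key names and placeholder values, for illustration only.
checkpoint = {
    "embeddings.word_embeddings.weight": "E",
    "mlp.dense.weight": "W",
    "mlp.dense.bias": "b",
}

converted = rename_mlp_to_pooler(checkpoint)
print(sorted(converted))
```

After the renaming, the weights trained as the MLP head are stored under pooler names, so a model that loads them as its pooler would apply them whenever pooler_output is requested.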

Why do the embeddings used for testing during unsupervised training differ from those used in formal evaluation?

Xingcheng Yao · Answer 1 · Fri Dec 08 2023 23:23:51 GMT+0800 (China Standard Time)

Hi, sorry for the confusion. The code in models.py (lines 262-263) only affects the validation process during training, as shown in trainer.py. To make sure the MLP transformation is not applied in evaluation.py, set the pooler to 'cls_before_pooler' in the evaluation script, as opposed to 'cls' in the training script.
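
The difference between the two pooler settings can be sketched with a toy example. This is not the code from evaluation.py: it uses plain Python lists instead of tensors, and the names toy_mlp and sentence_embedding, along with the weights, are assumptions made up for illustration. It only shows that 'cls' returns the CLS vector after a linear+activation head, while 'cls_before_pooler' returns the CLS vector untouched.

```python
import math


def toy_mlp(vector, weight, bias):
    """Linear layer followed by tanh, standing in for the MLP/pooler head."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, vector)) + b)
        for row, b in zip(weight, bias)
    ]


def sentence_embedding(cls_hidden, weight, bias, pooler):
    """Pick the sentence embedding according to the pooler name."""
    if pooler == "cls":
        # CLS representation passed through the linear+activation head
        return toy_mlp(cls_hidden, weight, bias)
    if pooler == "cls_before_pooler":
        # Raw CLS representation, head skipped
        return list(cls_hidden)
    raise ValueError(f"unsupported pooler: {pooler}")


# Toy CLS vector and identity weights, for illustration only.
cls_hidden = [0.5, -1.0]
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

with_mlp = sentence_embedding(cls_hidden, weight, bias, "cls")
without_mlp = sentence_embedding(cls_hidden, weight, bias, "cls_before_pooler")
print(with_mlp, without_mlp)
```

For a model trained with mlp_only_train, the second variant is the one that matches the embeddings used for validation during training.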

Guoxing Lan · Answer 2 · Sat Dec 09 2023 16:36:46 GMT+0800 (China Standard Time)

thanks!