princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

Question About bad results from trained model

Alison-starbeat opened this issue · comments

Sorry to bother you! I'm new to NLP. I tried to use unsupervised SimCSE on my own data, with the goal of achieving the best recall and precision scores on my own test set. I trained with 10,000 to 90,000 examples, 1-2 epochs, a learning rate of 1e-5, and a batch size of 64, starting from a base model (the Chinese version of roformer-sim). But I found that the results from the trained model were worse than those of the base model.
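For context, here is a minimal sketch of what one unsupervised-SimCSE training step looks like with the hyperparameters described above (batch size 64, learning rate 1e-5, temperature 0.05). This is not this repo's `train.py` nor the exact roformer-sim setup; the checkpoint name and pooling choice are placeholders:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Placeholder checkpoint; swap in the roformer-sim Chinese checkpoint actually used.
model_name = "bert-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.train()  # keep dropout ON: the two views of a sentence differ only by dropout masks

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
temperature = 0.05

def simcse_step(sentences):
    # Encode the SAME sentences twice; dropout makes the two [CLS] views differ slightly.
    batch = tokenizer(sentences, padding=True, truncation=True, max_length=64,
                      return_tensors="pt")
    z1 = model(**batch).last_hidden_state[:, 0]   # view 1 ([CLS] pooling)
    z2 = model(**batch).last_hidden_state[:, 0]   # view 2 (different dropout mask)

    # In-batch InfoNCE: z1[i] should match z2[i]; every other z2[j] is a negative.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(sim.size(0))
    loss = F.cross_entropy(sim, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the negatives come from the same batch, two (near-)identical sentences landing in one batch get pushed apart even though they are semantically the same, which is the failure mode speculated about below.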

I guess the problem lies in my dataset: it may naturally contain lots of similar sentence pairs, which could hurt the contrastive learning step. Could this be true? What could I do to improve the results?
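One hypothetical way to check that hypothesis before training is to measure how often sentences repeat in the training file. The file name and normalization are placeholders, and this only catches exact duplicates (near-duplicates would need fuzzy or embedding-based matching):

```python
from collections import Counter

def duplicate_ratio(path):
    # Count how often each (whitespace-stripped) sentence repeats in the training file.
    with open(path, encoding="utf-8") as f:
        counts = Counter(line.strip() for line in f if line.strip())
    total = sum(counts.values())
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / total if total else 0.0

# If a large share of lines repeat, in-batch negatives will often be (near-)identical
# to the positive pair, which weakens the contrastive signal.
print(f"{duplicate_ratio('train.txt'):.1%} of training sentences are exact duplicates")
```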

Thank you for your patience and hope for your reply!

Hi, can you elaborate more on the issue? For example, what this dataset is about, what the baseline model is, etc.

Stale issue message