About the old model outputting NaN

Question

zhangziyi1670 opened this issue a year ago

I ran into a problem while reproducing this code. In the second training stage, the old model's output on the new data was NaN. I debugged the code and found that the distribution of the model's outputs had shifted significantly, which ultimately led to numerical overflow. What tricks did you use in your implementation to avoid this kind of issue?
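
To make the failure mode concrete, here is a minimal sketch of how overflow can turn into NaN. It assumes the overflow happens in a softmax-style exponentiation of large logits, which I have not confirmed is the exact cause in this repository. The function names and the example logits are hypothetical and are not taken from the code.

```python
import math

def naive_softmax(logits):
    # Direct exponentiation: exp() of a large logit overflows.
    # math.exp raises OverflowError where floating-point tensor code
    # would typically produce inf, so the overflow is mapped to inf
    # here to mimic that behaviour (an assumption of this sketch).
    exps = []
    for x in logits:
        try:
            exps.append(math.exp(x))
        except OverflowError:
            exps.append(math.inf)
    total = sum(exps)
    # inf / inf evaluates to nan, which is how overflow becomes NaN.
    return [e / total for e in exps]

def stable_softmax(logits):
    # Subtracting the maximum keeps every exponent <= 0,
    # so exp() cannot overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def has_non_finite(values):
    # True if any value is nan or +/-inf.
    return any(not math.isfinite(v) for v in values)

# Hypothetical logits with a much larger scale than expected.
logits = [1000.0, 999.0, 0.0]
print("naive has NaN/inf: ", has_non_finite(naive_softmax(logits)))
print("stable has NaN/inf:", has_non_finite(stable_softmax(logits)))
```

A check like `has_non_finite` placed after each layer's output is one way to find where the first non-finite value appears. The max-subtraction trick only helps if the NaN really does originate in an exponentiation step.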