Large standard deviation when reproducing experiment results

Question

xuyxu opened this issue 4 years ago

Has anyone tried this model on benchmark datasets like Arrhythmia or Thyroid?

I ran DAGMM with ten different seeds [000, 111, 222, ..., 999], keeping the autoencoder structure, learning rate, and batch size exactly the same across runs. Below are the AUC and precision results on Thyroid:

AUC: 0.5562 0.5546 0.9403 0.9439 0.5592 0.6733 0.9156 0.7703 0.6353 0.8264
Precision: 0.0968 0.0108 0.6129 0.4301 0.0538 0.3226 0.4731 0.2366 0.1505 0.2366

Three of the precision values are close to, or even better than, the one reported in the original paper. However, the standard deviation over the 10 independent trials is quite large...
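
For anyone who wants to check the spread, here is a minimal sketch that computes the mean and sample standard deviation of the ten runs using only Python's standard library. The numbers are copied from the lists above; the helper name `summarize` is just for illustration and is not part of any DAGMM code.

```python
import statistics

# Scores copied from the ten Thyroid runs listed above (one value per seed).
auc = [0.5562, 0.5546, 0.9403, 0.9439, 0.5592,
       0.6733, 0.9156, 0.7703, 0.6353, 0.8264]
precision = [0.0968, 0.0108, 0.6129, 0.4301, 0.0538,
             0.3226, 0.4731, 0.2366, 0.1505, 0.2366]


def summarize(values):
    """Return (mean, sample standard deviation) for a list of scores."""
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    return statistics.mean(values), statistics.stdev(values)


auc_mean, auc_std = summarize(auc)
prec_mean, prec_std = summarize(precision)

print(f"AUC:       {auc_mean:.4f} +/- {auc_std:.4f}")
print(f"Precision: {prec_mean:.4f} +/- {prec_std:.4f}")
```

If the population standard deviation is preferred, `statistics.pstdev` can be swapped in for `statistics.stdev`.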

I'm not sure whether there is something wrong with my experiment code, or whether the model is inherently unstable.

So I would like to ask whether anyone else has observed such a large standard deviation.

Thanks :-)