Large standard deviation when reproducing experiment results
xuyxu opened this issue
Has anyone tried this model on benchmark datasets such as Arrhythmia or Thyroid?
I used ten different seeds [000, 111, 222, ..., 999] and evaluated the performance of DAGMM (the autoencoder structure, learning rate, and batch size are exactly the same). Below are the AUC and precision results on Thyroid:
AUC: 0.5562 0.5546 0.9403 0.9439 0.5592 0.6733 0.9156 0.7703 0.6353 0.8264
Precision: 0.0968 0.0108 0.6129 0.4301 0.0538 0.3226 0.4731 0.2366 0.1505 0.2366
Three of the precision values are close to, or even better than, the one reported in the original paper. However, the standard deviation over the 10 independent trials is quite large.
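To put a number on the spread, here is a minimal sketch that summarizes the values listed above. It uses only Python's standard library and the sample standard deviation (n - 1 denominator); the variable names and the `summarize` helper are just for illustration and are not taken from any DAGMM code.

```python
from statistics import mean, stdev

# Per-seed results on Thyroid, copied from the lists above.
auc = [0.5562, 0.5546, 0.9403, 0.9439, 0.5592,
       0.6733, 0.9156, 0.7703, 0.6353, 0.8264]
precision = [0.0968, 0.0108, 0.6129, 0.4301, 0.0538,
             0.3226, 0.4731, 0.2366, 0.1505, 0.2366]

def summarize(values):
    """Return (mean, sample standard deviation) of a list of scores."""
    return mean(values), stdev(values)

for name, values in (("AUC", auc), ("Precision", precision)):
    m, s = summarize(values)
    print(f"{name}: mean={m:.4f}, std={s:.4f}, "
          f"min={min(values):.4f}, max={max(values):.4f}")
```

Reporting the min and max next to the mean and standard deviation makes it easier to see that the runs split into a few good ones and several poor ones.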
I'm not sure whether there is something wrong with my experiment code or whether the model is inherently unstable.
So I would like to ask: has anyone else observed such a large standard deviation?
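One thing that can be ruled out on the code side is incomplete seeding. The sketch below shows the kind of check I mean: run each seed twice and confirm the score is identical. `run_trial` is a hypothetical placeholder that returns a dummy number rather than a real metric, and `set_seed` only seeds Python's own RNG; an actual run would also need to seed NumPy and whichever deep learning framework is used.

```python
import random

def set_seed(seed: int) -> None:
    # Seeds only Python's built-in RNG. A real experiment would also
    # seed NumPy and the deep learning framework in use (assumption).
    random.seed(seed)

def run_trial(seed: int) -> float:
    # Hypothetical stand-in for "train DAGMM once and evaluate it".
    set_seed(seed)
    return random.random()  # dummy score, not a real metric

SEEDS = [i * 111 for i in range(10)]  # 0, 111, 222, ..., 999

def run_all(seeds):
    """Run one trial per seed and return {seed: score}."""
    return {seed: run_trial(seed) for seed in seeds}

first = run_all(SEEDS)
second = run_all(SEEDS)

# With complete seeding, repeating a seed reproduces its score exactly.
mismatched = [s for s in SEEDS if first[s] != second[s]]
print("seeds that did not reproduce:", mismatched)
```

If every seed reproduces its own score, the variation across seeds comes from initialization and data ordering rather than from uncontrolled randomness in the code.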
Thanks :-)