Bizarre Results with SDLDA, SDQDA, and RSDDA

Question

ramhiser opened this issue 13 years ago

Need to double-check that SDLDA, SDQDA, and RSDDA are implemented correctly. Write unit tests that verify them on simple examples with a small number of candidate alphas.

John Ramey · Answer 1 · Fri Mar 18 2011 12:55:31 GMT+0800 (China Standard Time)

First, one reason for the bizarre results is that the grid of candidate alphas defaults to only 5 values, which is insufficient. Because the risk can be computed very quickly, the default should perhaps be raised to somewhere between 25 and 100.
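To illustrate why a 5-point grid can be too coarse, here is a minimal Python sketch. It is not the package's code: `alpha_grid`, `toy_risk`, and `best_alpha` are hypothetical names, the grid is assumed to be evenly spaced on [0, 1], and the quadratic risk with its minimum at 0.37 is a made-up stand-in for the real Stein risk.

```python
def alpha_grid(num_alphas):
    """Evenly spaced candidate alphas on [0, 1] (assumed range)."""
    return [i / (num_alphas - 1) for i in range(num_alphas)]


def toy_risk(alpha, best=0.37):
    """Hypothetical stand-in for the risk: convex, minimized at `best`."""
    return (alpha - best) ** 2


def best_alpha(num_alphas, risk=toy_risk):
    """Return the grid point with the smallest risk."""
    return min(alpha_grid(num_alphas), key=risk)


coarse = best_alpha(5)    # only 0, 0.25, 0.5, 0.75, 1 are considered
fine = best_alpha(101)    # step size 0.01
print(coarse, fine)
```

With 5 candidates the search can only land on a multiple of 0.25, so the selected alpha may sit well away from the true minimizer. A finer grid costs little when the risk is cheap to evaluate.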

I have also written unit tests for DLDA, DQDA, SDLDA, SDQDA, and RSDDA based on simulations A and B from Pang et al. (2009). The covariance structure in these simulations is simply the identity, and the online supplementary materials report the average error rates over 500 replications. The unit tests check whether my error rates match the reported ones within a tolerance of 0.02. This tolerance may need to be loosened: it is perhaps too strict, and the standard errors of Pang et al.'s (2009) test error rates are unusually large because of their limited test data size.
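The kind of check described above could be sketched in Python as follows. This is only an illustration: `mean_and_se` and `matches_reference` are hypothetical helpers, and the error rates below are made-up placeholders, not values from Pang et al. (2009) or from my simulations.

```python
import math
import statistics


def mean_and_se(error_rates):
    """Average error rate over replications and its standard error."""
    mean = statistics.mean(error_rates)
    se = statistics.stdev(error_rates) / math.sqrt(len(error_rates))
    return mean, se


def matches_reference(observed, reference, tol=0.02):
    """True if the observed rate is within `tol` of the reference rate."""
    return abs(observed - reference) <= tol


# Placeholder per-replication error rates (illustrative only).
simulated = [0.20, 0.30, 0.10, 0.20]
observed, se = mean_and_se(simulated)

reference = 0.215  # placeholder for a published average error rate
print(observed, se, matches_reference(observed, reference))
```

Comparing the standard error with the tolerance gives a rough sense of whether 0.02 is realistic: if the standard error of either average is of the same order as the tolerance, the test can fail even when the implementation is correct.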

John Ramey · Answer 2 · Sat Mar 19 2011 08:40:54 GMT+0800 (China Standard Time)

The classifiers DLDA, DQDA, and SDLDA were all within 0.04 of the error rates reported by Pang et al. (2009). So was the SDQDA classifier for the larger sample sizes n_k = 8, 10, and 15. However, its error rates differ substantially when n_k = 4 and 5. This suggests that there may be a slight error in my code, which would also be reflected in the implementation of RSDDA.

John Ramey · Answer 3 · Mon Mar 21 2011 11:52:09 GMT+0800 (China Standard Time)

The cause of these issues was in the Stein risk functions. In particular, I was miscalculating the pooled sample variance. I had used (1/p) rather than the correct (t/p), which is completely incorrect since the default value of t is -1. This had the effect of dividing through by the precision rather than the variance, and it explains the significant differences for small sample sizes. In Pang's simulations, however, the effect was not too severe because the true covariance matrix is the identity matrix.
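The effect of the wrong factor can be sketched in Python under one assumption: that the pooled estimate is the geometric mean of the p per-feature sample variances raised to the power t, so that the exponent applied to each variance is t/p. The function `pooled_power` and the variance values are hypothetical and are not taken from the package.

```python
import math


def pooled_power(variances, t=-1.0, use_buggy_exponent=False):
    """Product of variances, each raised to t/p (or 1/p in the buggy form)."""
    p = len(variances)
    exponent = (1.0 / p) if use_buggy_exponent else (t / p)
    # Work on the log scale to avoid overflow or underflow for large p.
    return math.exp(exponent * sum(math.log(v) for v in variances))


variances = [0.5, 2.0, 4.0]          # made-up sample variances
correct = pooled_power(variances)    # exponent t/p with t = -1
buggy = pooled_power(variances, use_buggy_exponent=True)
print(correct, buggy)

# When every variance is 1, as under an identity covariance matrix,
# the two versions coincide.
identity_like = [1.0, 1.0, 1.0]
print(pooled_power(identity_like),
      pooled_power(identity_like, use_buggy_exponent=True))
```

Under this assumption and with t = -1, the buggy value is the reciprocal of the correct one, which is consistent with a variance being swapped for a precision. When the sample variances are all close to 1 the two values nearly agree, which fits the milder effect seen in the identity-covariance simulations.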

I have run the simulations for SDQDA, and the results concur with Pang's original results. Tonight, I will run the same simulations for RSDDA. If they check out, then the issue should be resolved.