ramhiser / sparsediscrim

Sparse and Regularized Discriminant Analysis in R

Bizarre Results with SDLDA, SDQDA, and RSDDA

ramhiser opened this issue · comments

Need to double-check that these are implemented correctly. Write unit tests to verify that they work correctly for simple examples with a small number of alphas.

First of all, one reason for the bizarre results is that the grid of candidate alphas defaults to only 5 values, which is too coarse. Because the risk can be computed very quickly, the default should perhaps be raised to somewhere between 25 and 100.
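To illustrate why 5 candidates can be too few, here is a minimal sketch in Python. The function `toy_risk` is a hypothetical stand-in (a simple quadratic with its minimum placed at 0.37), not the Stein risk used in the package, and `grid_minimizer` is an illustrative helper, not part of sparsediscrim.

```python
def toy_risk(alpha, best=0.37):
    """Hypothetical stand-in for the risk; NOT the Stein risk itself."""
    return (alpha - best) ** 2


def grid_minimizer(risk, num_alphas):
    """Return the alpha on an evenly spaced grid over [0, 1] with the smallest risk."""
    grid = [i / (num_alphas - 1) for i in range(num_alphas)]
    return min(grid, key=risk)


# With 5 candidates the grid spacing is 0.25, so the selected alpha can
# sit far from the true minimizer; with 51 candidates the spacing is 0.02.
coarse = grid_minimizer(toy_risk, 5)
fine = grid_minimizer(toy_risk, 51)
print(coarse, fine)
```

On this toy risk the coarse grid misses the minimizer by more than 0.1, while the finer grid lands within one grid step of it.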

Also, I have written unit tests for DLDA, DQDA, SDLDA, SDQDA, and RSDDA based on Pang et al.'s (2009) simulations A and B. The covariance structure in these simulations is simply the identity, and their online supplementary materials report the average error rates after 500 replications. The unit tests check whether my error rates match theirs within a tolerance of 0.02. This tolerance may need to be changed: it is perhaps too strict, and the standard error of Pang et al.'s (2009) test error rates is abnormally large due to their limited test data size.
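A rough sketch of the kind of check described above, written in Python for illustration only. The helper names and all numbers are hypothetical and are not Pang et al.'s reported values; the standard error uses the usual binomial formula for a single test set of size `n_test`.

```python
import math


def binomial_se(error_rate, n_test):
    """Standard error of an error rate estimated from n_test test observations."""
    return math.sqrt(error_rate * (1.0 - error_rate) / n_test)


def within_tolerance(observed, reported, tol=0.02):
    """True if every observed error rate is within tol of the reported one."""
    return all(abs(o - r) <= tol for o, r in zip(observed, reported))


# Hypothetical error rates, for illustration only.
reported = [0.30, 0.25]
observed = [0.31, 0.28]

print(within_tolerance(observed, reported, tol=0.02))  # second pair differs by 0.03
print(within_tolerance(observed, reported, tol=0.04))
print(binomial_se(0.25, 100))  # a small test set gives a large standard error
```

With a small test set the standard error of a single error-rate estimate can exceed 0.02, which is one way to see why such a tolerance may be too strict.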

The classifiers DLDA, DQDA, and SDLDA were all within .04 of the error rates reported by Pang et al. (2009). The SDQDA classifier is as well for the larger sample sizes n_k = 8, 10, and 15. However, its error rates are significantly different when n_k = 4 and 5. This possibly suggests that there is a slight error in my code. Any such error would also affect the implementation of RSDDA.

The cause of these issues was in the Stein risk functions. In particular, I was miscalculating the pooled sample variance: I had used (1/p) rather than the correct (t/p), which is completely incorrect since the default value of t is -1. This had the effect of dividing through by the precision rather than the variance, which explains the significant differences when small sample sizes were being used. In Pang's simulations, however, the effect was not too severe because the true covariance matrix is the identity matrix.
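To make the effect of the exponent concrete, here is a small Python sketch. It assumes the pooled term is the geometric mean of the per-feature sample variances raised to the power t, i.e. the product of the variances to the power (t/p). That form is my reading of the shrinkage estimator and not a copy of the package code, and `pooled_power` and the variance values are hypothetical.

```python
import math


def pooled_power(variances, t=-1.0, buggy=False):
    """Product of the variances raised to t/p (or 1/p when buggy=True).

    Assumed form of the pooled term; computed on the log scale for stability.
    """
    p = len(variances)
    exponent = (1.0 if buggy else t) / p
    return math.exp(exponent * sum(math.log(v) for v in variances))


# Hypothetical sample variances near 1, as under an identity covariance.
near_identity = [0.9, 1.1, 1.0, 0.95, 1.05]
# The same variances scaled by 4, i.e. a covariance far from the identity.
scaled = [4.0 * v for v in near_identity]

for name, variances in [("near identity", near_identity), ("scaled", scaled)]:
    correct = pooled_power(variances)            # exponent t/p with t = -1
    wrong = pooled_power(variances, buggy=True)  # exponent 1/p
    print(name, correct, wrong)
```

With t = -1 the two versions are reciprocals of each other, so they nearly coincide when the variances are close to 1 and diverge sharply otherwise. This is consistent with the mild effect seen under the identity covariance.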

I have run the simulations for SDQDA, and the results concur with Pang's original results. Tonight, I will run the same simulations for RSDDA. If they check out, then the issue should be resolved.