ramhiser / sparsediscrim

Sparse and Regularized Discriminant Analysis in R

predicting a single sample

topepo opened this issue

Using version 0.2, I have problems predicting a new data set that has a single row:

library(caret)

set.seed(1)
dat <- twoClassSim(101)
trn <- dat[1:100,]
tst <- dat[101,]


library(sparsediscrim)

mod <- hdrda(x = as.matrix(trn[, -ncol(trn)]), y = trn$Class)

predict(mod, newdata = as.matrix(tst[, -ncol(tst)]))

By contrast, predicting several rows works:

predict(mod, newdata = as.matrix(trn[1:5, -ncol(tst)]))
$class
[1] Class1 Class1 Class1 Class1 Class2
Levels: Class1 Class2

$scores
     Class1   Class2
1  9.539882 13.34303
2 15.849269 27.26078
3 22.623988 27.86927
4 19.998993 22.87425
5 26.780945 12.71985

$posterior
        Class1       Class2
1 1.000000e+00 2.230046e-02
2 1.000000e+00 1.106739e-05
3 1.000000e+00 5.272328e-03
4 1.000000e+00 5.640160e-02
5 7.822473e-07 1.000000e+00

The single-row example throws the error "Error in which.min(scores) : (list) object cannot be coerced to type 'double'".
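A plausible cause (not confirmed in this thread) is that the per-row scores collapse from a numeric matrix into a vector, list, or data frame when there is only one row, so `which.min()` no longer receives a plain numeric input. The helper below, `classify_scores()`, is hypothetical and not part of sparsediscrim; it assumes a score matrix with one column per class where the smallest score wins, which matches the outputs in this thread (the predicted class always has the smaller score).

```r
# Hypothetical helper, not part of sparsediscrim: assign each row of a score
# matrix to the class with the smallest score, treating one row like many.
classify_scores <- function(scores) {
  # A single observation may arrive as a named numeric vector; promote it to
  # a 1-row matrix so the code below sees the same shape as for many rows.
  if (is.null(dim(scores))) {
    scores <- matrix(scores, nrow = 1, dimnames = list(NULL, names(scores)))
  }
  # A data frame (a list underneath) becomes a numeric matrix here.
  scores <- as.matrix(scores)
  # apply() returns one column index per row, including when nrow(scores) == 1.
  idx <- apply(scores, 1, which.min)
  class_levels <- colnames(scores)
  factor(class_levels[idx], levels = class_levels)
}

# Scores copied from the single-sample output in this thread.
classify_scores(c(Class1 = 9.539882, Class2 = 13.343029))
```

Keeping the result a matrix for any number of rows means the single-sample path runs through exactly the same code as the multi-row path.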

In other cases (data not available), it gives posteriors that don't sum to one, or results with more than one dimension:

Browse[2]> predict(modelFit, newdata)
$class
[1] Class1
Levels: Class1 Class2

$scores
  Class1   Class2 
2.345889 2.427533 

$posterior
Class1 Class2 
1.0000 0.9216 

and:

predict(modelFit, newdata)
$class
 [1] Class2 Class2 Class2 Class2 Class2 Class2 Class2 Class2 Class2 Class2 Class2 Class2 Class2 Class2 Class2 Class2
Levels: Class1 Class2

$scores
        Class1   Class2
 [1,] 2.427533 2.345889
 [2,] 1.427533 1.345889
 [3,] 1.427533 1.345889
 [4,] 1.427533 1.345889
 [5,] 1.427533 1.345889
 [6,] 2.427533 2.345889
 [7,] 1.427533 1.345889
 [8,] 1.427533 1.345889
 [9,] 1.427533 1.345889
[10,] 1.427533 1.345889
[11,] 2.427533 2.345889
[12,] 1.427533 1.345889
[13,] 1.427533 1.345889
[14,] 1.427533 1.345889
[15,] 1.427533 1.345889
[16,] 2.427533 2.345889

$posterior
      Class1 Class2
 [1,] 0.9216      1
 [2,] 0.9216      1
 [3,] 0.9216      1
 [4,] 0.9216      1
 [5,] 0.9216      1
 [6,] 0.9216      1
 [7,] 0.9216      1
 [8,] 0.9216      1
 [9,] 0.9216      1
[10,] 0.9216      1
[11,] 0.9216      1
[12,] 0.9216      1
[13,] 0.9216      1
[14,] 0.9216      1
[15,] 0.9216      1
[16,] 0.9216      1
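The posteriors printed above are consistent with computing exp(-(score - row minimum)) for each class and skipping the division by the row total: for example, exp(-(2.427533 - 2.345889)) ≈ 0.9216, and in the first output exp(-(13.343029 - 9.539882)) ≈ 0.0223. That relationship is inferred from the printed numbers, not from the package source. Under that assumption, a minimal sketch of normalized posteriors (the function name `scores_to_posterior()` is hypothetical) is:

```r
# Sketch under an assumption inferred from the printed outputs: posterior
# probabilities are proportional to exp(-score), so smaller scores win.
scores_to_posterior <- function(scores) {
  # Promote a single observation (named vector) to a 1-row matrix.
  if (is.null(dim(scores))) {
    scores <- matrix(scores, nrow = 1, dimnames = list(NULL, names(scores)))
  }
  scores <- as.matrix(scores)
  # Subtract each row's minimum before exponentiating to avoid underflow;
  # the length-nrow vector recycles down the columns, i.e. row by row.
  unnormalized <- exp(-(scores - apply(scores, 1, min)))
  # Dividing by the row totals is the step that makes each row sum to 1.
  unnormalized / rowSums(unnormalized)
}

# Scores from the single-sample output above.
scores_to_posterior(c(Class1 = 2.345889, Class2 = 2.427533))
```

With the division, the two classes get roughly 0.52 and 0.48 instead of 1.0000 and 0.9216, and the ranking of the classes is unchanged.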

Thanks,

Max

Thanks for letting me know, @topepo. I missed your issue somehow. I'll take a look right now.

In the latest version (0.2.2) on master, the error when predicting a single sample is no longer present. A couple of issues still remain:

  1. The posterior probabilities do not sum to 1
  2. The class names are renamed when predicting a single sample.

I'm looking into both issues.

Side note: the latest version of sparsediscrim on CRAN is 0.2. I'll update it on CRAN after the fix.

> library(caret)
> set.seed(1)
> dat <- twoClassSim(101)
> trn <- dat[1:100,]
> tst <- dat[101,]
> mod <- hdrda(x = as.matrix(trn[, -ncol(trn)]), y = trn$Class)
> predict(mod, newdata=trn[1, -ncol(tst)])
 $class
 [1] Class1
 Levels: Class1 Class2

 $scores
  Class1.1  Class2.1
  9.539882 13.343029

 $posterior
   Class1.1   Class2.1
 1.00000000 0.02230046
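Names such as `Class1.1` are what R's `unlist()` produces when it flattens a named list whose elements are themselves named: the outer and inner names are joined with a dot. Whether sparsediscrim's `predict()` builds its scores this way is an assumption, not something stated in this thread. The sketch below only reproduces the naming pattern with a hypothetical `per_class` list and shows one way to keep the original class names by binding the per-class results into a matrix.

```r
# Per-class scores for one observation, each named by its row ("1").
# This structure is hypothetical; it only reproduces the naming pattern.
per_class <- list(Class1 = c(`1` = 9.539882), Class2 = c(`1` = 13.343029))

# unlist() joins outer and inner names with ".", giving "Class1.1", "Class2.1".
flat <- unlist(per_class)
names(flat)

# Binding the elements as columns keeps the class names as column names and
# stays a 1 x 2 matrix, the same shape as a multi-row result.
scores <- do.call(cbind, per_class)
colnames(scores)
dim(scores)
```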

Thanks for reporting the issue, @topepo. Resolved.

I'll push to CRAN soon.

Thanks for the fix.

Max