LTLA / BiocSingular

Clone of the Bioconductor repository for the BiocSingular package.

Home Page: https://bioconductor.org/packages/devel/bioc/html/BiocSingular.html

runSVD with RandomParam() returns inverted values

pablo-rodr-bio2 opened this issue

I was trying to use runSVD() with RandomParam() on a very large dataset in an HDF5Array. Before that, I ran some tests to see how the values differ from those of base::svd(), but every time I use RandomParam(), the values in the first column of $u and $v come back with their signs inverted. I don't know if this is intended.

> library(BiocSingular)
> set.seed(123)
> m <- matrix(sample.int(10, 25, T), 10, 10)
> m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    3    5    9    5    3    3    5    9    5     3
 [2,]    3    3    3    4    8    3    3    3    4     8
 [3,]   10    9    4    6   10   10    9    4    6    10
 [4,]    2    9    1    9    7    2    9    1    9     7
 [5,]    6    9    7   10   10    6    9    7   10    10
 [6,]    5    3    3    5    9    5    3    3    5     9
 [7,]    4    8    3    3    3    4    8    3    3     3
 [8,]    6   10   10    9    4    6   10   10    9     4
 [9,]    9    7    2    9    1    9    7    2    9     1
[10,]   10   10    6    9    7   10   10    6    9     7
> gSetIdx <- 1:2

> x1 <- svd(m[gSetIdx, ])
> x2 <- runSVD(m[gSetIdx, ], k=2)
> x3 <- runSVD(m[gSetIdx, ], k=2, BSPARAM=RandomParam())

These are the results I get:

> x1
$d
[1] 21.227029  7.836661

$u
           [,1]       [,2]
[1,] -0.7796929 -0.6261621
[2,] -0.6261621  0.7796929

$v
            [,1]         [,2]
 [1,] -0.1986884  0.058774061
 [2,] -0.2721507 -0.101029225
 [3,] -0.4190753 -0.420635796
 [4,] -0.3016490 -0.001536228
 [5,] -0.3461801  0.556239043
 [6,] -0.1986884  0.058774061
 [7,] -0.2721507 -0.101029225
 [8,] -0.4190753 -0.420635796
 [9,] -0.3016490 -0.001536228
[10,] -0.3461801  0.556239043

> x2
$d
[1] 21.227029  7.836661

$u
           [,1]       [,2]
[1,] -0.7796929 -0.6261621
[2,] -0.6261621  0.7796929

$v
            [,1]         [,2]
 [1,] -0.1986884  0.058774061
 [2,] -0.2721507 -0.101029225
 [3,] -0.4190753 -0.420635796
 [4,] -0.3016490 -0.001536228
 [5,] -0.3461801  0.556239043
 [6,] -0.1986884  0.058774061
 [7,] -0.2721507 -0.101029225
 [8,] -0.4190753 -0.420635796
 [9,] -0.3016490 -0.001536228
[10,] -0.3461801  0.556239043

> x3
$d
[1] 21.227029  7.836661

$u
          [,1]       [,2]
[1,] 0.7796929 -0.6261621
[2,] 0.6261621  0.7796929

$v
           [,1]         [,2]
 [1,] 0.1986884  0.058774061
 [2,] 0.2721507 -0.101029225
 [3,] 0.4190753 -0.420635796
 [4,] 0.3016490 -0.001536228
 [5,] 0.3461801  0.556239043
 [6,] 0.1986884  0.058774061
 [7,] 0.2721507 -0.101029225
 [8,] 0.4190753 -0.420635796
 [9,] 0.3016490 -0.001536228
[10,] 0.3461801  0.556239043

The signs of x3$u[,1] and x3$v[,1] are inverted.

> sessionInfo()
R Under development (unstable) (2020-10-29 r79387)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/bort/R-devel/lib/libRblas.so
LAPACK: /home/bort/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocSingular_1.7.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           rsvd_1.0.3           lattice_0.20-41     
 [4] matrixStats_0.57.0   IRanges_2.25.6       grid_4.1.0          
 [7] stats4_4.1.0         irlba_2.3.3          S4Vectors_0.29.6    
[10] Matrix_1.3-0         BiocParallel_1.25.2  beachmat_2.7.5      
[13] DelayedArray_0.17.7  MatrixGenerics_1.3.0 parallel_4.1.0      
[16] compiler_4.1.0       BiocGenerics_0.37.0

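This is not a problem; the signs of the singular vectors are not identifiable. Negating a column of u together with the matching column of v gives an equally valid decomposition, so different algorithms are free to return either sign. If you reconstruct the matrix from the decomposition, you get the same result because the negatives cancel out: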

library(BiocSingular)
set.seed(123)
m <- matrix(sample.int(10, 25, replace=TRUE), 10, 10)

x2 <- runSVD(m[1:2, ], k=2)
x3 <- runSVD(m[1:2, ], k=2, BSPARAM=RandomParam())

# Both of these give the same result:
x2$u %*% diag(x2$d) %*% t(x2$v)
x3$u %*% diag(x3$d) %*% t(x3$v)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    3    5    9    5    3    3    5    9    5     3
## [2,]    3    3    3    4    8    3    3    3    4     8
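
If the factors themselves need to be compared across algorithms, one option is to fix a sign convention first. The following is only a minimal sketch in base R: `align_signs()` is a hypothetical helper (it is not part of BiocSingular), and it assumes that both decompositions list their components in the same order and that the singular values are distinct. To stay self-contained, it mimics the flipped component by negating it manually rather than calling RandomParam().

```r
# align_signs() is a hypothetical helper for illustration only;
# it is not provided by BiocSingular or base R.
align_signs <- function(ref, other) {
    # For each component, check whether the left singular vector in `other`
    # points the same way as the one in `ref` (positive dot product).
    flip <- sign(colSums(ref$u * other$u))
    flip[flip == 0] <- 1
    # Apply the same sign to the matching columns of u and v,
    # so that the product u %*% diag(d) %*% t(v) is unchanged.
    other$u <- sweep(other$u, 2, flip, `*`)
    other$v <- sweep(other$v, 2, flip, `*`)
    other
}

set.seed(123)
m <- matrix(sample.int(10, 25, replace=TRUE), 10, 10)
a <- m[1:2, ]

x1 <- svd(a)

# Mimic the situation in the issue by negating the first component.
x3 <- x1
x3$u[, 1] <- -x3$u[, 1]
x3$v[, 1] <- -x3$v[, 1]

# The reconstruction is unaffected by the sign flip.
rec1 <- x1$u %*% diag(x1$d) %*% t(x1$v)
rec3 <- x3$u %*% diag(x3$d) %*% t(x3$v)
all.equal(rec1, rec3)

# After alignment, the factors can be compared directly.
x3a <- align_signs(x1, x3)
all.equal(x1$u, x3a$u)
all.equal(x1$v, x3a$v)
```

Flipping u and v together is what keeps the decomposition valid; flipping only one of them would change the reconstructed matrix.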

Oh, I see. Sorry for the issue then; closing it.