catavallejos / BASiCS

BASiCS: Bayesian Analysis of Single-Cell Sequencing Data. This is an unstable experimental version. Please see http://bioconductor.org/packages/BASiCS/ for the official release version.

BASiCS_TestDE after BASiCS_Filter - different amount of genes

binyaminZ opened this issue · comments

Hi!
When I run BASiCS_Filter separately on my two samples and then try to compare them with BASiCS_TestDE, the resulting MCMC chains contain different numbers of genes and BASiCS_TestDE gives an error. How can I work around this, e.g. by modifying the chain objects to include only the overlapping set of genes?

Thanks
Binyamin

Hi Binyamin, you can use the subset method for this. You'll need to install the latest version from GitHub, because I've just patched a few minor bugs in the method.

You can install the github version by running:

devtools::install_github("catavallejos/BASiCS")

This example shows how to use the subset method. Please let me know if anything's unclear.

library("BASiCS")

set.seed(42)
## simulate some toy data
Data <- makeExampleBASiCS_Data()
## randomly draw two subsets of 10 genes each (the subsets may overlap)
Data1 <- Data[sample(nrow(Data), 10), ]
Data2 <- Data[sample(nrow(Data), 10), ]
## run MCMC on the two subsets

Chain1 <- BASiCS_MCMC(
    Data1, N = 50, Thin = 2, Burn = 10, Regression = TRUE,
    PrintProgress = FALSE, WithSpikes = FALSE
)

Chain2 <- BASiCS_MCMC(
    Data2, N = 50, Thin = 2, Burn = 10, Regression = TRUE,
    PrintProgress = FALSE, WithSpikes = FALSE
)

## find overlapping genes
genes <- intersect(rownames(Data1), rownames(Data2))
## list the overlapping genes (here just 1 gene)
genes
#> [1] "Gene44"
## subset both chains to just overlapping genes
Chain1 <- subset(Chain1, Genes = genes)
Chain2 <- subset(Chain2, Genes = genes)
Chain1
#> An object of class BASiCS_Chain
#>  20 MCMC samples.
#>  Dataset contains 1 biological genes and 30 cells (1 batch). 
#>  Object stored using BASiCS version:  2.7.4 
#>  Parameters:  mu delta s nu theta beta sigma2 epsilon RefFreq RBFLocations
Chain2
#> An object of class BASiCS_Chain
#>  20 MCMC samples.
#>  Dataset contains 1 biological genes and 30 cells (1 batch). 
#>  Object stored using BASiCS version:  2.7.4 
#>  Parameters:  mu delta s nu theta beta sigma2 epsilon RefFreq RBFLocations
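
The idea behind subsetting both chains to one shared gene vector can also be sketched on plain matrices. This is only an illustration in base R: draws1 and draws2 are made-up stand-ins for per-gene MCMC draws, not BASiCS_Chain objects, and the gene names are hypothetical.

```r
## Toy stand-ins for per-gene MCMC draws: rows are draws, columns are genes.
## These are plain matrices (an assumption for illustration), not BASiCS_Chain objects.
set.seed(1)
draws1 <- matrix(
    rnorm(12), nrow = 3,
    dimnames = list(NULL, c("GeneA", "GeneB", "GeneC", "GeneD"))
)
draws2 <- matrix(
    rnorm(9), nrow = 3,
    dimnames = list(NULL, c("GeneD", "GeneE", "GeneB"))
)

## genes present in both objects, as a single shared vector
common <- intersect(colnames(draws1), colnames(draws2))
common
#> [1] "GeneB" "GeneD"

## indexing by name selects the shared genes and puts them in the order of `common`
aligned1 <- draws1[, common, drop = FALSE]
aligned2 <- draws2[, common, drop = FALSE]
identical(colnames(aligned1), colnames(aligned2))
#> [1] TRUE
```

Using the same name vector for both objects means the selection and the ordering happen in one step, which is the property a downstream gene-by-gene comparison relies on.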

Hi Alan,
While trying to compare my two chains with BASiCS_TestDE (after subsetting as above), I get the following error:

Error in HiddenHeaderTest_DE(Chain1 = Chain1, Chain2 = Chain2, EpsilonM = EpsilonM,  : 
  The  'BASiCS_Chain' objects contain genes in different order.

I checked, and indeed I get the same genes in both chains, but the order differs (obviously because of the count matrices, which were ordered differently). Could you please advise how to reorder the chains without rerunning BASiCS_MCMC?
Thanks!
Binyamin

Hi Binyamin, did you try re-ordering the chains with subset? Something like the code above:

## find overlapping genes
genes <- intersect(rownames(Chain1), rownames(Chain2))
## list the overlapping genes
genes
## [gene names here...]
## subset both chains to just overlapping genes
Chain1 <- subset(Chain1, Genes = genes)
Chain2 <- subset(Chain2, Genes = genes)
BASiCS_TestDE(Chain1, Chain2)

Hmm. Could you run this code and let me know the output?

all.equal(rownames(Chain1), rownames(Chain2))
all.equal(colnames(displayChainBASiCS(Chain1, "mu")), colnames(displayChainBASiCS(Chain2, "mu")))
all.equal(colnames(displayChainBASiCS(Chain1, "delta")), colnames(displayChainBASiCS(Chain2, "delta")))
all.equal(colnames(displayChainBASiCS(Chain1, "epsilon")), colnames(displayChainBASiCS(Chain2, "epsilon")))
> all.equal(colnames(displayChainBASiCS(MCMC_chains$MEF_7, "mu")),
            colnames(displayChainBASiCS(MCMC_chains$mESC_0, "mu")))
[1] "1208 string mismatches"
> all.equal(sort(colnames(displayChainBASiCS(MCMC_chains$MEF_7, "mu"))),
            sort(colnames(displayChainBASiCS(MCMC_chains$mESC_0, "mu"))))
[1] TRUE

The same holds for delta and epsilon.
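
The output above (a mismatch that disappears after sorting) is the signature of two name vectors that hold the same genes in a different order. A minimal base R sketch of that diagnosis follows; names1 and names2 are hypothetical stand-ins for the column names returned by displayChainBASiCS, not real output.

```r
## Hypothetical gene names standing in for the column names of two chains
names1 <- c("Gene3", "Gene1", "Gene2")
names2 <- c("Gene1", "Gene2", "Gene3")

## same order?
identical(names1, names2)
#> [1] FALSE

## same set of genes, ignoring order?
setequal(names1, names2)
#> [1] TRUE

## positions in names2 that reproduce the order of names1
perm <- match(names1, names2)
perm
#> [1] 3 1 2
identical(names2[perm], names1)
#> [1] TRUE
```

A permutation like perm is what any reordering has to apply consistently to every per-gene parameter so that the two objects line up gene by gene.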

Ah, I'm really sorry about this Binyamin. The current behaviour is to return the same features, but not necessarily in the same order. See #243

I have just pushed a fix. You can again install the dev version with:

devtools::install_github("catavallejos/BASiCS")

My apologies, and I hope you have a good weekend.