BASiCS_TestDE after BASiCS_Filter: different numbers of genes
binyaminZ opened this issue
Hi!
When I run BASiCS_Filter separately on my two samples and then try to compare them with BASiCS_TestDE, I get MCMC chains with different numbers of genes, and BASiCS_TestDE gives an error. How can I work around this problem, e.g. by modifying the chain objects to include only the overlapping set of genes?
Thanks
Binyamin
Hi Binyamin, you can use the subset method for this. You'll need to install the latest version from github, because I've just patched a few minor bugs in the method.
You can install the github version by running:
```r
devtools::install_github("catavallejos/BASiCS")
```
This example shows how to use the subset method. Please let me know if anything's unclear.
```r
library("BASiCS")
set.seed(42)
## simulate some toy data
Data <- makeExampleBASiCS_Data()
## randomly draw 10 genes for each of two subsets (the subsets may overlap)
Data1 <- Data[sample(nrow(Data), 10), ]
Data2 <- Data[sample(nrow(Data), 10), ]
## run MCMC on the two subsets
Chain1 <- BASiCS_MCMC(
  Data1, N = 50, Thin = 2, Burn = 10, Regression = TRUE,
  PrintProgress = FALSE, WithSpikes = FALSE
)
Chain2 <- BASiCS_MCMC(
  Data2, N = 50, Thin = 2, Burn = 10, Regression = TRUE,
  PrintProgress = FALSE, WithSpikes = FALSE
)
## find overlapping genes
genes <- intersect(rownames(Data1), rownames(Data2))
## list the overlapping genes (here just 1 gene)
genes
#> [1] "Gene44"
## subset both chains to just overlapping genes
Chain1 <- subset(Chain1, Genes = genes)
Chain2 <- subset(Chain2, Genes = genes)
Chain1
#> An object of class BASiCS_Chain
#> 20 MCMC samples.
#> Dataset contains 1 biological genes and 30 cells (1 batch).
#> Object stored using BASiCS version: 2.7.4
#> Parameters: mu delta s nu theta beta sigma2 epsilon RefFreq RBFLocations
Chain2
#> An object of class BASiCS_Chain
#> 20 MCMC samples.
#> Dataset contains 1 biological genes and 30 cells (1 batch).
#> Object stored using BASiCS version: 2.7.4
#> Parameters: mu delta s nu theta beta sigma2 epsilon RefFreq RBFLocations
```
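Note that the same `genes` vector is passed to both chains. Base R's `intersect()` returns the common elements in the order in which they appear in its *first* argument, so swapping the arguments gives the same set in a different order. The gene names below are hypothetical and only illustrate this base R behaviour:

```r
## Hypothetical gene names, stored in different orders in two datasets
genes_a <- c("Gene3", "Gene1", "Gene7", "Gene5")
genes_b <- c("Gene5", "Gene9", "Gene3", "Gene1")

## intersect() keeps the order of its first argument
common_ab <- intersect(genes_a, genes_b)
common_ba <- intersect(genes_b, genes_a)
common_ab
#> [1] "Gene3" "Gene1" "Gene5"
common_ba
#> [1] "Gene5" "Gene3" "Gene1"

## Same set of genes, but not the same order
setequal(common_ab, common_ba)
#> [1] TRUE
identical(common_ab, common_ba)
#> [1] FALSE
```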
Hi Alan,
While trying to compare my two chains with BASiCS_TestDE (after subsetting as above), I get the following error:
```
Error in HiddenHeaderTest_DE(Chain1 = Chain1, Chain2 = Chain2, EpsilonM = EpsilonM, :
  The 'BASiCS_Chain' objects contain genes in different order.
```
I checked, and both chains do contain the same genes, but in a different order (presumably because the original count matrices were ordered differently). Could you please advise how to reorder the chains without rerunning BASiCS_MCMC?
Thanks!
Binyamin
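For the underlying posterior draws, reordering by gene name is ordinary R column indexing. The sketch below uses small hypothetical matrices as stand-ins for the output of `displayChainBASiCS(Chain, "mu")` (rows = MCMC iterations, columns = genes); it does not rebuild a `BASiCS_Chain` object, which is what `subset` is meant to do:

```r
set.seed(1)
## Hypothetical stand-ins for displayChainBASiCS(Chain, "mu"):
## rows are MCMC iterations, columns are genes
mu1 <- matrix(rnorm(6), nrow = 2,
              dimnames = list(NULL, c("GeneA", "GeneB", "GeneC")))
mu2 <- matrix(rnorm(6), nrow = 2,
              dimnames = list(NULL, c("GeneC", "GeneA", "GeneB")))

## Choose one reference order and index both matrices by name
genes <- intersect(colnames(mu1), colnames(mu2))
mu1_ord <- mu1[, genes, drop = FALSE]
mu2_ord <- mu2[, genes, drop = FALSE]

identical(colnames(mu1_ord), colnames(mu2_ord))
#> [1] TRUE
## Whole columns are moved, so each gene keeps its own draws
all.equal(mu2_ord[, "GeneC"], mu2[, "GeneC"])
#> [1] TRUE
```

`drop = FALSE` keeps the result a matrix even when only one gene overlaps, as in the toy example earlier in the thread.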
Hi Binyamin, did you try re-ordering the chains with subset? Something like the code above:
```r
## find overlapping genes
genes <- intersect(rownames(Chain1), rownames(Chain2))
## list the overlapping genes
genes
## [gene names here...]
## subset both chains to just overlapping genes
Chain1 <- subset(Chain1, Genes = genes)
Chain2 <- subset(Chain2, Genes = genes)
BASiCS_TestDE(Chain1, Chain2)
```
Hmm. Could you run this code and let me know the output?
```r
all.equal(rownames(Chain1), rownames(Chain2))
all.equal(colnames(displayChainBASiCS(Chain1, "mu")), colnames(displayChainBASiCS(Chain2, "mu")))
all.equal(colnames(displayChainBASiCS(Chain1, "delta")), colnames(displayChainBASiCS(Chain2, "delta")))
all.equal(colnames(displayChainBASiCS(Chain1, "epsilon")), colnames(displayChainBASiCS(Chain2, "epsilon")))
```
```r
> all.equal(colnames(displayChainBASiCS(MCMC_chains$MEF_7, "mu")),
            colnames(displayChainBASiCS(MCMC_chains$mESC_0, "mu")))
[1] "1208 string mismatches"
> all.equal(sort(colnames(displayChainBASiCS(MCMC_chains$MEF_7, "mu"))),
            sort(colnames(displayChainBASiCS(MCMC_chains$mESC_0, "mu"))))
[1] TRUE
```
Same for delta and epsilon.
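The check above (compare the names as-is, then compare them sorted) can be wrapped in a small helper. `diagnose_gene_order` is a hypothetical function written for this thread, not part of BASiCS; it only uses base R:

```r
## Hypothetical helper (not part of BASiCS): classify how two
## gene-name vectors relate to each other
diagnose_gene_order <- function(a, b) {
  if (identical(a, b)) {
    "same genes, same order"
  } else if (length(a) == length(b) && setequal(a, b)) {
    "same genes, different order"
  } else {
    "different genes"
  }
}

diagnose_gene_order(c("g1", "g2"), c("g1", "g2"))
#> [1] "same genes, same order"
diagnose_gene_order(c("g1", "g2"), c("g2", "g1"))
#> [1] "same genes, different order"
diagnose_gene_order(c("g1", "g2"), c("g1", "g3"))
#> [1] "different genes"
```

Applied to the output of `colnames(displayChainBASiCS(Chain1, "mu"))` and `colnames(displayChainBASiCS(Chain2, "mu"))`, the result above would correspond to the second case.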
Ah, I'm really sorry about this, Binyamin. The current behaviour of subset is to return the requested genes, but not necessarily in the order you pass them. See #243.
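As an illustration of how a subset can return the right features in the wrong order (a general R pattern, not a description of the BASiCS internals): selecting with a logical mask keeps the object's own order, whereas indexing with `match()` follows the order of the requested names. The gene names here are hypothetical:

```r
## Hypothetical gene order stored in a chain (from its original count matrix)
chain_genes <- c("GeneB", "GeneA", "GeneC", "GeneD")
requested   <- c("GeneA", "GeneB", "GeneC")

## Logical mask: selects the right genes but keeps the object's own order
by_mask <- chain_genes[chain_genes %in% requested]
by_mask
#> [1] "GeneB" "GeneA" "GeneC"

## match(): follows the order of the requested vector
by_match <- chain_genes[match(requested, chain_genes)]
by_match
#> [1] "GeneA" "GeneB" "GeneC"
```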
I have just pushed a fix. You can again install the dev version with:
```r
devtools::install_github("catavallejos/BASiCS")
```
My apologies, and I hope you have a good weekend!