Mixed Model and Contrast
vivek-verma202 opened this issue · comments
Hello Lukas,
Thank-you for Diffcyt.
I had a quick question about setting up contrast matrix for mixed model. I hypothesize that the cell cluster proportions depend on the condition (case-status, binary) when corrected for age and sex (fixed effect), and the date of batch processing (categorical, 4 values, random effect)
ei <- metadata(sce)$experiment_info
> head(ei)
sample_id condition sex age date n_cells
1 E23 Control Female 46 16-AUG-2019 150000
2 E25 Control Male 42 28-AUG-2019 150000
3 E26 Control Female 45 16-AUG-2019 150000
4 E28 Control Female 52 16-AUG-2019 150000
5 E29 Case Female 40 28-AUG-2019 150000
6 E30 Case Female 42 28-AUG-2019 150000
> str(ei)
'data.frame': 102 obs. of 6 variables:
$ sample_id: Factor w/ 102 levels "E23","E25","E26",..: 1 2 3 4 5 6 7 8 9 10 ...
$ condition: Factor w/ 2 levels "Control","Case": 1 1 1 1 2 2 2 1 1 2 ...
$ sex : Factor w/ 2 levels "Female","Male": 1 2 1 1 1 1 1 1 1 1 ...
$ age : num 46 42 45 52 40 42 52 52 46 62 ...
$ date : Factor w/ 4 levels "08-OCT-2019",..: 2 4 2 2 4 4 1 2 2 4 ...
$ n_cells : num 150000 150000 150000 150000 150000 150000 150000 150000 150000 150000 ...
da_formula1 <- createFormula(ei,
cols_fixed = c("condition", "age", "sex"),
cols_random = "date"
)
> da_formula1[["formula"]]
y ~ condition + age + sex + (1 | date)
contrast <- createContrast(c(0, 1, 0, 0))
> contrast
[,1]
[1,] 0
[2,] 1
[3,] 0
[4,] 0
da_res1 <- diffcyt(sce,
formula = da_formula1, contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
clustering_to_use = "meta10", verbose = T
)
topTable(da_res1, format_vals = T)
> topTable(da_res1, format_vals = T)
DataFrame with 10 rows and 3 columns
cluster_id p_val p_adj
<factor> <numeric> <numeric>
1 1 0.00e+00 0.00e+00
2 2 0.00e+00 0.00e+00
3 3 0.00e+00 0.00e+00
4 4 0.00e+00 0.00e+00
5 5 0.00e+00 0.00e+00
6 6 0.00e+00 0.00e+00
7 7 0.00e+00 0.00e+00
8 8 0.00e+00 0.00e+00
10 10 1.78e-15 1.97e-15
9 9 7.95e-01 7.95e-01
Either I am about to publish a Science / Nature paper or I messed up (the probability of latter is close to 1).
Could you please let me know where am I messing up?
Also, like Toptable, is there a quick & easy way to check the effect size?
Thanks a lot!
Vivek
Just an update, I tried running the same using testDA_GLMM(), which requires using calcCounts(), which needs running generateClusters(). However,
> sce <- generateClusters(sce)
Error: Argument 'exprs' must be numeric matrix with colnames attribute set
> typeof(sce@assays@data@listData[["exprs"]])
[1] "double"
> dim(sce@assays@data@listData[["exprs"]])
[1] 14 15257202
> sce@assays@data@listData[["exprs"]] <- as.matrix(sce@assays@data@listData[["exprs"]])
> typeof(sce@assays@data@listData[["exprs"]])
[1] "double"
> dim(sce@assays@data@listData[["exprs"]])
[1] 14 15257202
> sce <- generateClusters(sce)
Error: Argument 'exprs' must be numeric matrix with colnames attribute set
Tried with normalize = T:
> da_res1 <- diffcyt(sce,
+ formula = da_formula1, contrast = contrast,
+ analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
+ normalize = T, norm_factors = "TMM",
+ clustering_to_use = "meta20", verbose = T
+ )
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta20' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DA tests using method 'diffcyt-DA-GLMM'...
There were 20 warnings (use warnings() to see them)
> topTable(da_res1, format_vals = T)
DataFrame with 20 rows and 3 columns
cluster_id p_val p_adj
<factor> <numeric> <numeric>
1 1 0 0
2 2 0 0
3 3 0 0
4 4 0 0
5 5 0 0
... ... ... ...
19 19 0.00e+00 0.00e+00
20 20 0.00e+00 0.00e+00
14 14 1.20e-13 1.33e-13
18 18 7.59e-09 7.99e-09
10 10 1.09e-02 1.09e-02
> warnings()
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, ... :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Identical warning message for 20 times.
Using the most minimalistic model didn't help either:
> da_formula1 <- createFormula(ei,
+ cols_fixed = c("condition"),
+ cols_random = "date"
+ )
> contrast <- createContrast(c(0, 1))
> da_res1 <- diffcyt(sce,
+ formula = da_formula1, contrast = contrast,
+ analysis_type = "DA", method_DA = "diffcyt-DA-GLMM",
+ normalize = T, norm_factors = "TMM",
+ clustering_to_use = "meta20", verbose = T
+ )
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta20' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DA tests using method 'diffcyt-DA-GLMM'...
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
> topTable(da_res1, format_vals = T)
DataFrame with 20 rows and 3 columns
cluster_id p_val p_adj
<factor> <numeric> <numeric>
1 1 0 0
2 2 0 0
3 3 0 0
4 4 0 0
5 5 0 0
... ... ... ...
17 17 0.00e+00 0.00e+00
19 19 0.00e+00 0.00e+00
20 20 0.00e+00 0.00e+00
18 18 5.02e-05 5.28e-05
10 10 7.44e-04 7.44e-04