A large number of genes have null in their univariate Z output.
hsun3163 opened this issue
There are 5000 genes whose sumstat tables fail to be generated from their rds files. For at least two of them, the error is the lack of a univariate z-score within their susie object.
This can be easily circumvented by computing the z-score as the ratio of the posterior mean to the posterior sd. However, it is unclear why the z-score would be NULL in the first place, and whether the univariate z is indeed posterior mean / posterior sd.
The error is exemplified by:
susie_list = readRDS("/mnt/vast/hpc/csg/molecular_phenotype_calling/eqtl/output/susie_per_gene_tad_2/cache/demo.ENSG00000000971.unisusie.fit.rds")
test_output = susieR::susie(susie_list[[1]]$input_data$X_resid,
                            susie_list[[1]]$input_data$Y_resid,
                            L = 10,
                            max_iter = 1000,
                            estimate_residual_variance = TRUE,
                            estimate_prior_variance = TRUE,
                            refine = TRUE, compute_univariate_zscore = TRUE, coverage = 0.95)
test_output$z
NULL
The following code is how susie generates the z-score; however, it runs without any problem on the same data:
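# X was not defined in the original snippet; it is taken from the same input data as above
X = susie_list[[1]]$input_data$X_resid
null_weight = NULL
# Drop the extra null column only when a non-zero null weight is used
if (!is.null(null_weight) && null_weight != 0)
  X = X[,1:(ncol(X) - 1)]
z = susieR:::calc_z(X,susie_list[[1]]$input_data$Y_resid, center = TRUE, scale = TRUE)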
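For comparison, here is a minimal base-R sketch of a marginal (univariate) z-score, computed from the data alone by regressing y on one column of X at a time. The helper name `univariate_z` is hypothetical and not part of susieR, and I am only assuming it approximates what `calc_z` does internally. It does illustrate that this kind of z-score involves no posterior quantities, so it need not equal posterior mean / posterior sd.

```r
# Hypothetical helper (not part of susieR): marginal z-score for each column
# of X, taken as the t statistic from regressing y on that column alone.
univariate_z <- function(X, y, center = TRUE, scale = TRUE) {
  X <- scale(X, center = center, scale = scale)
  y <- drop(y)
  if (center) y <- y - mean(y)
  apply(X, 2, function(x) {
    coefs <- summary(lm(y ~ x))$coefficients
    coefs["x", "t value"]
  })
}

# Simulated data: only the second column has a real effect.
set.seed(1)
X <- matrix(rnorm(200 * 5), 200, 5)
y <- 0.5 * X[, 2] + rnorm(200)
z <- univariate_z(X, y)
```

On the real data, `X` and `y` would be `susie_list[[1]]$input_data$X_resid` and `susie_list[[1]]$input_data$Y_resid`.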
After setting refine = FALSE, the z was successfully generated. It is unclear how the refine code affects the z, though.
The problem is in the refinement loop, where both inner susie calls hard-code compute_univariate_zscore = FALSE:
m = list()
for (cs in 1:length(s$sets$cs)) {
pw_cs = pw_s
pw_cs[s$sets$cs[[cs]]] = 0
if (all(pw_cs == 0)) {
break
}
s2 = susie(X, y, L = L, scaled_prior_variance = scaled_prior_variance,
residual_variance = residual_variance, prior_weights = pw_cs,
s_init = NULL, null_weight = null_weight, standardize = standardize,
intercept = intercept, estimate_residual_variance = estimate_residual_variance,
estimate_prior_variance = estimate_prior_variance,
estimate_prior_method = estimate_prior_method,
check_null_threshold = check_null_threshold,
prior_tol = prior_tol, coverage = coverage,
residual_variance_upperbound = residual_variance_upperbound,
min_abs_corr = min_abs_corr, compute_univariate_zscore = FALSE,
na.rm = na.rm, max_iter = max_iter, tol = tol,
verbose = FALSE, track_fit = FALSE, residual_variance_lowerbound = var(drop(y))/10000,
refine = FALSE)
sinit2 = s2[c("alpha", "mu", "mu2")]
class(sinit2) = "susie"
s3 = susie(X, y, L = L, scaled_prior_variance = scaled_prior_variance,
residual_variance = residual_variance, prior_weights = pw_s,
s_init = sinit2, null_weight = null_weight,
standardize = standardize, intercept = intercept,
estimate_residual_variance = estimate_residual_variance,
estimate_prior_variance = estimate_prior_variance,
estimate_prior_method = estimate_prior_method,
check_null_threshold = check_null_threshold,
prior_tol = prior_tol, coverage = coverage,
residual_variance_upperbound = residual_variance_upperbound,
min_abs_corr = min_abs_corr, compute_univariate_zscore = FALSE,
na.rm = na.rm, max_iter = max_iter, tol = tol,
verbose = FALSE, track_fit = FALSE, residual_variance_lowerbound = var(drop(y))/10000,
refine = FALSE)
m = c(m, list(s3))
}
after which s is overwritten with the best fit from m, which carries no z:
elbo = sapply(m, function(x) susie_get_objective(x))
if ((max(elbo) - susie_get_objective(s)) <= 0)
conti = FALSE
else s = m[[which.max(elbo)]]
I think a minimal fix is to change compute_univariate_zscore = FALSE
into compute_univariate_zscore = compute_univariate_zscore
which lets the fits in m inherit whatever option we specify for the z. I wonder if you agree with the idea, @gaow.
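To make the mechanism concrete, here is a toy base-R sketch. It is not susieR code: `toy_fit` and its `forward` argument are hypothetical, and unlike the real loop it always replaces `s` with the refit instead of checking the ELBO. It only shows how a z computed on the outer fit disappears when `s` is replaced by a refit that was run with the flag hard-coded to FALSE, and how forwarding the flag keeps it.

```r
# Toy stand-in for a fitting function with a refinement step (hypothetical).
toy_fit <- function(x, compute_z = FALSE, refine = FALSE, forward = FALSE) {
  s <- list(estimate = mean(x))
  if (compute_z) s$z <- mean(x) / (sd(x) / sqrt(length(x)))
  if (refine) {
    # forward = FALSE mimics the hard-coded compute_univariate_zscore = FALSE;
    # forward = TRUE mimics passing the caller's option through.
    inner_z <- if (forward) compute_z else FALSE
    # The refit replaces s, so any z stored on the outer fit is discarded.
    s <- toy_fit(x, compute_z = inner_z, refine = FALSE)
  }
  s
}

x <- c(1.2, 0.8, 1.5, 1.1, 0.9)
buggy <- toy_fit(x, compute_z = TRUE, refine = TRUE)                  # flag hard-coded
fixed <- toy_fit(x, compute_z = TRUE, refine = TRUE, forward = TRUE)  # flag forwarded
is.null(buggy$z)
is.null(fixed$z)
```

An alternative design would be to leave the inner calls untouched and compute z once on the final `s` after the refinement loop, which avoids recomputing it for every refit.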
@hsun3163 thanks, I have one minute before my other meeting, but I think I know what it is now. I'll DM you about it.
The new version of susie fixes the problem.