A large number of genes have null in their univariate Z output.

Question

hsun3163 opened this issue 2 years ago

There are 5000 genes whose sumstat tables fail to be generated from their rds files. For at least two of them, the error is that the univariate z-scores are missing from their susie object.

This can be easily circumvented by computing the z-score as the ratio of the posterior mean to the posterior sd. However, it is unclear why the z-score is NULL in the first place, and whether the univariate z is indeed posterior mean / posterior sd.

hsun3163 · Answer 1 · Mon Oct 31 2022 23:40:10 GMT+0800 (China Standard Time)

The error can be reproduced with:

susie_list = readRDS("/mnt/vast/hpc/csg/molecular_phenotype_calling/eqtl/output/susie_per_gene_tad_2/cache/demo.ENSG00000000971.unisusie.fit.rds")
test_output = susieR::susie(susie_list[[1]]$input_data$X_resid,susie_list[[1]]$input_data$Y_resid,
                           L=10,
                           max_iter=1000,
                           estimate_residual_variance=TRUE,
                           estimate_prior_variance=TRUE,
                           refine=TRUE,compute_univariate_zscore = TRUE, coverage = 0.95 )
test_output$z

NULL

hsun3163 · Answer 2 · Mon Oct 31 2022 23:48:09 GMT+0800 (China Standard Time)

The following code is how susie generates the z-scores. Run on its own, however, it works without any problem:

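X = susie_list[[1]]$input_data$X_resid  # assumed: the same X passed to susie() above
null_weight = NULL
# Drop the extra null column only when a non-zero null_weight was used
if (!is.null(null_weight) && null_weight != 0)
  X = X[,1:(ncol(X) - 1)]
z = susieR:::calc_z(X,susie_list[[1]]$input_data$Y_resid, center = TRUE, scale = TRUE)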
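
For context, the susieR documentation describes compute_univariate_zscore as returning univariate regression z-scores, that is, one simple regression of y on each column of X. That is generally a different quantity from posterior mean / posterior sd of the fitted model. The base R sketch below shows what such a marginal z-score looks like; it is not susieR's internal calc_z, and univariate_z, X, and y are illustrative names on simulated data.

```r
# Simulated data for illustration only; these are not the inputs from the issue.
set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(2 * X[, 2] + rnorm(n))

# Hypothetical helper: fit y ~ x separately for each column of X and
# return slope estimate divided by its standard error.
univariate_z <- function(X, y) {
  apply(X, 2, function(x) {
    coefs <- summary(lm(y ~ x))$coefficients
    coefs["x", "Estimate"] / coefs["x", "Std. Error"]
  })
}

z <- univariate_z(X, y)
```

Each entry of z is the t statistic of the slope in its own single-variable regression, so it depends only on X and y and not on any fitted susie object.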

hsun3163 · Answer 3 · Tue Nov 01 2022 02:29:07 GMT+0800 (China Standard Time)

After setting refine = FALSE, z was generated successfully. It is unclear how the refine code affects z, though.
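
Until the cause is fixed, one way to keep refine = TRUE is to recompute the marginal z-scores whenever the fit comes back without them. The sketch below is a minimal illustration in base R under that assumption; marginal_z and z_or_recompute are hypothetical helpers, not susieR functions, and the fit object is a plain list standing in for a susie fit.

```r
# Hypothetical helper, not part of susieR: marginal z-score for each column
# of X, using the closed form for simple regression with an intercept.
marginal_z <- function(X, y) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # centre each column
  yc <- y - mean(y)
  xx <- colSums(Xc^2)
  beta <- drop(crossprod(Xc, yc)) / xx          # per-column slope
  rss <- sum(yc^2) - beta^2 * xx                # per-column residual sum of squares
  beta / sqrt(rss / (length(yc) - 2) / xx)      # slope / standard error
}

# Return fit$z when it exists, otherwise recompute it from the data.
z_or_recompute <- function(fit, X, y) {
  if (!is.null(fit$z)) fit$z else marginal_z(X, y)
}

# Simulated data and a stand-in fit whose z is NULL.
set.seed(1)
X <- matrix(rnorm(300), 100, 3)
y <- drop(X[, 1] + rnorm(100))
fit_without_z <- list(z = NULL)
z <- z_or_recompute(fit_without_z, X, y)
```

Columns of X with zero variance would give NaN here, so they would need to be filtered out beforehand.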

hsun3163 · Answer 4 · Tue Nov 01 2022 02:40:24 GMT+0800 (China Standard Time)

The problem is in the refinement step, where each refit calls susie() with compute_univariate_zscore = FALSE:

            m = list()
            for (cs in 1:length(s$sets$cs)) {
                pw_cs = pw_s
                pw_cs[s$sets$cs[[cs]]] = 0
                if (all(pw_cs == 0)) {
                  break
                }
                s2 = susie(X, y, L = L, scaled_prior_variance = scaled_prior_variance, 
                  residual_variance = residual_variance, prior_weights = pw_cs, 
                  s_init = NULL, null_weight = null_weight, standardize = standardize, 
                  intercept = intercept, estimate_residual_variance = estimate_residual_variance, 
                  estimate_prior_variance = estimate_prior_variance, 
                  estimate_prior_method = estimate_prior_method, 
                  check_null_threshold = check_null_threshold, 
                  prior_tol = prior_tol, coverage = coverage, 
                  residual_variance_upperbound = residual_variance_upperbound, 
                  min_abs_corr = min_abs_corr, compute_univariate_zscore = FALSE, 
                  na.rm = na.rm, max_iter = max_iter, tol = tol, 
                  verbose = FALSE, track_fit = FALSE, residual_variance_lowerbound = var(drop(y))/10000, 
                  refine = FALSE)
                sinit2 = s2[c("alpha", "mu", "mu2")]
                class(sinit2) = "susie"
                s3 = susie(X, y, L = L, scaled_prior_variance = scaled_prior_variance, 
                  residual_variance = residual_variance, prior_weights = pw_s, 
                  s_init = sinit2, null_weight = null_weight, 
                  standardize = standardize, intercept = intercept, 
                  estimate_residual_variance = estimate_residual_variance, 
                  estimate_prior_variance = estimate_prior_variance, 
                  estimate_prior_method = estimate_prior_method, 
                  check_null_threshold = check_null_threshold, 
                  prior_tol = prior_tol, coverage = coverage, 
                  residual_variance_upperbound = residual_variance_upperbound, 
                  min_abs_corr = min_abs_corr, compute_univariate_zscore = FALSE, 
                  na.rm = na.rm, max_iter = max_iter, tol = tol, 
                  verbose = FALSE, track_fit = FALSE, residual_variance_lowerbound = var(drop(y))/10000, 
                  refine = FALSE)
                m = c(m, list(s3))
            }

and s is then overwritten with the best fit from m, which carries no z:

                elbo = sapply(m, function(x) susie_get_objective(x))
                if ((max(elbo) - susie_get_objective(s)) <= 0) 
                  conti = FALSE
                else s = m[[which.max(elbo)]]

I think a minimal fix is to change compute_univariate_zscore = FALSE to compute_univariate_zscore = compute_univariate_zscore, so that the fits in m inherit whatever option we specify for z. Do you agree with this idea, @gaow?
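
The pattern described above can be reproduced with a toy function. This is only a stand-in to show the mechanism and is not susieR code: toy_fit, forward_flag, and the made-up objective are all hypothetical. The inner refit is called with the z flag either hardcoded to FALSE (the behaviour quoted above) or forwarded from the caller (the proposed change), and the outer result is overwritten when the refit scores better.

```r
# Toy stand-in for a fitting routine with a refinement step; NOT susieR code.
toy_fit <- function(x, compute_z = FALSE, refine = FALSE, forward_flag = FALSE) {
  s <- list(objective = -sum((x - mean(x))^2))
  if (compute_z) s$z <- (x - mean(x)) / sd(x)
  if (refine) {
    # Inner refit: flag hardcoded to FALSE unless forward_flag is set.
    inner_flag <- if (forward_flag) compute_z else FALSE
    s2 <- toy_fit(x, compute_z = inner_flag, refine = FALSE)
    s2$objective <- s2$objective + 1           # pretend the refit improved the objective
    if (s2$objective > s$objective) s <- s2    # overwrite, as in the refinement step
  }
  s
}

x <- c(1, 2, 3, 4)
lost <- is.null(toy_fit(x, compute_z = TRUE, refine = TRUE)$z)
kept <- !is.null(toy_fit(x, compute_z = TRUE, refine = TRUE, forward_flag = TRUE)$z)
```

With the hardcoded flag, the refined result replaces the original and z is lost; with the flag forwarded, z survives the overwrite. An alternative with the same effect would be to compute z once after refinement finishes, which avoids recomputing it in every refit.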

gaow · Answer 5 · Tue Nov 01 2022 03:00:42 GMT+0800 (China Standard Time)

@hsun3163 thanks, I have one minute before my other meeting, but I think I know what it is now. I'll DM you about it.

hsun3163 · Answer 6 · Tue Nov 29 2022 01:40:07 GMT+0800 (China Standard Time)

The new version of susie fixes the problem.