htGLMNET

High Throughput Light Weight Regularized Regression Modeling for Molecular Data

Summary

The package enables fitting and basic evaluation of regularized regression models via the interface of the glmnet package [1]. The pre-processing, modeling, evaluation, and prediction tools were fine-tuned specifically for use with genomic, transcriptomic, proteomic, and pharmaco-molecular data.

Installation

The package requires one non-CRAN dependency (microViz, https://github.com/PiotrTymoszuk/microViz); please make sure to install it first:

devtools::install_github('PiotrTymoszuk/microViz') ## dependency microViz

devtools::install_github('PiotrTymoszuk/htGLMNET')

Terms of use

The package is available under a GPL-3 license.

Contact

The package maintainer is Piotr Tymoszuk.

Acknowledgements

Many thanks to the authors of the packages glmnet, sva, tidyverse, rlang, furrr, ggrepel, and caret.

Special thanks to Paul Geeleher, Danielle Maeser, Robert Gruener and Stephanie Huang, the authors of the oncoPredict and pRRophetic packages, for the idea of using regularized linear regression for prediction of anti-cancer drug response [2,3,4].

Basic usage

Regularized linear modeling with RIDGE, LASSO, and Elastic Net algorithms is a canonical machine learning approach widely used in molecular in silico studies, which tackle inherently multi-dimensional data, e.g. expression levels of thousands of genes for a few to a few hundred samples. Its usefulness comes from the general property of regularization, which reduces over-fitting and, in the case of the L1-regularized LASSO and Elastic Net models, enables selection of modeling features, e.g. association of the requested outcome with expression levels of just a few genes [1,5,6].

Our toolset is optimized for easy handling of high-dimensional numeric data sets from molecular biology in a classical machine learning workflow. The general idea of the package is to bundle three steps of regularized linear modeling in a comprehensive pipeline with basic quality control (a minimal sketch of the whole pipeline follows the list below):

  1. pre-processing, accomplished by pre_process(), involves selection of common explanatory variables shared by the training and test data sets, and subsequent removal of unwanted batch effects with the ComBat procedure [7]

  2. model selection, training, and evaluation, accomplished by train(), involving selection of the optimal shrinkage parameter λ by cross-validation, training a regularized glmnet/cv.glmnet class model, and computation of basic performance metrics such as R^2, correlation coefficients, classification accuracy, or Cohen's κ

  3. prediction, accomplished by predict(), allows for calculation of response values, classification probabilities, linear predictor scores, or class assignments for the test data set

Of note, the pre-processing step is not mandatory and the user can provide modeling data sets without pre-processing or following pre-processing done with other tools.
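
Taken together, a complete analysis boils down to a handful of calls. Below is a minimal sketch of the pipeline; the objects train_x, test_x (feature × sample expression matrices) and train_y (an observation × response matrix) are placeholders, not package data:

  ## minimal pipeline sketch: pre-processing, training, prediction
  ## train_x, test_x: features x samples; train_y: observations x responses
  ## (placeholder objects, not shipped with the package)

  library(htGLMNET)

  prep <- pre_process(train = train_x,
                      test = test_x,
                      median_quantile = 0.2,
                      gini_quantile = 0.2)

  models <- train(x = prep,
                  y = train_y,
                  family = 'gaussian',
                  alpha = 0.5,
                  type.measure = 'mse')

  preds <- predict(models, newdata = prep, type = 'response')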

The package is designed to work with four families of generalized linear models: Gaussian and Poisson regression models, and binomial and multinomial classification models. Under the hood, it makes use of glmnet(), cv.glmnet(), predict.glmnet(), and coef.glmnet() from the glmnet package [1]. The model selection/training/evaluation step can be easily accelerated by parallelization via the interface of furrr and future.
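
The same workflow applies to the classification families. Below is a hypothetical sketch of a binomial model run in parallel, reusing the placeholder prep object from the sketch above and assuming a binary (0/1) observation × response matrix binary_y; the family and type.measure arguments are forwarded to cv.glmnet():

  ## hypothetical sketch: binomial-family classification, parallelized
  ## binary_y (observations x responses, values 0/1) is a placeholder

  library(future)

  plan('multisession')

  class_models <- train(x = prep,
                        y = binary_y,
                        family = 'binomial',
                        alpha = 0.5,
                        type.measure = 'class')

  plan('sequential')

  ## class assignment in the test data

  class_calls <- predict(class_models,
                         newdata = prep,
                         type = 'class')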

In the following example, we will predict drug response in individual tumor samples of pancreatic carcinoma, given a training data set of drug sensitivity in cancer cell lines, by Gaussian-family Elastic Net regression [6]. R code used in this example is available here. Data sets with cell line gene expression and anti-cancer drug sensitivity originate from the GDSC2 drug screening project [8] and were downloaded from a repository by Maeser and colleagues [4]. Bulk-tumor expression data for the pancreatic cancer samples originate from the CPTAC project [9].

Gene expression and drug sensitivity data

Apart from the htGLMNET package, this example also uses tidyverse (pipes and tabular data handling, plots), rstatix (normality checks), future (parallelization), and patchwork (plot panels). The training and test data sets are provided as numeric matrices gdsc_expression, gdsc_ic50, and paad_cptac within the package.

  library(tidyverse)
  library(htGLMNET)
  library(rstatix)

  library(future)

  library(patchwork)

Note that the package tools take training and test explanatory variable data in a form that is common in molecular biology, i.e. with features such as genes or proteins in rows and samples/observations in columns. By contrast, rows of the response matrix are always observations and its columns represent the variables. In our case, the explanatory variable matrices contain log-transformed gene expression levels with official gene symbols in row names and sample identifiers in column names:

  ## training data: GDSC experiment:
  ## whole-genome expression in untreated cancer cell lines,
  ## log transformed to improve normality

  training_x <- gdsc_expression

> training_x[1:5, 1:5]
         COSMIC_908467 COSMIC_906798 COSMIC_753584 COSMIC_687588 COSMIC_749712
TSPAN6        7.868408      8.366444      7.105637      6.602253      7.435151
TNMD          2.635424      2.524400      2.798847      2.692571      2.832570
DPM1          9.296730     10.819526     10.486486      9.829172     10.344827
SCYL3         3.582473      3.952511      3.696835      3.698360      3.877500
C1orf112      3.997161      3.068264      3.726833      3.472732      3.555658
  ## test data: whole-tumor (bulk) gene expression in pancreatic cancers
  ## already log-transformed, to improve normality

  test_x <- paad_cptac
> test_x[1:5, 1:5]
         C3L-02899 C3N-02585 C3N-00303 C3N-03428 C3L-00599
A1BG      2.022513  1.981244  2.033752  1.995434  1.957314
A1BG-AS1  2.011082  1.997331  2.078784  2.076459  1.951348
A1CF      2.189583  2.234262  1.549379  2.072498  2.285331
A2M       2.788965  2.756788  2.783230  2.752751  2.755728
A2M-AS1   2.104827  2.187389  2.372447  2.319328  2.171888
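
Both example matrices follow this convention. A quick base R check of the orientation and of the overlap of sample identifiers between the explanatory and response matrices of the training data:

  ## orientation check: genes in rows, samples in columns
  ## of the explanatory matrices

  dim(gdsc_expression)    ## genes x cell lines
  dim(paad_cptac)         ## genes x tumor samples

  ## the response matrix stores samples in rows: its row names should
  ## overlap with the column names of the training expression matrix

  length(intersect(colnames(gdsc_expression), rownames(gdsc_ic50)))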

For Gaussian linear modeling, both the explanatory factors and the response are assumed to be roughly normally distributed. Our response variable, IC50, i.e. the concentration of a drug in µM at which half of the tumor cells are killed, varies widely and is heavily skewed. log() transformation helps to improve normality of the IC50 values. Of note, during development of the package we observed that fitting a Gaussian model for negative response values may fail. To avoid negative drug response values after the log() transformation, we change the units of IC50 to pM (picomolar).

  ## IC50 concentrations in µM of 10 anti-cancer drugs
  ## log transformation greatly improves normality!
  ## Yet, `glmnet` may have problems with negative values after
  ## the log transformation, to avoid that, we're transforming the IC50 to pM

  training_ic50 <- gdsc_ic50
>   training_ic50[1:5, 1:5]
              Camptothecin_1003 Vinblastine_1004 Cisplatin_1005 Cytarabine_1006 Docetaxel_1007
COSMIC_908467       0.010520653      0.011794665       8.105104       0.2087598    0.003440711
COSMIC_906798       0.005999987      0.018177077       1.855336      13.8102767    0.004296369
COSMIC_753584       0.053981926      0.062174085      11.444863      23.0481383    0.008578793
COSMIC_687588       0.042151880      0.074801151      16.417421       1.4409030    0.041145941
COSMIC_749712       0.099352813      0.005823254      18.511756      26.1957452    0.011557681
## poor normality of non-transformed IC50
## reflected by low Shapiro-Wilk test statistic values

>   training_ic50 %>%
+     as.data.frame %>%
+     map_dfr(shapiro_test) %>%
+     mutate(variable = colnames(training_ic50))

# A tibble: 10 × 3
   variable          statistic  p.value
   <chr>                 <dbl>    <dbl>
 1 Camptothecin_1003     0.348 3.02e-26
 2 Vinblastine_1004      0.180 9.71e-29
 3 Cisplatin_1005        0.445 1.48e-24
 4 Cytarabine_1006       0.431 8.16e-25
 5 Docetaxel_1007        0.481 7.01e-24
 6 Gefitinib_1010        0.717 3.53e-18
 7 Navitoclax_1011       0.322 1.17e-26
 8 Vorinostat_1012       0.425 6.40e-25
 9 Nilotinib_1013        0.666 1.22e-19
10 Olaparib_1017         0.648 4.04e-20
## great improvement by log-transformation!!!

>   training_ic50 %>%
+     log %>%
+     as.data.frame %>%
+     map_dfr(shapiro_test) %>%
+     mutate(variable = colnames(training_ic50))

# A tibble: 10 × 3
   variable          statistic  p.value
   <chr>                 <dbl>    <dbl>
 1 Camptothecin_1003     0.967 1.18e- 4
 2 Vinblastine_1004      0.955 5.43e- 6
 3 Cisplatin_1005        0.991 2.53e- 1
 4 Cytarabine_1006       0.980 6.45e- 3
 5 Docetaxel_1007        0.982 1.15e- 2
 6 Gefitinib_1010        0.980 6.44e- 3
 7 Navitoclax_1011       0.991 2.41e- 1
 8 Vorinostat_1012       0.985 3.78e- 2
 9 Nilotinib_1013        0.900 2.35e-10
10 Olaparib_1017         0.989 1.31e- 1
  ## conversion to pM and log transformation

  training_ic50 <- log(1e6 * training_ic50)
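
A quick check that the unit conversion did its job: after switching to pM, the log-transformed responses should all be positive, assuming every measured IC50 exceeds 1 pM:

  ## sanity check: no negative responses left after the pM conversion

  range(training_ic50, na.rm = TRUE)

  all(training_ic50 > 0, na.rm = TRUE)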

Pre-processing

In the following portion of code, we are going to select modeling features, i.e. genes with substantial expression and variability, shared by the training and test data sets. To this end, we remove the 20% of genes with the lowest median expression and the 20% of genes with the lowest variability as measured by the Gini index. Finally, we adjust the gene expression matrices with the selected modeling features for unwanted batch effects with ComBat. These two tasks are performed with the pre_process() function of our package, but may also be done with other tools. Conveniently, pre_process() returns an object that bundles the adjusted training and test matrices and can be passed directly to the modeling and prediction functions of our package.

  ## pre-processing involves the following steps:
  ##
  ## 1) selection of common genes in the GDSC cell lines and pancreatic cancers
  ## such genes must be fairly abundantly expressed and variable,
  ## hence using 0.2 quantiles of median expression and Gini coefficients
  ## as modeling variable selection criteria
  ##
  ## 2) batch adjustment via ComBat

  pre_pro_data <- pre_process(train = training_x,
                              test = test_x,
                              median_quantile = 0.2,
                              gini_quantile = 0.2)

  ## the effects of batch adjustment and common gene numbers
  ## can be inspected by `summary()` and `plot()`

We can easily retrieve the numbers of selected modeling features, in our case genes, and their basic distribution statistics with the summary() method. The distribution and variability measures can be visualized with plot(), which returns a list of ggplot graphs; feel free to customize them! Selection cutoffs are displayed in the respective plots as dashed lines, whose color corresponds to the color coding of the training and test data sets.

> summary(pre_pro_data)
# A tibble: 2 × 11
  data_set n_features n_observations n_missing  mean    sd median   q25   q75  q025  q975
  <chr>         <int>          <int>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
1 train          7635            200         0  4.00  1.26   3.67  2.95  4.85  2.47  6.87
2 test           7635            140         0  4.00  1.22   3.80  3.13  4.77  2.09  6.66

  pre_pro_plots <- plot(pre_pro_data)

  ## easy customization of the diagnostic plots

  pre_pro_plots$before_adjustment$median +
    labs(title = 'Median, before ComBat',
         x = 'Data set') +
    scale_y_continuous(limits = c(0, 15)) +
    pre_pro_plots$after_adjustment$median +
    labs(title = 'Median, after ComBat',
         x = 'Data set') +
    scale_y_continuous(limits = c(0, 15))

(Figure: median gene expression per data set, before and after ComBat adjustment)

  pre_pro_plots$before_adjustment$gini_coef +
    labs(title = 'Gini index, before ComBat',
         x = 'Data set') +
    scale_y_continuous(limits = c(0, 1)) +
    pre_pro_plots$after_adjustment$gini_coef +
    labs(title = 'Gini index, after ComBat',
         x = 'Data set') +
    scale_y_continuous(limits = c(0, 1))

(Figure: Gini index of gene expression per data set, before and after ComBat adjustment)

Model selection, training and evaluation

In this section we are going to select the optimal value of the shrinkage parameter λ by cross-validation in the training data set, construct a series of regularized models, and evaluate them. These tasks are accomplished by the function train(). In this case, train() is supplied with the output of pre_process() as the x argument; the matrix of explanatory variables for the training data set is picked automatically. The user is also welcome to provide as x any numeric matrix that fulfills two criteria: (1) features in rows and observations in columns, and (2) defined row and column names. Critically, the column names of x and the row names of y must overlap for matching of the explanatory factors and the response. NA values are allowed in y and will be silently skipped. As we are fitting a series of Elastic Net models of drug sensitivity, we set the mixing parameter alpha to 0.5. Mean squared error serves as the loss function, i.e. the model selection criterion. By declaring plan('multisession'), we allow for a parallel run:


  ## Gaussian family Elastic Net models are fitted
  ## (family = 'gaussian', alpha = 0.5)
  ##
  ## the optimal lambda is found by minimizing mean squared error
  ## in cross-validation (type.measure = 'mse')
  ##
  ## no standardization, because the gene expression values are expected
  ## to be on the same scale.
  ##
  ## The function runs in parallel upon defining a multisession backend

  plan('multisession')

  set.seed(12345)

  train_object <- train(x = pre_pro_data,
                        y = training_ic50,
                        standardize = FALSE,
                        family = 'gaussian',
                        alpha = 0.5,
                        type.measure = 'mse')

  plan('sequential')
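
As mentioned above, NA entries in the response matrix are tolerated and silently skipped during model fitting. A purely illustrative sketch, blanking a handful of training responses before re-fitting:

  ## purely illustrative: NA entries in `y` are silently skipped

  y_na <- training_ic50

  set.seed(54321)

  y_na[sample(length(y_na), 10)] <- NA

  train_na <- train(x = pre_pro_data,
                    y = y_na,
                    standardize = FALSE,
                    family = 'gaussian',
                    alpha = 0.5,
                    type.measure = 'mse')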

Model evaluation statistics can be accessed by calling summary() and visualized by plot().

> summary(train_object)

# A tibble: 10 × 12
   response          n_observations n_features n_coefficients lambda loss_name loss_…¹ loss_…² varia…³ rsq_oof pearson spear…⁴
   <chr>                      <dbl>      <int>          <dbl>  <dbl> <chr>     <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 Camptothecin_1003            200       7635             35  0.258 mse       min       2.24     3.56   0.370   0.850   0.849
 2 Vinblastine_1004             200       7635             44  0.304 mse       min       3.27     4.43   0.262   0.805   0.787
 3 Cisplatin_1005               200       7635             44  0.259 mse       min       2.26     4.03   0.440   0.845   0.854
 4 Cytarabine_1006              200       7635            110  0.110 mse       min       2.59     4.95   0.478   0.952   0.950
 5 Docetaxel_1007               200       7635             75  0.110 mse       min       1.60     2.01   0.203   0.900   0.873
 6 Gefitinib_1010               200       7635             49  0.123 mse       min       1.10     1.52   0.277   0.863   0.859
 7 Navitoclax_1011              200       7635             50  0.286 mse       min       3.46     4.73   0.269   0.824   0.779
 8 Vorinostat_1012              200       7635             25  0.168 mse       min       0.697    1.10   0.368   0.787   0.797
 9 Nilotinib_1013               200       7635             61  0.154 mse       min       1.77     2.92   0.395   0.879   0.862
10 Olaparib_1017                200       7635             48  0.143 mse       min       1.02     1.77   0.422   0.856   0.835
# … with abbreviated variable names ¹​loss_criterion, ²​loss_value, ³​variance, ⁴​spearman

As before, plot() generates a list of ggplot graphics, whose appearance can be easily adjusted by the user. In this particular case, we add dashed lines to one of the graphs, representing cutoffs for the selection of models with at least moderate performance in cross-validation ($R^2 \geq 0.26$) and in the training data (correlation of the predicted and observed IC50: $r \geq 0.5$):


  train_plots <- plot(train_object)

  train_plots$rsq_pearson +
    labs(title = 'Model performance: CV and training') +
    geom_hline(yintercept = 0.5,
               linetype = 'dashed') +
    geom_vline(xintercept = 0.26,
               linetype = 'dashed')

(Figure: cross-validated R² plotted against the correlation of predictions with observed IC50 in the training data; dashed lines: selection cutoffs)

Model coefficients: signatures of drug sensitivity

In this section, we are going to exploit the trained models to glimpse at genes associated with the response to particular anti-cancer compounds. Our modeling approach is based on the Elastic Net algorithm, which in fact uses just a few to a few hundred explanatory variables of the roughly 7,600 genes available for modeling. The genes selected by the algorithm are expected to associate strongly with the response, in our case with drug sensitivity, and can be regarded as a kind of 'drug response signature'. In the following code, we are going to retrieve such genes, characterized by non-zero β coefficients, and visualize their coefficient values for one popular anti-cancer drug, cisplatin.

Model coefficients are computed by calling coef(), which returns a matrix of coefficients for all genes used for modeling, together with the model intercept:


  ## gene signatures of drug response may be obtained by extracting
  ## non-zero coefficients of the regularized model.
  ## This way, we're selecting only the features that contribute to
  ## prediction of drug sensitivity
  ##
  ## Coefficients of the models are obtained via `coef()`, which returns
  ## a matrix for all investigated genes.

  train_coefficients <- coef(train_object)


>  train_coefficients[1:5, 1:5]
            Camptothecin_1003 Vinblastine_1004 Cisplatin_1005 Cytarabine_1006 Docetaxel_1007
(Intercept)          16.04874          11.3986       18.14665        24.46561       10.59089
TSPAN6                0.00000           0.0000        0.00000         0.00000        0.00000
SCYL3                 0.00000           0.0000        0.00000         0.00000        0.00000
C1orf112              0.00000           0.0000        0.00000         0.00000        0.00000
FGR                   0.00000           0.0000        0.00000         0.00000        0.00000


  ## removal of the intercepts and selection of non-zero model coefficients

  non_zero <- train_coefficients[-1, ]

  non_zero <- colnames(non_zero) %>%
    map(~non_zero[, .x]) %>%
    map(as.data.frame) %>%
    map(set_names, 'beta')

  non_zero <- non_zero %>%
    map(filter, beta != 0) %>%
    map(rownames_to_column, 'gene_symbol') %>%
    map(arrange, -beta) %>%
    map(as_tibble) %>%
    set_names(colnames(train_coefficients))

For cisplatin, the 43 non-zero model coefficients are:

> non_zero$Cisplatin_1005
# A tibble: 43 × 2
   gene_symbol   beta
   <chr>        <dbl>
 1 IL17RC      0.341 
 2 TOLLIP      0.208 
 3 HM13        0.192 
 4 PPME1       0.143 
 5 PTTG1IP     0.140 
 6 TM2D2       0.134 
 7 PPP2R2C     0.123 
 8 WTIP        0.118 
 9 UBTD1       0.0847
10 ABHD2       0.0714
# … with 33 more rows
# ℹ Use `print(n = ...)` to see more rows


  ## plotting coefficients of the signature members for cisplatin

  non_zero$Cisplatin_1005 %>%
    ggplot(aes(x = beta,
               y = reorder(gene_symbol, beta),
               fill = factor(sign(beta)))) +
    geom_bar(stat = 'identity') + 
    scale_fill_manual(values = c('-1' = 'steelblue', 
                                 '1' = 'indianred3')) + 
    theme_bw() +
    theme(axis.title.y = element_blank(),
          axis.text.y = element_text(face = 'italic'),
          legend.position = 'none') +
    labs(title = 'Cisplatin response signature',
         x = expression(beta * ', association with IC50'))

(Figure: bar plot of the non-zero β coefficients of the cisplatin response signature)

As evident from the graphic above, at least some genes of the cisplatin response signature make biological sense. For example, IL17RC and TOLLIP are involved in NF-κB signaling, which may orchestrate cell survival. PPME1 and PPP2R2C are implicated in pro-survival signaling, while RIOK, SLFN11, DOT1L, and CHEK2 are involved in DNA replication, repair, chromatin control, and protein synthesis.

Prediction of drug sensitivity in bulk-tumor samples

In the final part of this example, we are going to predict IC50 concentrations for individual bulk-tumor samples of pancreatic carcinoma in the test data set. Such predictions, but also predictions of e.g. linear predictor scores, probabilities, and class assignment, are performed by calling the predict() method. In our example, we would like to generate predictions on the same scale as the log-transformed IC50 values used for model training and hence specify type = 'response'. The numeric matrix of explanatory factors, i.e. the test data set, is provided as newdata and has to be in the feature (row) $\times$ observation (column) format. Alternatively, as presented below, the user is welcome to provide the output of pre_process() as newdata: the test data set will be picked automatically. Please note that in the code below, we transform the predictions back to IC50 expressed in µM:


  ## IC50 values for individual cancer samples may be computed
  ## by calling `predict(type = 'response')`.
  ## Conveniently, the result of `pre_process()` may be
  ## supplied as `newdata`. The function picks the test data set automatically!

  test_ic50 <- predict(train_object,
                       newdata = pre_pro_data,
                       type = 'response')

  ## transforming the predicted sensitivities back to µM

  test_ic50 <- exp(test_ic50) * 1e-6


> test_ic50[1:5, 1:5]

          Camptothecin_1003 Vinblastine_1004 Cisplatin_1005 Cytarabine_1006 Docetaxel_1007
C3L-02899        0.06114990       0.04003696      22.306809        2.977348    0.008361944
C3N-02585        0.10661910       0.10359832      49.587790        3.172271    0.015345744
C3N-00303        0.06506168       0.00824819       4.778458        1.064913    0.008228247
C3N-03428        0.01745086       0.01321737       2.731877        1.459961    0.005907104
C3L-00599        0.15932344       0.03192528      27.123428        1.126792    0.015345844

Such drug sensitivity measures may be used directly in analyses such as clustering or comparison of cancer sample subsets; a clustering sketch follows at the end of this section. Here, we visualize the predicted IC50 values for the compounds whose models performed with at least moderate reliability as a violin plot:

  ## let's visualize predictions for
  ## the models with the best performance
  ## (cross-validated R^2 >= 0.26, Pearson's r >= 0.5)

  best_models <- train_object %>%
    summary %>%
    filter(rsq_oof >= 0.26,
           pearson >= 0.5) %>%
    .$response

  ## plotting data in long format

  test_plot_data <- test_ic50[, best_models] %>%
    as.data.frame %>%
    rownames_to_column('sample_id') %>%
    as_tibble

  test_plot_data <- test_plot_data %>%
    pivot_longer(cols = all_of(best_models),
                 names_to = 'compound',
                 values_to = 'ic50')


> test_plot_data

# A tibble: 1,260 × 3
   sample_id compound             ic50
   <chr>     <chr>               <dbl>
 1 C3L-02899 Camptothecin_1003  0.0611
 2 C3L-02899 Vinblastine_1004   0.0400
 3 C3L-02899 Cisplatin_1005    22.3   
 4 C3L-02899 Cytarabine_1006    2.98  
 5 C3L-02899 Gefitinib_1010    23.8   
 6 C3L-02899 Navitoclax_1011    7.73  
 7 C3L-02899 Vorinostat_1012    7.12  
 8 C3L-02899 Nilotinib_1013    34.2   
 9 C3L-02899 Olaparib_1017     49.4   
10 C3N-02585 Camptothecin_1003  0.107 
# … with 1,250 more rows
# ℹ Use `print(n = ...)` to see more rows


  test_plot_data %>%
    ggplot(aes(x = reorder(compound, ic50),
               y = ic50)) +
    geom_violin(fill = 'cornsilk',
                scale = 'width') +
    geom_point(shape = 16,
               size = 1,
               position = position_jitter(width = 0.05,
                                          height = 0),
               alpha = 0.25) +
    scale_y_continuous(trans = 'log10') +
    guides(x = guide_axis(angle = 45)) +
    theme_bw() +
    labs(title = 'Predicted IC50, PAAD CPTAC',
         subtitle = 'Bulk cancer samples',
         x = 'Compound',
         y = 'IC50, µM')

(Figure: violin plot of predicted IC50 values in the PAAD CPTAC bulk cancer samples)
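
Finally, as mentioned above, the predicted sensitivities can feed directly into downstream analyses. A minimal sketch of hierarchical clustering of the tumor samples by Z-scores of their predicted log IC50, using base R tools only:

  ## minimal downstream sketch: clustering of the pancreatic cancer samples
  ## by Z-scores of the predicted log IC50

  clust_input <- scale(log(test_ic50[, best_models]))

  sample_clust <- hclust(dist(clust_input, method = 'euclidean'),
                         method = 'ward.D2')

  plot(sample_clust,
       main = 'PAAD CPTAC samples, predicted drug response',
       xlab = '', sub = '')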

References

  1. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw (2010) 33:1–22. doi:10.18637/jss.v033.i01

  2. Geeleher P, Cox NJ, Huang RS. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol (2014) 15:R47. doi:10.1186/gb-2014-15-3-r47

  3. Geeleher P, Cox N, Stephanie Huang R. pRRophetic: An R Package for Prediction of Clinical Chemotherapeutic Response from Tumor Gene Expression Levels. PLoS One (2014) 9:e107468. doi:10.1371/journal.pone.0107468

  4. Maeser D, Gruener RF, Huang RS. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief Bioinform (2021) 22:1–7. doi:10.1093/bib/bbab260

  5. Tibshirani R. Regression Shrinkage and Selection via the Lasso. J R Stat Soc Ser B (1996) 58:267–288. doi:10.1111/j.2517-6161.1996.tb02080.x

  6. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol (2005) 67:301–320. doi:10.1111/j.1467-9868.2005.00503.x

  7. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics (2012) 28:882. doi:10.1093/bioinformatics/bts034

  8. Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res (2013) 41:D955–D961. doi:10.1093/nar/gks1111

  9. Cao L, Huang C, Cui Zhou D, Hu Y, Lih TM, Savage SR, Krug K, Clark DJ, Schnaubelt M, Chen L, et al. Proteogenomic characterization of pancreatic ductal adenocarcinoma. Cell (2021) 184:5031-5052.e26. doi:10.1016/j.cell.2021.08.023
