hwanglab / Yonsei_gastric_cancer_32genes

Overview
This repository contains the scripts that reproduce the main analysis results in our paper “Development and validation of a prognostic and predictive 32-gene signature for gastric cancer”, Nature Communications (2022) (https://doi.org/10.1038/s41467-022-28437-y).

System requirements
The code requires only a standard computer with enough RAM to support the in-memory operations.

OS requirements
The code has been tested on Windows 10 and Linux (Ubuntu 20.04).

Installation guide
All the scripts have been tested on MATLAB (R2018b, R2021a) and R (4.0.4). Installing all the required packages might take a few minutes.

The MATLAB scripts require a few additional functions and the nmf and LIBSVM packages. The additional function scripts (bestMap, hungarian and MatSurv; author information is included in each script) and the nmf package, which can be used without compilation, are included in the “utils” directory. You need to install the LIBSVM package in MATLAB; please follow the instructions below.

Installation of LIBSVM

  • Windows: pre-compiled MEX files are included in the "util" directory.
  • Linux: The scripts use LIBSVM through its MATLAB interface. Please visit the GitHub page for LIBSVM (https://github.com/cjlin1/libsvm) and follow the installation instructions under “MATLAB/OCTAVE Interface”: after downloading the package, go to the "matlab" directory and run "make.m" in MATLAB. Make sure that a C/C++ compiler is properly set up for MATLAB. Once "make.m" has run successfully, add the "matlab" directory to the MATLAB path.
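The Linux steps above can be sketched in MATLAB roughly as follows. This is a minimal sketch, not part of the repository: the download location `libsvmRoot` is a hypothetical path that you will need to adjust, and the final check simply confirms that the `svmtrain` and `svmpredict` MEX files built by LIBSVM's "make.m" are visible on the path.

```matlab
% Hypothetical location of the downloaded LIBSVM package; adjust to your system.
libsvmRoot   = fullfile(getenv('HOME'), 'libsvm');
libsvmMatlab = fullfile(libsvmRoot, 'matlab');

% Check that MATLAB has a C++ compiler selected for MEX builds.
cc = mex.getCompilerConfigurations('C++', 'Selected');
if isempty(cc)
    error('No C++ compiler is configured; run "mex -setup C++" first.');
end

% Build the MEX files by running make.m inside the "matlab" directory.
oldDir = cd(libsvmMatlab);
make
cd(oldDir);

% Put the compiled interface on the MATLAB path for this session.
addpath(libsvmMatlab);

% exist(...) returns 3 when the name resolves to a MEX file on the path.
assert(exist('svmtrain', 'file') == 3 && exist('svmpredict', 'file') == 3, ...
    'LIBSVM MEX files were not found on the MATLAB path.');
```

To keep the path entry across MATLAB sessions, `savepath` can be called afterwards, provided you have write permission for the MATLAB path file.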

The R script requires the survminer, survival, ggplot2 and gdata packages. You can install each of them with install.packages("package name").

Demo
Please find each script file in the "codes" directory and run it in MATLAB or R. The datasets required to run each script are included in the "data" directory.
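As a rough illustration, one of the MATLAB scripts could be launched as shown below. The sketch assumes that MATLAB's current folder is the root of this repository and that the helper functions live in a folder named "utils" (adjust the name if your copy differs). Whether the scripts set up their own paths or expect a particular working directory is not stated here, so the `addpath` call is a precaution rather than a documented requirement.

```matlab
% Assumption: MATLAB's current folder is the root of this repository.
repoRoot = pwd;

% Assumption: helper functions (bestMap, hungarian, MatSurv, nmf) are under "utils".
addpath(genpath(fullfile(repoRoot, 'utils')));

% run() temporarily switches to the script's folder, executes it, and switches back.
run(fullfile(repoRoot, 'codes', 'script_riskscore_prediction.m'));
```

The R script can be started in a comparable way from an R session with `source()`.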

For each script, the expected output and run time are as follows:

script_consensus_clustering.m

  • Kaplan-Meier (KM) curves for overall survival stratified by the consensus clustering described in the main text (please see Figure 2-B)
  • It might take a few minutes due to the bootstrapping steps (where NMF runs 1000 times)

script_riskscore_prediction.m

  • Risk score prediction results for the ACRG + MD Anderson + TCGA combined dataset (the prediction results are used to generate Figure 2-C)
  • It should take less than a minute

script_5Fu_comparison.R

  • Adjusted KM curves for overall survival in the Yonsei cohort with the adjuvant chemotherapy information (please see Figure 3)

Please contact the authors (park.sunho@mayo.edu) for questions and comments about the scripts.
