A computational method for constructing a gene-signature causal network from gene expression data and exposures of mutational signatures.
pip install GeneSigNet
Installation source: GeneSigNet 0.1.0
The Python libraries pandas, numpy, scipy and scikit-learn (imported as sklearn) must be installed before running GeneSigNet; warnings is part of the Python standard library and needs no separate installation. The network visualization module VisualizeNetwork additionally requires pandas and pyvis.
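Before running the package, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of GeneSigNet.

```python
from importlib.util import find_spec

# Import names (not pip names): scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "scipy", "sklearn"]
OPTIONAL_VIS = ["pyvis"]  # only needed for the VisualizeNetwork module


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


print("Missing core dependencies:", missing_modules(REQUIRED))
print("Missing visualization dependencies:", missing_modules(OPTIONAL_VIS))
```

An empty list means every library in that group was found in the current environment.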
import pandas as pd
import GeneSigNet as GSN
# A gene expression data matrix (columns are genes and rows are samples)
ExpData = pd.read_csv('Gene_Expression_Data.csv', index_col=0)
# A signature exposure data matrix (columns are signatures and rows are samples)
SigData = pd.read_csv('Signature_Exposure_Data.csv', index_col=0)
maxit=10000 # maximum number of iterations for selecting sparse partial correlations
tolerance=1e-12 # convergence tolerance of the iteration for selecting sparse partial correlations
D=pd.concat([ExpData, SigData], axis=1) # Combine sample-matched gene expression and signature exposures into one input matrix
Net=GSN.WeightMatrix(D, maxit, tolerance)
Weight_Matrix=Net.ConstructNet()
Weight_Matrix.to_csv('Weight_Matrix.csv')
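The input above is expected to be sample-matched. Because `pd.concat(..., axis=1)` performs an outer join on the row index by default, samples present in only one table would introduce missing values. The sketch below, run on toy data, shows one way to restrict both tables to their shared samples first; the helper `combine_matched_samples` is hypothetical and not part of GeneSigNet, and it assumes sample identifiers are unique row labels.

```python
import pandas as pd


def combine_matched_samples(exp_data, sig_data):
    """Join expression and exposure tables on the samples they share (hypothetical helper)."""
    shared = exp_data.index.intersection(sig_data.index)
    if shared.empty:
        raise ValueError("No sample identifiers are shared between the two tables.")
    # Select the same samples, in the same order, from both tables before joining columns.
    return pd.concat([exp_data.loc[shared], sig_data.loc[shared]], axis=1)


# Toy data: three samples with expression, three with exposures, two in common.
exp = pd.DataFrame({"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [0.5, 0.1, 0.9]},
                   index=["S1", "S2", "S3"])
sig = pd.DataFrame({"SBS1": [10.0, 20.0, 30.0]}, index=["S2", "S3", "S4"])

D_toy = combine_matched_samples(exp, sig)
print(D_toy)
```

Only samples S2 and S3 appear in the combined matrix, so no row contains missing values.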
import pandas as pd
from pyvis.network import Network
Signatures=SigData.columns # or Signatures=['SBS1', 'SBS2',...,] (names of signature nodes)
th=0.05 # Threshold for selecting the edges to be included in the visualized network
# VisualizeNetwork is provided by the VisualizeNetwork module distributed with the source code
VisualizeNetwork(Weight_Matrix, Signatures, th)
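To make the role of the threshold concrete, the sketch below lists the edges of a toy weight matrix that touch a signature node and exceed `th`. It is not the VisualizeNetwork implementation: the function `signature_edges` is hypothetical, and both the row-to-column edge direction and the use of the absolute weight are assumptions made for illustration.

```python
import pandas as pd


def signature_edges(weight_matrix, signatures, th):
    """List edges touching a signature node whose |weight| exceeds th.

    Assumption: weight_matrix.loc[a, b] is the weight of the edge a -> b.
    """
    sig_set = set(signatures)
    edges = []
    for source in weight_matrix.index:
        for target in weight_matrix.columns:
            w = weight_matrix.loc[source, target]
            touches_signature = source in sig_set or target in sig_set
            if source != target and touches_signature and abs(w) > th:
                edges.append((source, target, float(w)))
    return pd.DataFrame(edges, columns=["source", "target", "weight"])


# Toy weight matrix over two genes and one signature.
nodes = ["GENE_A", "GENE_B", "SBS1"]
W = pd.DataFrame(
    [[0.0, 0.30, 0.20],
     [0.0, 0.00, 0.01],
     [0.4, 0.00, 0.00]],
    index=nodes, columns=nodes)

print(signature_edges(W, ["SBS1"], th=0.05))
```

In this toy case the gene-to-gene edge is dropped because it does not touch a signature, and the GENE_B to SBS1 edge is dropped because its weight falls below the threshold.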
Python Package
- The GeneSigNet method is implemented in Python, and the code is available as a Python module and as a Jupyter Notebook module.
- Scripts for running GeneSigNet are available as a Python script and as a Jupyter Notebook script. These scripts are the recommended way to load data and run GeneSigNet.
- Simulated data matrices for gene expression and exposures of mutational signatures (Simulated Data) are provided with the source code as example inputs for running the package.
- The Python module VisualizeNetwork visualizes the subnetwork covering the hub nodes (signatures) and their upstream and downstream nodes (causal and affected genes). The weight matrix inferred by the GeneSigNet method is the input of the network visualization module.
The following files provide the results of the analysis on breast and lung cancer data:
- Results: For each of the two cancer data sets, the weight matrices (.csv files) inferred to represent the directed interactions among genes and signatures, and the subnetwork figures (.html files) showing the interactions between signatures and their upstream and downstream genes.
Cancer Data Sets
The following files provide the gene expression data and exposures of mutational signatures for cancer (BRCA and LUAD) patients.
- Gene_Expression_BRCA.csv: The normalized gene expression data (ICGC data portal) for 266 breast cancer (BRCA) patients. Gene expression profiles for 2,204 genes involved in either DNA metabolic or immune response processes of the Gene Ontology (GO) database were selected for the analysis.
- Signature_Exposure_BRCA.csv: Exposures of 6 mutational signatures and the HRD mutational status in breast cancer, as used for the analysis.
- Gene_Expression_LUAD.csv: The normalized gene expression data (TCGA data portal) for 466 lung cancer (LUAD) patients. Gene expression profiles for 2,433 genes involved in either DNA metabolic or immune response processes of the Gene Ontology (GO) database were selected for the analysis.
- Signature_Exposure_LUAD.csv: Exposures of 6 mutational signatures known to be operative in lung cancer, as used for the analysis.