DoctorYe / tcga-embedding

Using a shallow neural network layer (embedding) to infer gene-gene/sample relationships from gene expression data.

Embedding (TCGA RNASeq)

Source code for applying embedding to TCGA RNASeqV2 RSEM normalized data.
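As a rough illustration of what "embedding" can mean here, the sketch below scores a (sample, gene) pair as the dot product of a sample vector and a gene vector plus two bias terms. This is an assumption inferred from the file names under emb/ (gemb, semb, and their _bias counterparts), not a description of what train.py actually does. The sizes, variable names, and random values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_genes, dim = 4, 6, 3  # hypothetical sizes

# One vector and one scalar bias per sample and per gene (random stand-ins,
# not values from the repository's emb/ files).
sample_emb = rng.normal(size=(n_samples, dim))
gene_emb = rng.normal(size=(n_genes, dim))
sample_bias = rng.normal(size=n_samples)
gene_bias = rng.normal(size=n_genes)


def predict(sample_idx, gene_idx):
    """Dot-product score for paired (sample, gene) index arrays."""
    dot = np.sum(sample_emb[sample_idx] * gene_emb[gene_idx], axis=1)
    return dot + sample_bias[sample_idx] + gene_bias[gene_idx]


# Score every (sample, gene) pair at once.
s_idx, g_idx = np.meshgrid(np.arange(n_samples), np.arange(n_genes), indexing="ij")
scores = predict(s_idx.ravel(), g_idx.ravel()).reshape(n_samples, n_genes)
print(scores.shape)
```

In a model of this kind, the vectors and biases would be fitted so that the scores approximate the observed expression values; the fitted gene vectors can then be compared with each other to look for related genes.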

Link

Web Interactive Embedding Projector (powered by TensorFlow)

Gene Embedding Matrix from:

Source Code

Handy Python scripts for loading data (load_data.py) and functions for handling embeddings (util.py) are included.
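The functions in util.py are not reproduced here. As a minimal sketch of one typical embedding operation, the example below ranks genes by cosine similarity to a query gene. The function name nearest_genes, the gene names, and the toy matrix are hypothetical, and the layout of the emb/*.csv files is not assumed.

```python
import numpy as np


def nearest_genes(emb, names, query, k=3):
    """Return the k gene names closest to `query` by cosine similarity."""
    # Normalize rows so that a dot product equals cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit[names.index(query)]
    order = np.argsort(-sims)  # most similar first
    return [(names[i], float(sims[i])) for i in order if names[i] != query][:k]


# Toy stand-in for a gene embedding matrix (rows = genes, columns = dimensions).
names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
emb = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])

top = nearest_genes(emb, names, "GENE_A", k=2)
print(top)
```

Cosine similarity ignores vector length, so genes are compared by the direction of their embedding vectors only.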

Dependencies

  • numpy
  • pandas
  • matplotlib
  • seaborn
  • networkx
  • scipy
  • scikit-learn (imported as sklearn)
  • fastai

Usage

  1. Clone the repo locally.
  2. Change directory to the local directory.
  3. Run python train.py --data $YOUR_INPUT_DATA --out-prefix $OUT --out-dir $OUTPUT_PATH.

Note: train.py can only be run on a CUDA-enabled machine. Input data must be a .csv file with one observation per row and must have an ID column.
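Before starting a training run, it may help to check that the input table has an identifier column. The sketch below does this with pandas on a toy in-memory CSV. The column name "ID", the gene columns, and the helper check_input are hypothetical; the exact column name that train.py expects is not stated here.

```python
import io

import pandas as pd

# Toy CSV: one observation per row, with an identifier column.
# The column name "ID" and the gene columns are hypothetical.
csv_text = """ID,GENE_A,GENE_B
sample_1,5.2,0.0
sample_2,3.1,7.4
"""


def check_input(df, id_col="ID"):
    """Raise ValueError if the table lacks a usable identifier column."""
    if id_col not in df.columns:
        raise ValueError(f"missing required column: {id_col!r}")
    if df[id_col].isna().any():
        raise ValueError(f"column {id_col!r} contains missing values")
    return df


df = check_input(pd.read_csv(io.StringIO(csv_text)))
print(df.shape)
```

For a real file, replace io.StringIO(csv_text) with the path passed to --data.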

Folder Structure

tcga-embedding
|   LICENSE
|   README.rst
|   load_data.py
|   train.py
|   util.py
└───emb
    |   gemb_bias_CN.csv
    |   gemb_bias_normal.csv
    |   gemb_CN.csv
    |   gemb_normal.csv
    |   semb_bias_CN.csv
    |   semb_bias_normal.csv
    |   semb_CN.csv
    |   semb_normal.csv
    └───geneSCF
        |   gemb_d17_top_GO_BP.tsv
        |   gemb_d22_top_GO_BP.tsv
        |   gemb_d25_top_GO_BP.tsv
        |   gemb_d35_bottom_GO_BP.tsv
        |   gemb_d43_bottom_GO_BP.tsv
        |   gemb_d46_bottom_GO_BP.tsv
           
└───ipynb
    |   tcga_emb_dist.ipynb
    |   tcga_emb_pca.ipynb
    |   tcga_emb_subtyping.ipynb
    |   tcga_ioresponse.ipynb
    |   tcga_plot_emb_som_pca_heatmap.ipynb
    |   tcga_plot_gsea_compare.ipynb
    |   tcga_som.ipynb
    |   tcga_training_CN.ipynb
    |   tcga_training_normal.ipynb

└───ref
    |   genes_gids.tsv
    |   sid_ca.csv

License: MIT License


Languages

Jupyter Notebook 99.9%, Python 0.1%