zroger49 / DeepType

DeepType

Deep Learning Approach to Identifying Breast Cancer Subtypes Using High-Dimensional Genomic Data

Code Organization

The software consists of the following source files:

  • DeepType.py
  • data.py
  • eval.py
  • flags.py
  • model.py
  • training.py
  • utils.py

Requirements

  • python
  • tensorflow

Create and Activate a TensorFlow Environment under Conda

  • Create: conda create -n tensorflow_env tensorflow

  • Activate: conda activate tensorflow_env

Use the software

1. Data format: a MATLAB .mat file (filename.mat)

2. Variables:

Data: D×N numerical matrix. Each row is a gene and each column is a sample. Genes must be sorted in descending order of their variance across samples.

targets: N×1 numerical vector. The i-th element is the class label of the i-th sample.
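The layout described above can be produced from Python. The following is a minimal sketch, not part of the DeepType code: it assumes SciPy is installed (it is not listed under Requirements), and the helper name build_mat_dict and the toy values are hypothetical. Whether class labels should start at 0 or 1 is not stated here, so the labels are passed through unchanged.

```python
import numpy as np
from scipy.io import savemat  # assumption: SciPy is available for writing .mat files


def build_mat_dict(expr, labels):
    """Return {'Data': D x N matrix, 'targets': N x 1 vector} for a .mat file.

    Rows of `expr` are genes, columns are samples. Rows are reordered so that
    per-gene variance across samples is in descending order.
    """
    expr = np.asarray(expr, dtype=np.float64)
    labels = np.asarray(labels).reshape(-1, 1)  # force an N x 1 column vector
    if expr.shape[1] != labels.shape[0]:
        raise ValueError("expr needs one column per entry in labels")
    variances = expr.var(axis=1)  # variance of each gene across samples
    order = np.argsort(-variances, kind="stable")  # descending variance
    return {"Data": expr[order], "targets": labels}


# Hypothetical toy input: 4 genes x 3 samples.
expr = np.array([[1.0, 1.0, 1.0],
                 [0.0, 5.0, 10.0],
                 [2.0, 3.0, 4.0],
                 [0.0, 0.0, 3.0]])
labels = [0, 1, 1]
mat = build_mat_dict(expr, labels)

# Uncomment to write the file; the variable names match those listed above.
# savemat("filename.mat", mat)
```

The variable names Data and targets are kept exactly as listed above, on the assumption that the loader looks them up by name.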

3. Set parameters in flags.py:

NUM_GENES_1: the number of input genes.

NUM_CLUSTERS: the number of clusters K.

NUM_HIDDEN: the number of hidden layers.

NUM_NODES: numerical vector giving the number of nodes in each hidden layer.

NUM_CLASSES: the number of unique classes of samples.

NUM_TRAIN_SIZE: the number of samples in the training set.

NUM_VALIDATION_SIZE: the number of samples in the validation set.

NUM_TEST_SIZE: the number of samples in the test set.

NUM_SAMPLE_SIZE: the number of samples in the whole dataset.

NUM_BATCH_SIZE: batch size.

NUM_LEARNING_RATE: learning rate.

NUM_SUPERVISED_BATCHES: the number of training steps in the supervised initialization.

NUM_TRAIN_BATCHES: the number of training steps in each epoch.

LAMBDA: sparsity penalty coefficient.

ALPHA: K-means loss coefficient.

DATA_DIR: directory containing the training data.

RESULT_DIR: directory where the results are written.
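Several of these parameters depend on one another, so a quick sanity check before a long run can save time. The sketch below is illustrative only: it does not read flags.py, the function check_settings and all example values are hypothetical, and the checks encode assumptions rather than documented rules (that NUM_NODES has one entry per hidden layer, that the three split sizes add up to NUM_SAMPLE_SIZE, and that the penalty coefficients are non-negative).

```python
def check_settings(s):
    """Return a list of problems found in a dict of DeepType-style settings."""
    problems = []

    # Assumption: NUM_NODES lists one node count per hidden layer.
    if s["NUM_HIDDEN"] != len(s["NUM_NODES"]):
        problems.append("NUM_NODES should have NUM_HIDDEN entries")

    # Assumption: train, validation and test sets partition the whole dataset.
    split_total = (s["NUM_TRAIN_SIZE"]
                   + s["NUM_VALIDATION_SIZE"]
                   + s["NUM_TEST_SIZE"])
    if split_total != s["NUM_SAMPLE_SIZE"]:
        problems.append("split sizes should sum to NUM_SAMPLE_SIZE")

    # Assumption: penalty coefficients are non-negative.
    if s["LAMBDA"] < 0 or s["ALPHA"] < 0:
        problems.append("LAMBDA and ALPHA should be non-negative")

    return problems


# Hypothetical values, chosen only to illustrate the checks.
example = {
    "NUM_GENES_1": 5000,
    "NUM_CLUSTERS": 5,
    "NUM_HIDDEN": 2,
    "NUM_NODES": [1024, 512],
    "NUM_CLASSES": 5,
    "NUM_TRAIN_SIZE": 700,
    "NUM_VALIDATION_SIZE": 100,
    "NUM_TEST_SIZE": 200,
    "NUM_SAMPLE_SIZE": 1000,
    "LAMBDA": 0.01,
    "ALPHA": 1.0,
}

for problem in check_settings(example):
    print(problem)
```

An empty result means none of the assumed constraints were violated; it does not guarantee that the values suit a particular dataset.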

4. Run the program

python DeepType.py

5. Data available

Because of GitHub's file size limit, the breast cancer dataset is hosted externally at https://drive.google.com/file/d/1ao1zu3DS8GkYF-tHxpQ-1ev2psxXL-fx/view?usp=sharing
