CNN-SVR is a deep learning-based method for predicting the on-target cleavage efficacy of CRISPR/Cas9 guide RNAs (gRNAs). It has two major components: a merged CNN front-end that extracts gRNA and epigenetic features, and an SVR back-end that performs the regression to predict gRNA cleavage efficiency.
- Ubuntu 16.04
- Anaconda 3-5.2.0
- Python packages:
numpy 1.16.4
pandas 0.23.0
scikit-learn 0.19.1
scipy 1.1.0
- Keras 2.1.0
- Tensorflow and dependencies:
Tensorflow 1.4.0
CUDA 8.0 (for GPU use)
cuDNN 6.0 (for GPU use)
Download Ubuntu 16.04 from https://www.ubuntu.com/download/desktop
Download Anaconda 3-5.2.0 from https://www.anaconda.com/distribution/#download-section
pip install tensorflow-gpu==1.4.0 (for GPU use)
pip install tensorflow==1.4.0 (for CPU use)
Download the CUDA 8.0 installer from https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run
Download the cuDNN 6.0 tarball from https://developer.nvidia.com/cudnn
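
After installation, it can help to confirm that the installed package versions match the ones listed above. The script below is a minimal sketch, not part of this repository: the helper names `installed_version` and `check_versions` are hypothetical, and the distribution names in `REQUIRED` are assumed to be the names used on PyPI. For a GPU install, replace the `tensorflow` key with `tensorflow-gpu`.

```python
# Pinned versions copied from the requirements list above.
# Assumption: these keys match the installed distribution names.
REQUIRED = {
    "numpy": "1.16.4",
    "pandas": "0.23.0",
    "scikit-learn": "0.19.1",
    "scipy": "1.1.0",
    "Keras": "2.1.0",
    "tensorflow": "1.4.0",  # use "tensorflow-gpu" for a GPU install
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python < 3.8: fall back to setuptools
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_versions(required, lookup=installed_version):
    """Map each package to a (status, found_version) pair.

    status is "ok", "mismatch", or "missing".
    """
    report = {}
    for package, wanted in required.items():
        found = lookup(package)
        if found is None:
            status = "missing"
        elif found == wanted:
            status = "ok"
        else:
            status = "mismatch"
        report[package] = (status, found)
    return report


if __name__ == "__main__":
    for package, (status, found) in check_versions(REQUIRED).items():
        print("{:<14} wanted {:<8} found {!s:<10} {}".format(
            package, REQUIRED[package], found, status))
```

A "mismatch" does not necessarily mean the code will fail, but the results reported here were obtained with the versions listed above.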
- ./data: the training and testing examples, each with a gRNA sequence, its corresponding epigenetic features, and a label indicating the on-target cleavage efficacy
- ./weights/weights.h5: the trained weights for our model
- ./cnnsvr.py: the Python code; run it to reproduce our results
Note:
- The input training and testing files should include, for each example, a gRNA sequence of length 23 bp, four corresponding epigenetic feature sequences of length 23 written with the symbols "A" and "N", and a label.
- The train.csv and test.csv files can be replaced or modified to include the gRNA sequences and four epigenetic features of interest.
- gRNA sequence: TGAGAAGTCTATGAGCTTCAAGG (23bp)
- ctcf: NNNNNNNNNNNNNNNNNNNNNNN (length=23)
- dnase: AAAAAAAAAAAAAAAAAAAAAAA (length=23)
- h3k4me3: NNNNNNNNNNNNNNNNNNNNNNN (length=23)
- rrbs: NNNNNNNNNNNNNNNNNNNNNNN (length=23)
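
The format above can be checked and turned into a numeric matrix before it is fed to a model. The following is a minimal sketch, not the code in cnnsvr.py: the function names `validate_example` and `encode_example` are hypothetical, and the encoding (one-hot bases in the order A, C, G, T, plus one binary channel per epigenetic feature with "A" mapped to 1 and "N" mapped to 0) is an assumption made for illustration.

```python
BASES = "ACGT"          # assumed channel order for the one-hot encoding
SEQ_LEN = 23            # gRNA length required by the notes above
EPI_FEATURES = ("ctcf", "dnase", "h3k4me3", "rrbs")


def validate_example(grna, epi):
    """Raise ValueError if the gRNA or any epigenetic track is malformed."""
    if len(grna) != SEQ_LEN or set(grna) - set(BASES):
        raise ValueError("gRNA must be 23 characters drawn from A, C, G, T")
    for name in EPI_FEATURES:
        track = epi[name]
        if len(track) != SEQ_LEN or set(track) - {"A", "N"}:
            raise ValueError(
                "{} must be 23 characters drawn from A, N".format(name))


def encode_example(grna, epi):
    """Return a 23 x 8 nested list: 4 one-hot base channels + 4 epigenetic flags.

    Assumption: "A" marks a position where the feature is present (1)
    and "N" a position where it is absent (0).
    """
    validate_example(grna, epi)
    rows = []
    for i, base in enumerate(grna):
        one_hot = [1 if base == b else 0 for b in BASES]
        flags = [1 if epi[name][i] == "A" else 0 for name in EPI_FEATURES]
        rows.append(one_hot + flags)
    return rows


# The example from this README.
example_epi = {
    "ctcf": "N" * 23,
    "dnase": "A" * 23,
    "h3k4me3": "N" * 23,
    "rrbs": "N" * 23,
}
matrix = encode_example("TGAGAAGTCTATGAGCTTCAAGG", example_epi)
print(len(matrix), len(matrix[0]))  # 23 8
print(matrix[0])                    # [0, 0, 0, 1, 0, 1, 0, 0]
```

The first row corresponds to the base T with only the dnase flag set, which matches the example values listed above.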
./weights/weights.h5
python ./cnnsvr.py
0.22743436