zhenwoai / CNN-SVR

CNN-SVR: A deep learning approach for predicting CRISPR/Cas9 guide RNA on-target activity

CNN-SVR

Overview

CNN-SVR is a deep learning-based method for predicting the on-target cleavage efficacy of CRISPR/Cas9 guide RNAs (gRNAs). It has two major components: a merged CNN front-end that extracts features from the gRNA sequence and its epigenetic annotations, and an SVR back-end that performs the regression and predicts gRNA cleavage efficiency.

Prerequisites:

Installation guide

Operating system

Ubuntu 16.04 (download from https://www.ubuntu.com/download/desktop)

Python and packages

Download the Anaconda3-5.2.0 installer from https://www.anaconda.com/distribution/#download-section

Tensorflow installation:

pip install tensorflow-gpu==1.4.0 (for GPU use)
pip install tensorflow==1.4.0 (for CPU use)

CUDA toolkit 8.0 (for GPU use)

Download the CUDA installer from https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run

cuDNN 6.1.10 (for GPU use)

Download the cuDNN tarball from https://developer.nvidia.com/cudnn

Content

  • ./data: training and testing examples, each with a gRNA sequence, the corresponding epigenetic features, and a label giving the on-target cleavage efficacy
  • ./weights/weights.h5: the trained weights for our model
  • ./cnnsvr.py: the Python code; run it to reproduce our results

Usage

python cnnsvr.py

Note:

  • The input training and testing files should contain, for each gRNA, the 23 bp gRNA sequence, four corresponding epigenetic feature sequences of length 23 written with the symbols "A" and "N", and a label.
  • train.csv and test.csv can be replaced or modified to include the gRNA sequences and four epigenetic features of interest.
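
Before replacing train.csv or test.csv, it may help to check that every record meets the constraints listed above. The following is a minimal sketch using only the standard library. The key names ("grna", "ctcf", "dnase", "h3k4me3", "rrbs", "label") and the function validate_record are hypothetical and are not taken from cnnsvr.py; adjust them to the actual column headers of your files.

```python
# Minimal sketch of an input check. The key names below are assumptions,
# not the actual column headers used by cnnsvr.py.
SEQ_LEN = 23
EPIGENETIC_KEYS = ("ctcf", "dnase", "h3k4me3", "rrbs")


def validate_record(record, require_label=True):
    """Return a list of problems found in one record (empty list means OK)."""
    problems = []

    # The gRNA must be 23 characters long and contain only A, C, G, T.
    grna = record.get("grna", "")
    if len(grna) != SEQ_LEN:
        problems.append("grna: expected length %d, got %d" % (SEQ_LEN, len(grna)))
    if set(grna.upper()) - set("ACGT"):
        problems.append("grna: contains characters other than A, C, G, T")

    # Each epigenetic feature must be a 23-character string over {A, N}.
    for key in EPIGENETIC_KEYS:
        track = record.get(key, "")
        if len(track) != SEQ_LEN:
            problems.append("%s: expected length %d, got %d" % (key, SEQ_LEN, len(track)))
        if set(track.upper()) - set("AN"):
            problems.append("%s: contains characters other than A and N" % key)

    # The label must parse as a number when it is required.
    if require_label:
        try:
            float(record.get("label", ""))
        except (TypeError, ValueError):
            problems.append("label: missing or not a number")

    return problems
```

An empty list means the record passed every check. Rows read with csv.DictReader can be passed to validate_record directly, provided the column headers match the assumed key names.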

Demo instructions

Input (gRNA sequence and four epigenetic features):

  • Data format:

  • gRNA sequence: TGAGAAGTCTATGAGCTTCAAGG (23bp)
  • ctcf: NNNNNNNNNNNNNNNNNNNNNNN (length=23)
  • dnase: AAAAAAAAAAAAAAAAAAAAAAA (length=23)
  • h3k4me3: NNNNNNNNNNNNNNNNNNNNNNN (length=23)
  • rrbs: NNNNNNNNNNNNNNNNNNNNNNN (length=23)
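
To illustrate how an input in this format could be turned into numeric arrays for a CNN, here is a minimal sketch. It assumes a one-hot encoding of the gRNA (rows in A, C, G, T order) and a binary encoding of each epigenetic track in which "A" maps to 1 and "N" maps to 0. Both choices, and the function names, are assumptions made for illustration; the encoding actually used in cnnsvr.py may differ.

```python
# Illustrative encoding only; not taken from cnnsvr.py.
NUCLEOTIDES = "ACGT"


def one_hot_grna(seq):
    """Return a 4 x len(seq) matrix with rows in A, C, G, T order."""
    seq = seq.upper()
    return [[1 if base == nuc else 0 for base in seq] for nuc in NUCLEOTIDES]


def binary_track(track):
    """Map 'A' to 1 and every other symbol to 0 (assumed meaning of A/N)."""
    return [1 if ch == "A" else 0 for ch in track.upper()]


def encode_example(grna, ctcf, dnase, h3k4me3, rrbs):
    """Return (4 x 23 sequence matrix, 4 x 23 epigenetic matrix)."""
    seq_matrix = one_hot_grna(grna)
    epi_matrix = [binary_track(t) for t in (ctcf, dnase, h3k4me3, rrbs)]
    return seq_matrix, epi_matrix


# The demo input from the data format above.
seq_matrix, epi_matrix = encode_example(
    grna="TGAGAAGTCTATGAGCTTCAAGG",
    ctcf="N" * 23,
    dnase="A" * 23,
    h3k4me3="N" * 23,
    rrbs="N" * 23,
)
print(len(seq_matrix), len(seq_matrix[0]))   # 4 23
print([sum(row) for row in epi_matrix])      # [0, 23, 0, 0]
```

In this sketch each column of the sequence matrix contains exactly one 1, and only the dnase row of the epigenetic matrix is non-zero for the demo input.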

Load weights (Pre-trained weight file):

./weights/weights.h5

Run script:

python ./cnnsvr.py

Output (Predicted activity score for gRNA):

0.22743436
