lrsoenksen / CL_RNA_SynthBio

Code to reproduce Angenent-Mari, N. et al 2020. Deep Learning for RNA Synthetic Biology

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

CL_RNA_SynthBio

Code to reproduce Angenent-Mari, N. et al 2019. Deep Learning for RNA Synthetic Biology

Image description

DATA STRUCTURE (INPUT / OUTPU)

Data is loaded from a Toehold Sensor Database (data/2019-03-30_toehold_dataset_proc_with_params.csv) which is comma delimited table having the following columns of DNA encoded sub-sequences: organism, sequence_class, sequence_id, pre_seq promoter, trigger, loop1, switch, loop2, stem1, atg, stem2m linkerm post_linker, output

Input tensor is defined as (DS=Data_Style):

DS) Toehold Nucleotide Sequence
*NOTE: Base toehold string sequence [0-144]

  •   GGG  - Trigger - Loop1 - Switch  - Loop2 - Stem1 -  AUG  -  Stem2  -  Linker - Post-linker
    
  • [-3,-1]  [0,-29]  [30-49]  [50-79]  [80-90] [91,96] [97,99] [100,108] [109,134]  [135,144]    
    
  • For training we select our input sequence vector start with GGG and concatenate everything from "Loop1" to "post-linker"... which is seq_SwitchOFF_GFP  = ggg + seq[30:145].
    
  • Also, pre_seq & promoter sub-sequences are NEVER used because they are not converted into mRNA (is in the plasmid but > *     it is never in the functional toehold module), so it won't contribute in secondary structure at all. For this example > *     in particular we use DS_1.*
    

Output vector is defined as:

OUT) ON, Off &/OR ON-OFF State values derived from the experimental testing of toehold switch RNA sequence

PROBLEM DEFINITION

To investigate if a deep learning network can be used to predict toehold switch ON/OFF functionality, because in that case it would suggest the network is learning secondary structure prediction that would be transferable to other RNA based problems.

DATASET (Direct Download)

Due to ocassional high demand in downloads of this GIT, the LFS bandwidth or egress limit of this repo may require you download our data directly from this link: https://drive.google.com/file/d/1t_OXvtW-hEGRt3-mgNlyBKHBqro2Z572/view?usp=sharing

About

Code to reproduce Angenent-Mari, N. et al 2020. Deep Learning for RNA Synthetic Biology

License:MIT License


Languages

Language:Python 48.1%Language:Jupyter Notebook 43.5%Language:C 5.5%Language:C++ 0.9%Language:PostScript 0.8%Language:SWIG 0.3%Language:Perl 0.2%Language:M4 0.1%Language:TeX 0.1%Language:NASL 0.1%Language:Makefile 0.1%Language:TypeScript 0.1%Language:Roff 0.0%Language:Shell 0.0%Language:HTML 0.0%Language:XS 0.0%Language:Euphoria 0.0%Language:Fortran 0.0%Language:Pascal 0.0%Language:Module Management System 0.0%Language:Assembly 0.0%Language:Raku 0.0%Language:Emacs Lisp 0.0%Language:Pawn 0.0%Language:R 0.0%Language:PHP 0.0%