ratthachat/alexhh-protein-generation

Generating novel protein variants with variational autoencoders

This code provides implementations of variational autoencoder models designed to work with aligned and unaligned protein sequence data, as described in the manuscript "Generating novel protein variants with variational autoencoders".
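To illustrate the difference between aligned and unaligned inputs, here is a minimal sketch of how protein sequences are commonly one-hot encoded before being fed to a model. It is not taken from this repository: the alphabet ordering, the use of "-" for both alignment gaps and padding, and the helper name one_hot are assumptions made for the example.

```python
# Assumed alphabet: "-" (gap / padding) followed by the 20 standard amino acids.
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"
CHAR_TO_INDEX = {c: i for i, c in enumerate(ALPHABET)}


def one_hot(seq, length=None):
    """Encode a sequence as a list of one-hot rows, one row per position.

    If `length` is given, the sequence is truncated or right-padded
    with "-" to exactly that many positions.
    """
    length = len(seq) if length is None else length
    padded = seq[:length].ljust(length, "-")
    rows = []
    for ch in padded:
        row = [0] * len(ALPHABET)
        row[CHAR_TO_INDEX[ch]] = 1
        rows.append(row)
    return rows


# Aligned sequences (rows of an MSA) already share one length; gaps are symbols.
aligned = ["MK-LV", "MKALV"]
x_aligned = [one_hot(s) for s in aligned]

# Unaligned sequences differ in length, so they are padded to a common length.
unaligned = ["MKLV", "MKALVG"]
max_len = max(len(s) for s in unaligned)
x_unaligned = [one_hot(s, max_len) for s in unaligned]
```

In this sketch both inputs end up as fixed-size arrays of shape (sequence length, alphabet size); the only difference is whether the gap symbol carries alignment information or is just padding.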

Dependencies

The code requires Python 3. Variational autoencoder models were implemented in Keras (2.1.2) using the TensorFlow backend (TensorFlow 1.0.0). The full Python dependencies are listed in requirements.txt.

Individual models were trained on a single Tesla K80 GPU with CUDA 8.0.0, cuDNN v5 and Python 3.6.0.

Installation

To run the code locally, first clone the repository, then install all dependencies (pip install -r requirements.txt).

Training models

To train a model, run the corresponding script. Training logs are written to output/logs, and weights are saved to output/weights at the end of training.

python scripts/train_msa.py

python scripts/train_raw.py

For the latter (train_raw.py) we recommend using a GPU; the former (train_msa.py) can run in a few hours on a standard CPU.

Generating sequences (demo)

To generate sequences by sampling from the prior, run scripts/generate_from_prior.py, passing the name of the weights file and specifying the --unaligned flag if using an ARVAE model. Generated sequences will be written to a new FASTA file in output/generated_sequences/.

python scripts/generate_from_prior.py data/weights/msavae.h5
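For readers unfamiliar with the idea, the following is a minimal sketch of what sampling from the prior of a variational autoencoder involves: draw latent vectors from a standard normal distribution, decode each one to per-position probabilities over the amino-acid alphabet, and write the resulting sequences in FASTA format. It is not the code in scripts/generate_from_prior.py. The toy decoder stands in for a trained model, and the alphabet, the argmax decoding, the gap stripping and all function names are assumptions made for illustration.

```python
import math
import random

# Assumed alphabet: "-" (gap) followed by the 20 standard amino acids.
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def make_toy_decoder(latent_dim, seq_len, seed=1):
    """Hypothetical stand-in for a trained decoder: a fixed random linear map."""
    rng = random.Random(seed)
    weights = [[[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
                for _ in ALPHABET]
               for _ in range(seq_len)]

    def decode(z):
        # Returns seq_len probability vectors, each of length len(ALPHABET).
        return [softmax([sum(w * zi for w, zi in zip(row, z)) for row in pos])
                for pos in weights]

    return decode


def sample_from_prior(decode, latent_dim, n_samples, seed=0):
    """Draw z ~ N(0, I), decode it, and take the most likely residue per position."""
    rng = random.Random(seed)
    seqs = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        probs = decode(z)
        chars = [ALPHABET[max(range(len(p)), key=p.__getitem__)] for p in probs]
        # Assumption: gap symbols are dropped before writing the sequence.
        seqs.append("".join(chars).replace("-", ""))
    return seqs


def to_fasta(seqs, prefix="generated"):
    """Format sequences as FASTA records with sequential identifiers."""
    return "".join(f">{prefix}_{i}\n{s}\n" for i, s in enumerate(seqs))


decoder = make_toy_decoder(latent_dim=10, seq_len=12)
seqs = sample_from_prior(decoder, latent_dim=10, n_samples=3)
fasta = to_fasta(seqs)
print(fasta)
```

With a real model, the toy decoder would be replaced by the trained decoder network loaded from the weights file. An autoregressive model would also generate residues one position at a time rather than decoding all positions at once.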
