ELEN4002/4012 Project by James Allingham and Paul Cresswell. Supervised by Prof. Scott Hazelhurst.
- Python 2.7 or higher
- NumPy
- TensorFlow
Optional (GPU support):
- NVIDIA CUDA 7.5
- cuDNN 5.1
EpistasisNet can be run from the command line using either the `python` or `python3` command.
A number of command line options can be specified, as shown below:
Flag | Default Value | Description |
---|---|---|
-file_in | | Input data file location |
-tt_ratio | 0.8 | test:train ratio |
-max_steps | 1000 | Maximum steps |
-train_batch_size | 100 | Training batch size |
-test_batch_size | 1000 | Testing batch size |
-log_dir | /tmp/logs/runx | Directory for storing data |
-learning_rate | 0.001 | Initial learning rate |
-dropout | 0.5 | Keep probability for training dropout |
-model_dir | /tmp/tf_models/ | Directory for storing the saved models |
-write_binary | True | Write the processed numpy array to a binary file |
-read_binary | True | Read a binary file rather than a text file |
-save_model | True | Save the best model as the training progresses |
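
As a rough illustration of how the flags in the table fit together, the sketch below declares them with Python's standard `argparse` module. This is an assumption made for illustration only: EpistasisNet itself may define its flags differently (for example through TensorFlow's own flag helpers), and the names `build_parser` and `str2bool` are hypothetical. Only the flag names and default values are taken from the table.

```python
import argparse


def str2bool(value):
    """Turn strings such as 'True' or 'False' into booleans."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Hypothetical parser mirroring the flag table (not the project's own code)."""
    parser = argparse.ArgumentParser(description="EpistasisNet flag sketch")
    parser.add_argument("-file_in", type=str, default=None,
                        help="Input data file location")
    parser.add_argument("-tt_ratio", type=float, default=0.8,
                        help="test:train ratio")
    parser.add_argument("-max_steps", type=int, default=1000,
                        help="Maximum steps")
    parser.add_argument("-train_batch_size", type=int, default=100,
                        help="Training batch size")
    parser.add_argument("-test_batch_size", type=int, default=1000,
                        help="Testing batch size")
    parser.add_argument("-log_dir", type=str, default="/tmp/logs/runx",
                        help="Directory for storing data")
    parser.add_argument("-learning_rate", type=float, default=0.001,
                        help="Initial learning rate")
    parser.add_argument("-dropout", type=float, default=0.5,
                        help="Keep probability for training dropout")
    parser.add_argument("-model_dir", type=str, default="/tmp/tf_models/",
                        help="Directory for storing the saved models")
    parser.add_argument("-write_binary", type=str2bool, default=True,
                        help="Write the processed numpy array to a binary file")
    parser.add_argument("-read_binary", type=str2bool, default=True,
                        help="Read a binary file rather than a text file")
    parser.add_argument("-save_model", type=str2bool, default=True,
                        help="Save the best model as the training progresses")
    return parser


# Example: override two flags and keep the defaults for the rest.
# "data.txt" is a placeholder file name.
example_args = build_parser().parse_args(
    ["-file_in", "data.txt", "-max_steps", "500"])
```

Any flag that is not passed keeps the default listed in the table, so `example_args.learning_rate` is still `0.001` here.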
EpistasisNet expects input text files to be in the format produced by GAMETES. The text files can be written out as binary files by setting the `-write_binary` flag to True.
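
The text-to-binary step can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the project's actual loader: it assumes a GAMETES-style file is tab-delimited, has one header row of column names, and stores the class label in the last column. It also assumes NumPy's `.npy` format for the binary file, which may differ from what EpistasisNet writes. The function names are hypothetical.

```python
import numpy as np


def load_gametes_text(path):
    """Read a tab-delimited file with a header row into an integer array."""
    # skiprows=1 drops the header line of column names (assumed layout).
    return np.loadtxt(path, delimiter="\t", skiprows=1, dtype=np.int8, ndmin=2)


def split_features_labels(data):
    """Separate the genotype columns from the class label (assumed last column)."""
    return data[:, :-1], data[:, -1]


def write_binary(data, path):
    """Store the processed array in NumPy's .npy format (an assumed format)."""
    np.save(path, data)


def read_binary(path):
    """Load an array previously stored with write_binary."""
    return np.load(path)
```

Reading the binary file back avoids re-parsing the text on later runs, which is presumably the purpose of the `-read_binary` flag.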
The files for EpistasisNet are:
Directory | File | Description |
---|---|---|
data | convert_from_BEAM_format.py | Converts data in the format used by the BEAM tool to the GAMETES format |
data | convert_to_BEAM_format.py | Converts data in the GAMETES format to the BEAM format |
docs | style_guide.html | Google's Python style guide |
docs | MeetingMinutes/*.pdf | Minutes of the meetings held during the course of the project |
src | GPU_off.sh | A shell script that turns off GPU usage for EpistasisNet (as well as other CUDA applications) |
src | GPU_on.sh | A shell script that turns on GPU usage for EpistasisNet (as well as other CUDA applications) |
src | convolutional_model.py | Module that supplies a convolutional model with pooling to test for epistasis on a GAMETES dataset |
src | data_batcher.py | Module that provides a single class: DataBatcher, which manages reading of raw data and formatting it appropriately |
src | data_holder.py | Module that provides a single class: DataHolder, which manages reading of input files and storage of various data sets |
src | data_loader.py | Module that provides a single class: DataLoader, which manages reading of raw data and formatting it appropriately |
src | linear_model.py | Module that supplies a linear model to test for epistasis on a GAMETES dataset |
src | model.py | Module that supplies a Model class which can be inherited from when creating models representing TensorFlow graphs |
src | nonlinear_model.py | Module that supplies a fully connected model with nonlinearities to test for epistasis on a GAMETES dataset |
src | pool_conv_model.py | Module that supplies a convolutional model with pooling to test for epistasis on a GAMETES dataset |
src | recurrent_model.py | Module that supplies a recurrent model with additional fully connected layers to test for epistasis on a GAMETES dataset |
src | run_model.py | Module that trains a TensorFlow model |
src | scaling_model.py | Module that supplies a convolutional model with pooling to test for epistasis on a GAMETES dataset (best model) |
src | utilities.py | Module that provides a number of wrapper functions for TensorFlow |
tests | test_data_batcher.py | Module that provides test cases for the DataBatcher class |
tests | test_data_holder.py | Module that provides test cases for the DataHolder class |
tests | test_data_loader.py | Module that provides test cases for the DataLoader class |
tests | test_utilities.py | Module that provides test cases for the utility functions used to build TensorFlow graphs |