Shaked35 / Genome-AC-GAN

This repository provides tools to train and evaluate the Genome-AC-GAN model for generating realistic artificial human genomes.


Genome-AC-GAN: Enhancing Synthetic Genotype Generation through Auxiliary Classification

This Git repository contains an implementation of the article "Genome-AC-GAN: Enhancing Synthetic Genotype Generation through Auxiliary Classification". It provides tools and code to train the Genome-AC-GAN model with various configurations and to run evaluations and comparisons against other models.

Genome-AC-GAN, introduced in the article, generates artificial human genomes with a generative adversarial network (GAN) that includes an auxiliary classifier. The repository lets users train the model with different configurations and run a range of evaluations.

The previous models and some of the compression analysis functions are based on this project: https://gitlab.inria.fr/ml_genetics/public/artificial_genomes/-/tree/master/GAN_AGs

This repository also implements the models from the following articles:

"Deep convolutional and conditional neural networks for large-scale genomic data generation"

"Creating Artificial Human Genomes Using Generative Model"

Installation

To use this repository, please follow these steps:

Clone the repository to your local machine:

    git clone https://github.com/Shaked35/cGenome-AC-GAN

Create a virtual environment with Python 3.9 and install the packages listed in requirements.txt. Alternatively, run the provided setup (or make install) from a terminal inside the project to prepare your Python virtual environment.
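If you prefer to script the manual route, here is a minimal sketch using only the standard library (venv and subprocess). It creates an environment with the interpreter that runs the script and installs requirements.txt with that environment's pip. The helper names (env_python, install_command, setup_environment) and the environment directory name "venv" are hypothetical choices for illustration, not part of the repository.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a virtual environment."""
    # venv places executables in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(env_dir: Path, requirements: Path) -> list:
    """pip command that installs the requirements into the environment."""
    return [str(env_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]


def setup_environment(project_root: str = ".", env_name: str = "venv") -> None:
    # The environment inherits the version of the interpreter running this code;
    # the README asks for Python 3.9, so warn (rather than fail) on other versions.
    if sys.version_info[:2] != (3, 9):
        print(f"warning: expected Python 3.9, running {sys.version.split()[0]}")
    root = Path(project_root)
    env_dir = root / env_name
    venv.EnvBuilder(with_pip=True).create(env_dir)
    subprocess.run(install_command(env_dir, root / "requirements.txt"), check=True)
```

Call setup_environment() from the project root using a Python 3.9 interpreter; afterwards, activate the environment as usual for your shell.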

Usage

Once you have completed the installation, you can start using the Genome-AC-GAN implementation. The general steps are:

Prepare your dataset: the dataset should be placed in the resource directory. Before each training run, the data goes through a preprocessing step.
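The README does not document the input format, so the following is only a hypothetical sketch of what such preprocessing can look like for a class-conditional model: each row is assumed to hold a population label, a sample ID, and then 0/1 alleles, one per SNP. The rows are split into label and genotype lists, and the population labels are one-hot encoded so an auxiliary classifier can use them. The column layout and the function names (parse_haplotypes, one_hot) are assumptions, not the repository's preprocessing code.

```python
def parse_haplotypes(lines):
    """Split rows of '<population> <sample_id> <allele> <allele> ...' into labels and 0/1 rows.

    The column layout is a hypothetical example, not the repository's documented format.
    """
    labels, genotypes = [], []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        alleles = [int(v) for v in fields[2:]]
        if any(a not in (0, 1) for a in alleles):
            raise ValueError(f"line {line_no}: alleles must be 0 or 1")
        # Every haplotype must cover the same SNPs.
        if genotypes and len(alleles) != len(genotypes[0]):
            raise ValueError(f"line {line_no}: expected {len(genotypes[0])} SNPs, got {len(alleles)}")
        labels.append(fields[0])
        genotypes.append(alleles)
    return labels, genotypes


def one_hot(labels):
    """Encode population labels as one-hot vectors (classes sorted alphabetically)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    vectors = [[1 if index[name] == i else 0 for i in range(len(classes))] for name in labels]
    return classes, vectors
```

Validating row lengths and allele values up front keeps malformed rows from surfacing later as shape errors during training.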

Train the model: use the provided script train_with_configuration.py to train Genome-AC-GAN from a YAML configuration. You can use one of the existing configurations from our paper. To train your own model, create a new YAML configuration based on arguments_description.yaml and run the same script with --path <your_configuration_path.yaml>.

You can follow the steps below to train either the new model or the old model, and to see the input arguments that affect the final model.
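To train several configurations in a row, you can wrap the documented command line in a small driver. This sketch relies only on what the README states (the script name and its --path flag). It assumes the script is run from the project root, and the helper names training_command and train_all, as well as the example file name, are hypothetical.

```python
import subprocess
import sys


def training_command(config_path):
    """Command line for one training run; --path is the flag described in the README."""
    return [sys.executable, "train_with_configuration.py", "--path", str(config_path)]


def train_all(config_paths, project_root="."):
    """Train one model per YAML configuration, stopping at the first failed run."""
    for config_path in config_paths:
        # Relative config paths are resolved against project_root (the working directory).
        subprocess.run(training_command(config_path), check=True, cwd=project_root)
```

For example, train_all(["my_config.yaml"]) runs a single training job. Using sys.executable ensures the script runs in the same (virtual) environment as the driver.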

Perform evaluations: after training the model, you can run various evaluations to assess its performance, such as sequence similarity, diversity, or other domain-specific metrics.

The different evaluations are provided below in separate Jupyter notebooks.
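The notebooks define the actual metrics. As an illustration only, here are two simple checks that match "similarity" and "diversity": the Pearson correlation between real and synthetic per-SNP allele frequencies, and the mean pairwise Hamming distance within a set of haplotypes. The sketch assumes genotypes are lists of 0/1 values. It is plain Python that stays compatible with 3.9, which has no statistics.correlation. The function names are hypothetical and are not the notebooks' API.

```python
import math
from itertools import combinations


def allele_frequencies(genotypes):
    """Fraction of haplotypes carrying allele 1 at each SNP."""
    n = len(genotypes)
    return [sum(column) / n for column in zip(*genotypes)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def mean_pairwise_distance(genotypes):
    """Average number of differing SNPs over all pairs of haplotypes (O(n^2) pairs)."""
    pairs = list(combinations(genotypes, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes")
    return sum(sum(a != b for a, b in zip(g1, g2)) for g1, g2 in pairs) / len(pairs)
```

A correlation close to 1 means the synthetic set reproduces the real allele frequencies. A synthetic mean pairwise distance far below the real one hints at mode collapse. On large datasets, subsample before computing pairwise distances.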

Synthetic Genotype Sequences (Model Output Results)

Comparisons and Evaluations

To further evaluate the performance of the Genome-AC-GAN model, this repository enables comparisons with other models from the article "Creating Artificial Human Genomes Using Generative Models" and the paper "Deep convolutional and conditional neural networks for large-scale genomic data generation." You can refer to the corresponding papers for more information on these models.

Contributing

Contributions to this repository are welcome. If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.

Genome-AC-GAN Architecture

[Figure: Genome-AC-GAN architecture (GENOMEACGAN.png)]
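For reference, the auxiliary-classifier GAN (AC-GAN) formulation of Odena et al. (2017), which the model's name refers to, gives the discriminator two outputs: a real/fake probability S and a class probability C (here, plausibly the population label). The generator produces X_fake = G(c, z) from a class c and noise z. This README does not state whether Genome-AC-GAN uses exactly these terms or adds its own weighting, so take the standard objective below as background rather than as the paper's loss:

```latex
\begin{aligned}
L_S &= \mathbb{E}\left[\log P(S=\text{real} \mid X_{\text{real}})\right] + \mathbb{E}\left[\log P(S=\text{fake} \mid X_{\text{fake}})\right] \\
L_C &= \mathbb{E}\left[\log P(C=c \mid X_{\text{real}})\right] + \mathbb{E}\left[\log P(C=c \mid X_{\text{fake}})\right]
\end{aligned}
```

The discriminator is trained to maximize L_S + L_C, and the generator is trained to maximize L_C - L_S. The generator therefore learns to produce samples that look real and that the classifier assigns to the requested class.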

PCA Compression

[Figure: PCA compression (pca2_on_test_real.jpg)]
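The README does not say how this PCA plot is produced. A common approach, assumed here, is to fit PCA on real genotypes and then project synthetic genotypes onto the same axes, so both sets share one coordinate system. Below is a minimal NumPy sketch using an SVD of the centered data. The function names fit_pca and project are hypothetical.

```python
import numpy as np


def fit_pca(real, n_components=2):
    """Fit PCA on real genotypes (rows = haplotypes, columns = SNPs)."""
    real = np.asarray(real, dtype=float)
    mean = real.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(real - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(samples, mean, axes):
    """Coordinates of samples on the axes fitted to the real data."""
    return (np.asarray(samples, dtype=float) - mean) @ axes.T
```

Reusing the real data's mean and axes for the synthetic samples makes the two scatter plots directly comparable. Fitting a separate PCA on each set would produce unrelated coordinate systems.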

PCA of Continental Population Training

[Animation: superpopulation training (superpopulation-training.gif)]

Classifier Model Improvements with Synthetic Augmentation

[Figures: classifier_with_synthetic_compare_by_model.jpg, classifier_with_synthetic_by_pop.jpg]
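The README does not describe how synthetic samples are added to the classifiers' training data. A simple scheme, assumed here for illustration, is to mix labeled real samples with synthetic samples whose labels are the classes the conditional generator was asked for. The number of synthetic samples is set as a ratio of the real sample count. The augment function and its synth_ratio parameter are hypothetical.

```python
import random


def augment(real_x, real_y, synth_x, synth_y, synth_ratio=1.0, seed=0):
    """Mix real and synthetic labeled samples, then shuffle them together.

    synth_ratio is the number of synthetic samples per real sample (capped by
    the synthetic samples available). Labels stay paired with their samples.
    """
    rng = random.Random(seed)
    n_synth = min(len(synth_x), int(round(synth_ratio * len(real_x))))
    chosen = rng.sample(range(len(synth_x)), n_synth)
    xs = list(real_x) + [synth_x[i] for i in chosen]
    ys = list(real_y) + [synth_y[i] for i in chosen]
    order = list(range(len(xs)))
    rng.shuffle(order)
    return [xs[i] for i in order], [ys[i] for i in order]
```

For a fair comparison, evaluate the baseline and the augmented classifier on the same held-out set of real samples only.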

Training Models

In addition to the Genome-AC-GAN model, this repository also provides an implementation of the model described in the article "Creating Artificial Human Genomes Using Generative Models." You can find the details and instructions for training the old model in the artificial_genomes repository.


