jimmylihui / GenBench

GenBench: A Comprehensive Benchmark of Genomic Foundation Models

Introduction

GenBench is a comprehensive benchmark for evaluating genomic foundation models. It covers a broad spectrum of methods and diverse tasks, ranging from predicting gene location and function, to identifying regulatory elements, to studying species evolution. GenBench offers a modular and extensible framework designed to be user-friendly, well organized, and comprehensive. The codebase is organized into three abstraction layers, arranged from bottom to top: the core layer, the algorithm layer, and the user interface layer.

Overview

Code Structures
  • GenBench/configs contains configuration files for benchmark evaluation.
  • GenBench/data contains datasets.
  • GenBench/notebook contains analysis and visualization notebooks.
  • GenBench/src contains source code for evaluation pipelines.
  • GenBench/weight contains pretrained weights for benchmark evaluation.
  • GenBench/experiment contains scripts for experiment management.
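
As a quick sanity check after cloning, the layout listed above can be verified with a few lines of Python. This is a minimal sketch, not part of GenBench: the function name `missing_dirs` is hypothetical, and it assumes the repository was cloned into a directory named `GenBench`.

```python
from pathlib import Path

# Top-level directories named in the "Code Structures" list.
EXPECTED_DIRS = ("configs", "data", "notebook", "src", "weight", "experiment")


def missing_dirs(repo_root):
    """Return the expected directory names that are absent under repo_root.

    Hypothetical helper for illustration; GenBench does not ship this function.
    """
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: the repository was cloned into ./GenBench.
    missing = missing_dirs("GenBench")
    print("missing:", missing if missing else "none")
```

Some directories, such as data or weight, may only be populated after downloading datasets or pretrained weights, so a missing entry is not necessarily an error.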

Installation

This project provides a conda environment file. To reproduce the environment, run the following commands:

cd GenBench
conda env create -f environment.yml
conda activate OpenGenome
python setup.py develop

Getting Started

Here is an example of single-GPU, non-distributed training of HyenaDNA on the demo_human_or_worm dataset.

bash tools/prepare_data/download_mmnist.sh
python train.py -m train experiment=hg38/genomic_benchmark_mamba \
        dataset.dataset_name=demo_human_or_worm \
        wandb.id=demo_human_or_worm_hyenadna \
        train.pretrained_model_path=path/to/pretrained_model \
        trainer.devices=1
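
The command above passes its settings as key=value overrides. When launching several runs from a script, it can help to assemble those overrides programmatically. The following is a minimal sketch under that assumption: the helper `build_train_command` is hypothetical and not part of GenBench, and it only builds and prints the command without running it. The override keys and values are copied from the example above.

```python
import shlex


def build_train_command(experiment, overrides):
    """Assemble the train.py argument list from key=value overrides.

    Hypothetical helper for illustration; it mirrors the command shown above.
    """
    cmd = ["python", "train.py", "-m", "train", f"experiment={experiment}"]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


# Override keys and values taken from the example command.
overrides = {
    "dataset.dataset_name": "demo_human_or_worm",
    "wandb.id": "demo_human_or_worm_hyenadna",
    "train.pretrained_model_path": "path/to/pretrained_model",
    "trainer.devices": 1,
}

cmd = build_train_command("hg38/genomic_benchmark_mamba", overrides)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

To actually launch the run, the list could be passed to `subprocess.run(cmd, check=True)` from the repository root.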

Repeat the experiment

Please see experiment.MD for details on experiment management. The corresponding scripts are in the 'experiment' directory.

Overview of Model Zoo and Datasets

We support various genomic foundation models. We are working on adding new methods and collecting experiment results.

Visualization

We present visualization examples of HyenaDNA below. For more detailed information, please refer to the notebook.

  • For Drosophila enhancer activity prediction, visualizations of predicted and ground-truth enhancers are shown in notebook/drosophila_pearsonr.ipynb after running the experiment.
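
The notebook name suggests that predictions are compared with ground truth using the Pearson correlation coefficient. As a rough illustration of that metric only, here is a minimal pure-Python sketch. It is an assumption about the kind of comparison involved, not the notebook's actual code, and the function name `pearson_r` and the toy values are hypothetical.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


# Toy values for illustration only; these are not GenBench results.
predicted_activity = [0.1, 0.4, 0.35, 0.8]
observed_activity = [0.0, 0.5, 0.3, 0.9]
print(f"Pearson r = {pearson_r(predicted_activity, observed_activity):.3f}")
```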

License

This project is released under the Apache 2.0 license. See LICENSE for more information.

Acknowledgement

The framework of GenBench is inspired by HyenaDNA.

Contact
