A native system or virtual machine with about 30 GB or more of free space.
The system must provide the following commands:
mkdir
cp
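Whether these commands are present can be checked before going further. The snippet below is a minimal sketch; `missing_commands` is a hypothetical helper name and not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository):
# print every command from the argument list that is not found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        # `command -v` succeeds only if the shell can resolve the command.
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

missing=$(missing_commands mkdir cp)
if [ -n "$missing" ]; then
    echo "Missing required commands: $missing"
else
    echo "All required commands are available."
fi
```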
On NixOS, all dependencies are already included in the file `shell.nix`, so no manual installation of dependencies is required.
For Nvidia CUDA support, the file `shell.nix` has to be edited: remove the leading `#` from the line containing `# cudaPackages.cudatoolkit`.
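The edit can also be scripted. The following is a sketch only: it assumes GNU `sed` (for `-i`) and that the commented line looks like `# cudaPackages.cudatoolkit`, optionally indented. `enable_cuda_toolkit` is a hypothetical helper name, and the script keeps a backup as `shell.nix.bak`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository):
# uncomment the cudaPackages.cudatoolkit line in the given file.
enable_cuda_toolkit() {
    local file="$1"
    # Keep a backup next to the original before editing in place.
    cp "$file" "$file.bak"
    # Drop the leading "#" and any spaces after it, preserving the indentation.
    # Assumes GNU sed; BSD sed needs a different -i syntax.
    sed -i 's/^\([[:space:]]*\)#[[:space:]]*\(cudaPackages\.cudatoolkit\)/\1\2/' "$file"
}

# Only touch shell.nix if it exists in the current directory.
if [ -f shell.nix ]; then
    enable_cuda_toolkit shell.nix
    # Show the resulting line so the change can be verified by eye.
    grep -n 'cudaPackages.cudatoolkit' shell.nix
fi
```

Lines that do not mention `cudaPackages.cudatoolkit` are left unchanged, and running the script a second time has no further effect.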
For Debian / Ubuntu, the following two commands should suffice:
sudo apt install gcc curl python3.10 python3-pip
python3.10 -m pip install --user virtualenv
On other systems, dependencies may need to be installed manually. Please refer to the install commands for Debian / Ubuntu above and to the `shell.nix` file for NixOS to derive the required setup for your system.
The first start might take a while because the Python dependencies required to run the notebook are installed at that point.
To start JupyterLab on NixOS, the following command needs to be executed:
nix-shell
To start JupyterLab on Debian / Ubuntu and potentially other systems, the following command needs to be executed:
bash ./run.sh
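If it is unclear which of the two commands applies, the distribution can be read from `/etc/os-release`. This is a minimal sketch under the assumption that the file exists and sets `ID` (for example `nixos`, `debian`, or `ubuntu`); `launch_command_for` is a hypothetical helper name, and the script only prints a suggestion without starting anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository):
# map an os-release ID to the start command described above.
launch_command_for() {
    case "$1" in
        nixos) echo "nix-shell" ;;
        # Debian, Ubuntu, and potentially other systems use the run script.
        *)     echo "bash ./run.sh" ;;
    esac
}

# Read ID in a subshell so the variables of os-release do not leak into this shell.
os_id=$(. /etc/os-release 2>/dev/null && echo "$ID") || os_id=""
echo "Detected OS ID: ${os_id:-unknown}"
echo "Suggested command: $(launch_command_for "$os_id")"
```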
Once JupyterLab has opened, open the notebook `ZDNA-prediction.local.ipynb` in JupyterLab.
The big blue link at the very top, titled "Jump to Run Section", jumps directly to the "Run" section and saves scrolling down.
The "Run" section contains further information on how to use the notebook.
This repository contains code and data for the article "Z-Flipon Variants reveal the many roles of Z-DNA and Z-RNA in health and disease".
The full genome predictions for human and mouse genomes can be downloaded here
To predict Z-DNA flipons on new data, please use this Colab notebook
The fine-tuned DNABERT weights can be downloaded from Google Drive:
1_HG_chipseq.ipynb - Generate data splits for HG data with Chipseq labels. Train the models. Generate full genome predictions.
1_HG_kousine.ipynb - Generate data splits for HG data with Kouzine labels. Train the models. Generate full genome predictions.
1_MM_curax.ipynb - Generate data splits for MM data with Curax labels. Train the models.
1_MM_kousine.ipynb - Generate data splits for MM data with Kouzine labels. Train the models.
2_Generate_stats_hg_chipseq.ipynb - Calculate most frequently attended k-mers for HG data with Chipseq labels.
2_Generate_stats_hg_kouzine.ipynb - Calculate most frequently attended k-mers for HG data with Kouzine labels.
2_Generate_stats_mm_curax.ipynb - Generate full genome predictions for MM data with Curax labels. Calculate most frequently attended k-mers.
2_Generate_stats_mm_kouzine.ipynb - Generate full genome predictions for MM data with Kouzine labels. Calculate most frequently attended k-mers.
README.md - This file
ZDNA-prediction.ipynb - Standalone notebook for prediction of Z-DNA. Intended to be run in the Colab environment via: https://colab.research.google.com/github/mitiau/Z-DNABERT/blob/main/ZDNA-prediction.ipynb