icansleep22hours / Bis

Code and materials to support data analyses and reproduce results from the paper "Distribution-agnostic Deep Learning Enables Accurate Single‐Cell Data Recovery and Transcriptional Regulation Interpretation".

Distribution-agnostic Deep Learning Enables Accurate Single‐Cell Data Recovery and Transcriptional Regulation Interpretation

This repository contains code, data, tables, and plots to support the data analyses and reproduce the results of the paper "Distribution-agnostic Deep Learning Enables Accurate Single‐Cell Data Recovery and Transcriptional Regulation Interpretation".

Abstract

Single-cell RNA sequencing (scRNA-seq) offers a robust methodology for investigating gene expression at the single-cell level. However, accurate quantification of genetic material is often hindered by the limited capture of intracellular mRNA, resulting in a large number of missing expression values, which impedes downstream analysis. Existing imputation methods rely heavily on stringent data assumptions, such as specific probability distributions, that restrict their broader application. Moreover, the recovery process lacks reliable supervision, leading to bias in gene expression signal imputation. To address these challenges, we developed a distribution-agnostic deep learning model, called Bis, for the accurate imputation of scRNA-seq data from multiple platforms. Bis is an optimal transport-based autoencoder model that can capture the intricate distribution of scRNA-seq data while addressing the characteristic sparsity by regularizing the cellular embedding space. After that, we propose a transcriptional expression consistency module that leverages bulk RNA-seq data as external priors to guide the imputation process and constrain the model to ensure consistency of the average gene expression between the aggregated imputed and bulk RNA-seq data. Experimental results validated that Bis outperforms other state-of-the-art models across numerous simulated datasets and diverse real scRNA-seq data generated by different representative single-cell sequencing platforms. Moreover, we showcase that Bis consistently achieved accuracy and efficacy in varied types of downstream analyses encompassing batch effect removal, clustering analysis, differential expression analysis, and trajectory inference. 
In addition, we demonstrated that in a tumor-matched peripheral blood dataset, Bis successfully restored the gene expression levels of rare cell subsets to unveil the developmental characteristics of cytokine-induced NK cells within a head and neck squamous cell carcinoma microenvironment.

Overview

System Requirements

Hardware requirements

Bis requires only a standard computer with enough RAM to support the in-memory operations.

Software requirements

OS Requirements

This package is supported on Linux and has been tested on the following system:

  • Linux: Ubuntu 18.04

Python Dependencies

Bis mainly depends on the Python scientific stack.

numpy
scipy
PyTorch
PyTorch Lightning
scikit-learn
pandas
scanpy
anndata

For specific version settings, please see the requirement file.
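
A quick way to confirm that the packages listed above are visible to the active Python interpreter is sketched below. The import names (for example `torch` for PyTorch, `pytorch_lightning` for PyTorch Lightning, and `sklearn` for scikit-learn) are the commonly used ones and are assumed here; the helper `find_missing` is hypothetical and is not part of Bis.

```python
from importlib.util import find_spec

# Display name -> import name. The import names are assumptions based on
# common usage; adjust them if your installation differs.
REQUIRED_MODULES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "PyTorch": "torch",
    "PyTorch Lightning": "pytorch_lightning",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "scanpy": "scanpy",
    "anndata": "anndata",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the display names of packages whose module cannot be located."""
    return [name for name, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

The check only locates each module without importing it, so it does not verify versions.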

Installation Guide

$ git clone https://github.com/XuYuanchi/Bis.git
$ cd Bis
$ conda create -n bis python=3.9.15
$ conda activate bis
$ conda env update -n bis -f environment.yml

Detailed tutorials with example datasets

Bis is an optimal transport-based autoencoder model for single-cell data imputation. A usage example is provided in train.py.

Detailed tutorials for each section of the paper can be found in the analysis folder.
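
To make the bulk-consistency idea from the abstract concrete, here is a minimal, dependency-free sketch. It averages the imputed expression of each gene over all cells and compares that pseudo-bulk profile with a bulk RNA-seq profile. The mean squared difference used here is an assumption chosen for illustration; the paper and train.py may use a different distance, normalization, or weighting. The function names `pseudo_bulk` and `bulk_consistency_loss` are hypothetical and are not taken from the Bis code.

```python
def pseudo_bulk(imputed):
    """Average each gene over all cells (rows = cells, columns = genes)."""
    n_cells = len(imputed)
    return [sum(column) / n_cells for column in zip(*imputed)]


def bulk_consistency_loss(imputed, bulk):
    """Mean squared difference between the pseudo-bulk and bulk profiles.

    The squared-error form is an illustrative assumption, not the
    authors' exact objective.
    """
    averaged = pseudo_bulk(imputed)
    if len(averaged) != len(bulk):
        raise ValueError("imputed and bulk must cover the same genes")
    return sum((a - b) ** 2 for a, b in zip(averaged, bulk)) / len(bulk)


# Toy example: 2 cells x 3 genes, plus a bulk profile over the same genes.
imputed = [[1.0, 0.0, 4.0],
           [3.0, 2.0, 0.0]]
bulk = [2.0, 1.0, 3.0]
loss = bulk_consistency_loss(imputed, bulk)
```

In a training setting, a term like this would be computed on tensors and added to the reconstruction objective, so that the aggregated imputed values stay close to the bulk measurements.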

Data Availability

The data that support the findings of this study are openly available on Zenodo.

License

This project is covered under the MIT License.

Citation

@article{su2024distribution,
  title={Distribution-Agnostic Deep Learning Enables Accurate Single-Cell Data Recovery and Transcriptional Regulation Interpretation},
  author={Su, Yanchi and Yu, Zhuohan and Yang, Yuning and Wong, Ka-Chun and Li, Xiangtao},
  journal={Advanced Science},
  pages={2307280},
  year={2024},
  publisher={Wiley Online Library}
}

License:Apache License 2.0

