yuanlizhanshi / Islet_ML

The reproducible code for the Islet ML paper

This is the reproducible code for the islet paper.

Preprocess scRNA-seq data

This folder contains the code and data used for the scRNA-seq analysis.

The processed beta-cell scRNA-seq dataset is stored in cowtransfer.

The beta-cell scRNA-seq dataset after ML-based quality control is stored in cowtransfer.

*.R: R code for the downstream analysis of scRNA-seq data, including quality control, clustering and cell annotation.
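
As an illustration of the quality-control step, the sketch below computes two common per-cell metrics (the number of detected genes and the percentage of counts from mitochondrial genes) and keeps cells that pass thresholds. It is a minimal Python sketch, not the R code in this folder; the function names, the human `MT-` gene prefix and the threshold values are assumptions chosen for illustration.

```python
def qc_metrics(cell_counts):
    """Return (n_genes_detected, percent_mito) for one cell.

    cell_counts maps gene symbol -> UMI count. Mitochondrial genes are
    assumed to carry the human "MT-" prefix.
    """
    total = sum(cell_counts.values())
    n_genes = sum(1 for c in cell_counts.values() if c > 0)
    mito = sum(c for g, c in cell_counts.items() if g.startswith("MT-"))
    percent_mito = 100.0 * mito / total if total else 0.0
    return n_genes, percent_mito


def filter_cells(cells, min_genes=200, max_percent_mito=20.0):
    """Keep cells that pass hypothetical QC thresholds.

    cells maps cell barcode -> {gene: count}.
    """
    kept = {}
    for barcode, counts in cells.items():
        n_genes, pct_mito = qc_metrics(counts)
        if n_genes >= min_genes and pct_mito <= max_percent_mito:
            kept[barcode] = counts
    return kept


# Toy data; min_genes is lowered so the example stays small.
cells = {
    "AAAC": {"INS": 50, "GCG": 0, "MT-CO1": 5, "TTR": 10},
    "AAAG": {"INS": 2, "MT-CO1": 40, "MT-ND1": 30},
}
print(sorted(filter_cells(cells, min_genes=2, max_percent_mito=20.0)))
```

The second toy cell is dropped because about 97% of its counts come from mitochondrial genes, a typical sign of a damaged cell.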

Pre-trained human islet atlas

The well-annotated scRNA-seq datasets were used to train scVI and scANVI models that learn cell representations.

The trained h5ad file is stored in cowtransfer.

Preprocess scATAC-seq data

This folder contains the code and data used for the scATAC-seq analysis (multiome + scATAC-seq).

The processed beta-cell scATAC-seq data is stored in cowtransfer.

*.smk: Snakemake scripts for running cellranger

*.R: R code for the downstream analysis of scATAC-seq data, including quality control, clustering, cell annotation and peak calling.

Preprocess ChIP-seq and Hi-C data

For ChIP-seq data, we only run the basic upstream analysis, such as quality control and read mapping. The BAM file of the H3K27ac histone modification is used as input to the ABC (Activity-by-Contact) model.

For Hi-C data, we run the basic upstream analysis, such as quality control and mapping (this workflow is adapted from Renlab). The .hic file is used as input to the ABC model.

Infer and refine the GRN from single-cell multiomics data

First, split the scATAC-seq BAM file by cell type. Then compute gene expression (TPM) from the multiome dataset of Wang et al. 2023.
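
The two preparation steps can be sketched as follows. 10x scATAC-seq BAM files usually store the cell barcode in the `CB` tag, so reads can be grouped by the cell type assigned to their barcode; TPM normalizes counts by gene length and then rescales so the values sum to one million. This is a minimal sketch under stated assumptions: it works on SAM text lines instead of reading BAM through a library such as pysam, and the function names and the barcode-to-cell-type table are hypothetical.

```python
from collections import defaultdict


def split_by_cell_type(sam_lines, barcode_to_type):
    """Group SAM records by the cell type of their CB (cell barcode) tag.

    Header lines, reads without a CB tag and reads whose barcode has no
    annotation are skipped.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header lines are not reads
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at column 12 (index 11), formatted TAG:TYPE:VALUE.
        tags = {}
        for tag in fields[11:]:
            name, _type, value = tag.split(":", 2)
            tags[name] = value
        cell_type = barcode_to_type.get(tags.get("CB"))
        if cell_type is not None:
            groups[cell_type].append(line)
    return dict(groups)


def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and gene lengths in bp."""
    # Reads per kilobase, then rescale so all genes sum to 1e6.
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    scale = sum(rates.values())
    return {g: r / scale * 1e6 for g, r in rates.items()}


print(tpm({"INS": 10, "GCG": 20}, {"INS": 1000, "GCG": 2000}))
# {'INS': 500000.0, 'GCG': 500000.0}
```

In the toy TPM call both genes get the same value because GCG has twice the counts but also twice the length.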

After preparing all the data, use the code in run_ABC.Rmd to predict enhancer-promoter interactions with the ABC model.
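
For orientation, the ABC score of a candidate element E for a gene G is the element's activity times its contact with the gene's promoter, divided by the sum of that product over all candidate elements near the gene; activity is the geometric mean of the chromatin accessibility and H3K27ac signals. The sketch below implements only this scoring formula on toy numbers. It is not the ABC pipeline or the code in run_ABC.Rmd, and the function names and input values are hypothetical.

```python
import math


def activity(accessibility_reads, h3k27ac_reads):
    """ABC activity: geometric mean of accessibility and H3K27ac signal."""
    return math.sqrt(accessibility_reads * h3k27ac_reads)


def abc_scores(elements):
    """ABC score of every candidate element for a single gene.

    elements maps element id -> (accessibility, h3k27ac, contact), where
    contact is the normalized Hi-C contact between the element and the
    gene promoter. Candidates are assumed to be pre-filtered to the gene's
    neighborhood (5 Mb in the original ABC model).
    """
    products = {
        e: activity(acc, k27) * contact
        for e, (acc, k27, contact) in elements.items()
    }
    total = sum(products.values())
    return {e: p / total for e, p in products.items()}


print(abc_scores({"e1": (4, 9, 1.0), "e2": (1, 1, 2.0)}))
# {'e1': 0.75, 'e2': 0.25}
```

Because of the normalization, the scores of all candidate elements for one gene sum to 1, so a score reflects an element's share of the gene's total regulatory input.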

Then use motifmatchr to assign TFs to the corresponding cis-regulatory elements.
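
motifmatchr scans sequences with position weight matrices (PWMs) and reports which motifs match each peak. To show the idea, the sketch below converts base frequencies into a log-odds PWM, scores every window of a peak sequence on both strands, and assigns a TF when the best score reaches a threshold. It is a simplified stand-in, not motifmatchr's algorithm (which relies on the MOODS library and p-value cutoffs); the toy motif, uniform background and score threshold are assumptions.

```python
import math

BASES = "ACGT"


def log_odds_pwm(freqs, background=0.25, pseudocount=0.01):
    """Convert per-position base frequencies into log2-odds scores."""
    pwm = []
    for column in freqs:
        total = sum(column[b] + pseudocount for b in BASES)
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_score(seq, pwm):
    """Highest PWM score over all windows on both strands.

    Assumes seq contains only uppercase A/C/G/T (no N).
    """
    width = len(pwm)
    best = float("-inf")
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - width + 1):
            window = strand[i:i + width]
            best = max(best, sum(pwm[j][b] for j, b in enumerate(window)))
    return best


def assign_tfs(peaks, motifs, threshold):
    """Map each peak id to the sorted TFs whose best score >= threshold."""
    return {
        peak: sorted(tf for tf, pwm in motifs.items()
                     if best_score(seq, pwm) >= threshold)
        for peak, seq in peaks.items()
    }


# Hypothetical 3-bp motif "GAT" with near-deterministic base frequencies.
freqs = [{"A": 0, "C": 0, "G": 1, "T": 0},
         {"A": 1, "C": 0, "G": 0, "T": 0},
         {"A": 0, "C": 0, "G": 0, "T": 1}]
motifs = {"toyGAT": log_odds_pwm(freqs)}
print(assign_tfs({"p1": "CCGATCC", "p2": "CCCCCCC", "p3": "TTATCTT"}, motifs, 5.0))
```

Peak p3 matches only through its reverse complement (AAGATAA), which is why both strands are scanned.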

Use XGBoost to remove donors with low prediction accuracy

Because human data are heterogeneous, we found that some donors have discrepant gene expression profiles and show extremely low prediction accuracy (15%). We therefore run XGBoost iteratively, removing these donors until no remaining donor has low prediction accuracy, and then calculate the differentially expressed genes.
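
The iterative filtering loop can be sketched independently of the classifier. In the sketch below, `evaluate` is a hypothetical callable that would train a model (XGBoost in this workflow) on the current donors and return a prediction accuracy per donor; donors below a cutoff are dropped and the model is re-evaluated until every remaining donor passes. The callable, the 0.5 cutoff, the round limit and the toy accuracies are assumptions, not the repository's code or settings.

```python
def iterative_donor_filter(donors, evaluate, min_accuracy=0.5, max_rounds=20):
    """Repeatedly drop donors whose per-donor accuracy is below min_accuracy.

    donors: iterable of donor ids.
    evaluate: callable(list_of_donors) -> {donor: accuracy}; in practice it
        would retrain and evaluate a classifier such as XGBoost on those donors.
    Returns (kept_donors, removed_donors_in_removal_order).
    """
    kept = list(donors)
    removed = []
    for _ in range(max_rounds):
        accuracy = evaluate(kept)  # retrain on the current donor set
        low = [d for d in kept if accuracy[d] < min_accuracy]
        if not low:  # every remaining donor passes: stop
            break
        removed.extend(low)
        kept = [d for d in kept if d not in low]
    return kept, removed


# Toy evaluator with fixed accuracies; D2 mimics a discrepant donor.
toy_accuracy = {"D1": 0.92, "D2": 0.15, "D3": 0.85}
print(iterative_donor_filter(["D1", "D2", "D3"],
                             lambda ds: {d: toy_accuracy[d] for d in ds}))
```

Re-evaluating after each removal matters because a donor's accuracy can change once the discrepant donors no longer influence the model.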



Languages: Jupyter Notebook 97.2%, R 2.3%, Python 0.5%, Shell 0.0%, Perl 0.0%