beyondpie / CEMBA_wmb_snATAC

Whole mouse brain snATAC seq analysis

CEMBA_wmb_snATAC

This repository contains the whole mouse brain (wmb) snATAC-seq data analysis for the Center for Epigenomics of the Mouse Brain Atlas (CEMBA). The accompanying paper was accepted by Nature in 2023.

./repo_figures/GraphAbstract.jpg

Important Note

All analyses and the generated h5ad data were produced with SnapATAC2 <= 2.4.0. SnapATAC2 >= 2.5.0 introduces breaking changes; see the changelog: https://kzhang.org/SnapATAC2/changelog.html
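A quick way to guard against the version issue above is to check the installed SnapATAC2 release before loading any of the h5ad files. The following is a minimal sketch, not part of this repo: the helper names `parse_release` and `is_compatible` are hypothetical, and it assumes that any release before 2.5.0 (including 2.4.x patch releases) behaves like 2.4.0.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple[int, ...]:
    """Turn '2.4.0' or '2.5.0.dev1' into a tuple of its leading integers."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component such as 'dev1'.
            break
        parts.append(int(digits))
    return tuple(parts)


def is_compatible(version_string: str) -> bool:
    """True when the release predates 2.5.0 (assumption: 2.4.x is fine)."""
    return parse_release(version_string) < (2, 5)


if __name__ == "__main__":
    try:
        installed = version("snapatac2")
    except PackageNotFoundError:
        print("snapatac2 is not installed")
    else:
        status = "compatible" if is_compatible(installed) else "2.5.0 or later, expect breaking changes"
        print(f"snapatac2 {installed}: {status}")
```

Running the script in the analysis environment prints the installed version and whether it falls in the range used to generate the data.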

Data

Pipeline

  • The dataset currently comprises 234 samples and 2.3 million cells in total, so most analyses are organized with a Snakefile and submitted to a high-performance computing (HPC) cluster to use hundreds of CPUs in parallel.
  • R (>= 4.2), Python (>= 3.10), and Shell are the main languages; R is used the most.
  • Common functions are collected under the package directory.
  • We mainly use SnapATAC2 to analyze the single-nucleus ATAC-seq data.
  • Comparison between Scrublet and AMULET: https://github.com/yuelaiwang/CEMBA_AMULET_Scrublet
  • The deep-learning code now lives in a separate repo: https://github.com/yal054/mba_dl_model
  • sa2 is short for SnapATAC2 in this repo.

./repo_figures/snATAC-seq_analysis_pipeline.jpg

Codes

cembav2env.R: R env to store the metadata during analysis.

Environment       Description
cembav2env        Metadata of SnapATAC and SnapATAC2
cluSumBySa2       Clustering metadata, such as resolution, barcode to L4 IDs, L4 major regions, and so on
Sa2Integration    Integration metadata, such as Allen’s data descriptions
Sa2PeakCalling    Peak calling metadata

Clustering

We implemented four rounds of iterative clustering. See details in 01.clustering.

Integration and annotation

We annotate our data using the Allen Institute’s scRNA-seq data and annotations. See details in 02.integration.

Peak calling

We call peaks with MACS2 and then apply multi-stage filtering, notably keeping only peaks with SPM >= 5. See details in 03.peakcalling.
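The SPM cutoff can be illustrated with a small sketch. This README does not define SPM, so the code below assumes it means "score per million": each peak score divided by the sum of all peak scores in the same peak set, times one million. The function names and the `(name, score)` input format are hypothetical; the actual filtering code lives in 03.peakcalling and may differ.

```python
def score_per_million(scores):
    """Normalize raw peak scores so that they sum to one million.

    Assumption: SPM = score / sum(scores) * 1e6 within one peak set.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("peak scores must sum to a positive value")
    return [score * 1e6 / total for score in scores]


def filter_by_spm(peaks, cutoff=5.0):
    """Keep (name, spm) pairs whose SPM is at least `cutoff`.

    `peaks` is a list of (name, raw_score) tuples, a simplified stand-in
    for the rows of a MACS2 peak file.
    """
    spm_values = score_per_million([score for _, score in peaks])
    return [
        (name, spm)
        for (name, _), spm in zip(peaks, spm_values)
        if spm >= cutoff
    ]
```

Because the normalization depends on the total score of the peak set, the same raw score can pass the cutoff in one cluster and fail it in another, which is the point of using a normalized score instead of a raw threshold.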

PBS TORQUE

To better support PBS TORQUE on TSCC:

  • mkdir -p ~/.config/snakemake
  • then cp -r /projects/ps-renlab2/szu/projects/CEMBA2/package/pbs-torque-conda ~/.config/snakemake/
  • then go to your copy of the pbs-torque-conda directory and
    1. modify config.yaml: replace szu with your user name and set the corresponding conda path
    2. modify submit.yaml: replace szu with your user name
    3. modify pbs-submit.py: replace szu with your user name
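The three user-name edits above can also be scripted. The sketch below is an illustration only: `replace_user` is a hypothetical helper, it assumes the copied profile contains the three files named above, and it does a plain substring replacement, so review the result if szu could appear inside another word. The conda path in config.yaml still has to be set by hand.

```python
from pathlib import Path

# Files listed in the steps above.
PROFILE_FILES = ("config.yaml", "submit.yaml", "pbs-submit.py")


def replace_user(profile_dir, old_user, new_user):
    """Replace `old_user` with `new_user` in the profile files.

    Returns the names of the files that were actually changed.
    Missing files are skipped silently.
    """
    changed = []
    for name in PROFILE_FILES:
        path = Path(profile_dir) / name
        if not path.is_file():
            continue
        text = path.read_text()
        updated = text.replace(old_user, new_user)
        if updated != text:
            path.write_text(updated)
            changed.append(name)
    return changed


# Example call (your_user_name is a placeholder):
# replace_user(Path.home() / ".config/snakemake/pbs-torque-conda", "szu", "your_user_name")
```

The function only touches your own copy under ~/.config/snakemake, so the shared profile under /projects stays unchanged.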

License: MIT License

