ryan2han / nmp-liver

Single cell analysis of donor livers pre- and post machine perfusion

Immune cell dynamics deconvoluted by single-cell RNA sequencing in normothermic machine perfusion of the liver

This repository contains a nextflow workflow to reproduce the single-cell analysis of

Hautz, Salcher, Fodor et al. (2023), Immune cell dynamics deconvoluted by single-cell RNA sequencing in normothermic machine perfusion of the liver. Nature Communications. doi:10.1038/s41467-023-37674-8

Raw sequencing data is available from GEO (GSE216584). The preprocessed data and singularity containers required to run this workflow are available from zenodo. The results generated by this workflow (i.e. executed jupyter notebooks, plots, etc.) can also be downloaded from zenodo.

Launching the workflows

1. Prerequisites

2. Obtain data

Before launching the workflow, you need to obtain input data and singularity containers from zenodo. First of all, clone this repository:

git clone https://github.com/icbi-lab/nmp-liver.git
cd nmp-liver

Then, within the repository, download the data archives and extract them into the corresponding directories:

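# singularity containers
# (-O sets the local file name; without it, wget keeps "?download=1" in the name)
wget -O containers.zip "https://zenodo.org/record/7249006/files/containers.zip?download=1"

# input data
wget -O input_data.zip "https://zenodo.org/record/7249006/files/input_data.zip?download=1"

unzip containers.zip
unzip input_data.zip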
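
Before launching, it can help to confirm that the archives were extracted where the workflow expects them. The following is a minimal sketch, assuming the archives unpack into the `containers` and `data` directories described under "Structure of this repository"; the function name `check_dirs` is hypothetical and not part of this repository.

```shell
# Report which of the given directories exist.
# Returns the number of missing directories (0 means all are present).
check_dirs() {
    missing=0
    for d in "$@"; do
        if [ -d "$d" ]; then
            echo "ok: $d"
        else
            echo "missing: $d" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Assumption: both directories are created by extracting the archives.
check_dirs containers data || echo "Some directories are missing; re-check the download step."
```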

3. Configure nextflow

Depending on your HPC/cloud setup, you will need to adjust the nextflow profile in nextflow.config to tell nextflow how to submit the jobs. Using a withName:... directive, special resources can be assigned to GPU jobs. For an example, see the icbi_liver profile, which we used to run the workflow on our on-premises cluster.
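
As an illustration of what such a profile could look like, the sketch below writes a separate config file instead of editing nextflow.config. Everything in it is an assumption for illustration: the file name `custom.config`, the profile name `my_cluster`, the SLURM executor, the queue names, and the process name pattern `.*SCVI.*` are placeholders, not values taken from this repository. Check the icbi_liver profile for the process names that actually need a GPU.

```shell
# Write a hypothetical cluster profile to its own config file.
# The quoted 'EOF' prevents the shell from expanding anything inside.
cat > custom.config <<'EOF'
profiles {
    my_cluster {
        process {
            executor = 'slurm'      // assumption: jobs are submitted via SLURM
            queue    = 'standard'   // assumption: name of the default CPU queue

            // assumption: GPU processes can be matched by this name pattern
            withName: '.*SCVI.*' {
                queue          = 'gpu'
                clusterOptions = '--gres=gpu:1'
            }
        }
    }
}
EOF

# The file can then be passed to nextflow with -c, for example:
#   nextflow run main.nf -c custom.config -profile my_cluster ...
```

Keeping the profile in a separate file leaves the repository's nextflow.config untouched, which makes it easier to pull upstream changes.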

4. Launch the workflows

# Newer versions of nextflow break the code in this repo. Using `NXF_VER`, we can pin the version.
NXF_VER=22.04.5 nextflow run main.nf -resume -profile <YOUR_PROFILE> \
    --outdir "./data/results"

Structure of this repository

  • analyses: Place for e.g. jupyter/rmarkdown notebooks, grouped by their respective (sub-)workflows.
  • bin: executable scripts called by the workflow
  • conf: nextflow configuration files for all processes
  • containers: place for singularity image files. Not part of the git repo; created by the download command.
  • data: place for input data and results in different subfolders. Gets populated by the download commands and by running the workflows.
  • lib: custom libraries and helper functions
  • modules: nextflow DSL2 modules
  • tables: contains static content that should be under version control (e.g. manually created tables)

Workflow description

The analysis workflow comprises the following steps:

  • QC of the unfiltered input data
  • Cell-type annotation
  • Pseudobulk generation and DE analysis with DESeq2
  • Subcluster analysis of Macrophages/Monocytes and Neutrophils
  • Comparison of timepoints T0 vs T1

Contact

For reproducibility issues or any other requests regarding single-cell data analysis, please use the issue tracker. For anything else, you can reach out to the corresponding author(s) as indicated in the manuscript.

Notes on reproducibility

We aimed to make this workflow reproducible by providing all input data, containerizing all software dependencies and integrating all analysis steps into a nextflow workflow. In theory, this allows the workflow to be executed on any system that can run nextflow and singularity. Unfortunately, some single-cell analysis algorithms (in particular scVI and UMAP) yield slightly different results on different hardware, trading off computational reproducibility for a significantly faster runtime. In particular, results will differ when changing the number of cores, or when running on a CPU/GPU of a different architecture. See also scverse/scanpy#2014 for a discussion.

Since the cell-type annotation depends on clustering, the clustering depends on the neighborhood graph, and the neighborhood graph in turn depends on the scVI embedding, running the workflow on a different machine will likely break the cell-type labels.

Below is the hardware we used to execute the workflow. In theory, any CPU/GPU of the same generation should produce identical results, but we have not yet had the chance to test this.

  • Compute node CPU: Intel(R) Xeon(R) CPU E5-2699A v4 @ 2.40GHz (2x)
  • GPU node CPU: EPYC 7352 24-Core (2x)
  • GPU node GPU: Nvidia Quadro RTX 8000 GPU
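
Because results depend on the hardware, it may be useful to record the machine specification alongside each run so that differing outputs can be traced back to it. The following is a minimal sketch for Linux systems; the output file name `hardware_info.txt` is a hypothetical choice, and the GPU query is only attempted when `nvidia-smi` is installed.

```shell
# Record CPU model, core count and GPU model for the current machine.
out="hardware_info.txt"   # hypothetical output file name
{
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "kernel: $(uname -sr)"
    echo "cores: $(getconf _NPROCESSORS_ONLN)"

    # CPU model as reported by the kernel (first entry only)
    if [ -r /proc/cpuinfo ]; then
        grep -m1 'model name' /proc/cpuinfo || echo "cpu: model name not reported"
    fi

    # GPU model and driver version, if an NVIDIA driver is present
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
            || echo "gpu: nvidia-smi failed"
    else
        echo "gpu: none detected (nvidia-smi not found)"
    fi
} > "$out"
```

On a cluster, this would need to run on the compute and GPU nodes themselves (for example inside a submitted job), since the login node often has different hardware.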

License

BSD 3-Clause "New" or "Revised" License

