
Preprocessing pipeline for CUT&TAG data.



CUT_n_TAG

Pre-/post-processing pipelines and results for CUT&TAG data generated by the Neurogenomics Lab @ Imperial College London.

Results

Links to all results. Each subheader is the unique ID of a given sequencing batch, assigned by the Imperial BRC Genomics Facility.

EpiCompare

Native ChIP-seq (GSE66023) vs. ENCODE ChIP-seq (ENCSR000AKP-ENCFF038DDS)

CUT&Tag/CUT&Run/TIP-seq vs. ENCODE

Brian's reports
Sera's reports
  • Reference=ENCODE_H3K27ac: Comparison of CUT&Tag, CUT&Run and TIP-seq data generated by the Imperial Neurogenomics Lab vs. ENCODE.

HK5M2BBXY

Description: Initial test run of four samples (two H3K27ac + two H3K27me3). Libraries were accidentally merged across histone marks (H3K27ac/H3K27me3) during the nf-core/atacseq run (will fix).

All samples

Downloading new samples

When BRC sends you an email letting you know they've finished sequencing your samples, follow these steps to download and prepare the data.

Note: File and folder names are just used as examples here. You'll need to adapt these to match your particular file/folder names.

  1. Log onto HPC.
  2. If you haven't done so already, set up your iRODS credentials (instructions here). You only need to do this once.
  3. Move into the folder where you want to store your data.
  4. Download the data with irods:
module load irods/4.2.0
iget -Pr /igfZone/home/di.hu/IGFQ001187_hu_10-5-2021_scCutandTag/fastq/2021-05-11/HL25VBBXY
cd HL25VBBXY
  5. Unpack each .tar file:
tar -xvf IGFQ001187_hu_10-5-2021_scCutandTag_4_16_2021-05-11.tar
tar -xvf IGFQ001187_hu_10-5-2021_scCutandTag_6_16_2021-05-11.tar
  6. Remove the .tar files (once you're sure the previous step worked):
rm IGFQ001187_hu_10-5-2021_scCutandTag_4_16_2021-05-11.tar
rm IGFQ001187_hu_10-5-2021_scCutandTag_6_16_2021-05-11.tar
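  7. Optional: Change permissions recursively so that other members of your team can read the files. Adapt the scope of the permissions as appropriate for your case (e.g. add g+w if teammates also need to modify them). A capital X grants execute/search permission only on directories (and files that are already executable), so the FASTQ files don't become executable:
chmod -R u=rwX,go=rX ../HL25VBBXY/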
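The unpack and remove steps above can be combined so that each archive is deleted only if tar reports that it unpacked without errors. This is a hypothetical helper (unpack_batch is not part of any lab tooling), sketched on the assumption that the .tar files sit directly in the batch folder (e.g. HL25VBBXY):

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack every .tar in a batch directory and delete each
# archive only after tar exits successfully.
#   $1 = batch directory (defaults to the current directory)
unpack_batch() {
  local dir="${1:-.}" tarball
  for tarball in "$dir"/*.tar; do
    [ -e "$tarball" ] || continue      # no .tar files: glob stays literal, skip it
    echo "Unpacking $tarball"
    if tar -xf "$tarball" -C "$dir"; then
      rm -- "$tarball"                 # remove only after a successful extract
    else
      echo "Failed to unpack $tarball; keeping it" >&2
      return 1
    fi
  done
}
```

For example, after sourcing the function, run "unpack_batch ." from inside HL25VBBXY. Stopping at the first failure keeps the remaining archives untouched so you can inspect what went wrong.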

Pipelines

  • Platform: nf-core (nextflow + singularity/docker)

  • Discussion on adapting this pipeline for CUT&RUN data.

1. Set up containers on HPC

  • Docker isn't allowed on the HPC because it poses a security risk. Instead, follow these instructions to run an R-based Docker image (Rocker) through Singularity.

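  • By default, Singularity bind mounts /home/$USER, /tmp, and $PWD into your container at runtime (see the Singularity quickstart: https://singularity.lbl.gov/quickstart). Create the scratch directories that the command below binds to /tmp, /var/tmp and your R library:

mkdir -p /rds/general/user/$USER/ephemeral/tmp/
mkdir -p /rds/general/user/$USER/ephemeral/rtmp/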
  • On HPC, Rocker containers can be run through Singularity with a single command much like the native Docker commands, e.g. "singularity exec docker://rocker/tidyverse:latest R"


  • IMPORTANT: You may need to change the path "/rds/general/user/$USER/home/R/x86_64-redhat-linux-gnu-library/3.6/" to the actual location of your R library.

  • Run Rocker within Singularity:

singularity exec -B /rds/general/user/$USER/ephemeral/tmp/:/tmp,/rds/general/user/$USER/ephemeral/tmp/:/var/tmp,/rds/general/user/$USER/ephemeral/rtmp/:/rds/general/user/$USER/home/R/x86_64-redhat-linux-gnu-library/3.6/ --writable-tmpfs docker://rocker/tidyverse:latest R
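To keep the three bind mounts in the command above consistent with the scratch directories, the -B argument can be built by a small function. This is a hypothetical helper (rocker_bind_spec is not an existing tool); it assumes the ephemeral layout and R library path shown above, both of which you may need to adapt:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the scratch directories used as bind sources and
# print the comma-separated -B spec used in the singularity command above.
#   $1 = ephemeral base dir (e.g. /rds/general/user/$USER/ephemeral)
#   $2 = R library path to shadow (adapt to your own library location)
rocker_bind_spec() {
  local base="$1" rlib="$2"
  mkdir -p "$base/tmp" "$base/rtmp" || return 1
  printf '%s\n' "$base/tmp/:/tmp,$base/tmp/:/var/tmp,$base/rtmp/:$rlib"
}

# Example use (not run here):
#   binds=$(rocker_bind_spec "/rds/general/user/$USER/ephemeral" \
#     "/rds/general/user/$USER/home/R/x86_64-redhat-linux-gnu-library/3.6/")
#   singularity exec -B "$binds" --writable-tmpfs docker://rocker/tidyverse:latest R
```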

2. Download nf-core/atacseq container

Next, download the nf-core/atacseq Singularity container from Docker Hub.

  • Run this from your home directory; it saves "atacseq_latest.sif" in the current working directory:
    singularity pull docker://nfcore/atacseq:latest

  • Copy this .sif file to the cacheDir specified in your Nextflow config file:
    cp ~/atacseq_latest.sif /rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache/

  • Once the container is downloaded, you can specify it with the -profile flag when running the main pipeline (see below).

  • More info on this process is on the lab Wiki.
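Because the image only needs to be fetched once per shared cache, a guard can skip the pull when the file already exists. This is a sketch under assumptions: the function name is hypothetical, and it pulls straight into the cacheDir (singularity pull accepts an output file before the URI), which would make the separate copy step unnecessary:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the nf-core/atacseq image only if it is not already
# present (and non-empty) in the shared cacheDir.
#   $1 = cacheDir from your Nextflow config
#   $2 = image tag (defaults to "latest", as in the pull command above)
ensure_atacseq_sif() {
  local cache="$1" tag="${2:-latest}"
  local sif="$cache/atacseq_${tag}.sif"
  if [ -s "$sif" ]; then
    echo "Already cached: $sif"
    return 0
  fi
  mkdir -p "$cache" || return 1
  # 'singularity pull <output file> <URI>' writes the image to the given path
  singularity pull "$sif" "docker://nfcore/atacseq:${tag}"
}
```

For example: ensure_atacseq_sif /rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache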

3. Prepare the Nextflow config file

The config file tells nextflow how to run on Imperial's HPC.

  • module load nextflow
  • Copy the config file to the location Nextflow expects, so it knows how to run on the HPC:
    mkdir -p $HOME/.nextflow && cp hpc_config $HOME/.nextflow/config
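The contents of hpc_config are not reproduced here. Purely to illustrate the kind of settings involved, the hypothetical function below writes a minimal Nextflow config that enables Singularity and points singularity.cacheDir at the shared lab cache. It is not a substitute for the lab's file and omits any HPC scheduler/executor settings:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write a minimal Nextflow config with a singularity scope.
# This is NOT the lab's hpc_config; it overwrites $1, so back up first.
#   $1 = output file (e.g. $HOME/.nextflow/config)
#   $2 = Singularity cache directory
write_min_nextflow_config() {
  local out="$1" cache="$2"
  mkdir -p "$(dirname "$out")" || return 1
  # Unquoted EOF: $cache is expanded into the file
  cat > "$out" <<EOF
singularity {
  enabled    = true
  autoMounts = true
  cacheDir   = '$cache'
}
EOF
}
```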

4. Optional: Register Nextflow Tower

5. Download the singularity container

  • In theory, nf-core/atacseq should download the Singularity container automatically when it runs.
    In practice, however, downloading it this way takes far too long and/or fails entirely.

  • Therefore, following Narun Fancy's recommendation, "download the singularity image outside of the pipeline and save in the same dir as the cacheDir path for the singularity option in the custom config file", i.e. /rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache

  • For more info on the -profile flag, see here.

6. Finally, run the pipeline!

nextflow run nf-core/atacseq --input raw_data/HK5M2BBXY/design.csv --genome GRCh37 -r 1.2.1 -profile /rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache/atacseq_latest.sif
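Before launching, it can help to check the design file, since a malformed header or a mistyped FASTQ path otherwise only surfaces once the pipeline is running. The sketch below assumes the nf-core/atacseq v1.x design header "group,replicate,fastq_1,fastq_2" (verify this against the usage docs for the -r version you run); check_design is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for an nf-core/atacseq design file.
# ASSUMPTION: v1.x header "group,replicate,fastq_1,fastq_2"; fastq paths are
# absolute or relative to the directory you launch nextflow from.
check_design() {
  local design="$1" header group rep fq1 fq2 f n=0 status=0
  if [ ! -f "$design" ]; then
    echo "Missing design file: $design" >&2
    return 1
  fi
  IFS= read -r header < "$design"
  header="${header%$'\r'}"                    # tolerate Windows line endings
  if [ "$header" != "group,replicate,fastq_1,fastq_2" ]; then
    echo "Unexpected header: $header" >&2
    return 1
  fi
  # Read data rows; '|| [ -n "$group" ]' keeps a last line lacking a newline
  while IFS=, read -r group rep fq1 fq2 || [ -n "$group" ]; do
    fq2="${fq2%$'\r'}"
    n=$((n + 1))
    for f in "$fq1" "$fq2"; do                # fastq_2 may be empty (single-end)
      if [ -n "$f" ] && [ ! -f "$f" ]; then
        echo "Row $n ($group, replicate $rep): file not found: $f" >&2
        status=1
      fi
    done
  done < <(tail -n +2 "$design")
  return "$status"
}
```

For example, run check_design raw_data/HK5M2BBXY/design.csv and launch the nextflow command above only if it succeeds.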

  • Platform: python
  • Platform: workflowr (R + CLI)
  • Platform: CLI-

Documentation

Excerpts from the full BRC Genome help page

File name

Illumina uses the following file name convention for the output FASTQ files:

For example: samplename_S1_L001_R1_001.fastq.gz

  • samplename : Name of the sample provided in the samplesheet
  • S1 : Sample number, based on the sample's order in the samplesheet
  • L001 : Lane number of the flowcell
  • R1 : The read, e.g. R1 indicates Read 1 and R2 indicates Read 2 of a paired-end run
  • 001 : It's always 001
  • .fastq.gz : File extension; it's a gzipped FASTQ file

Please check the Illumina BCL2Fastq documentation for more information.
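To pull these fields out of a file name in a script, a bash regular expression anchored at the end of the name works even when the sample name itself contains underscores. This is a hypothetical helper that only recognises the R1/R2 pattern described above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an Illumina FASTQ file name into its fields.
# Anchoring at the end lets sample names contain underscores.
parse_fastq_name() {
  local name re
  name="$(basename "$1")"
  re='^(.+)_S([0-9]+)_L([0-9]{3})_(R[12])_001\.fastq\.gz$'
  if [[ "$name" =~ $re ]]; then
    printf 'sample=%s sample_number=%s lane=%s read=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
  else
    echo "Not a standard Illumina FASTQ name: $name" >&2
    return 1
  fi
}
```

For the example above, parse_fastq_name samplename_S1_L001_R1_001.fastq.gz prints "sample=samplename sample_number=1 lane=001 read=R1".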

About


License: MIT License

