phenotypic / ATACseq-Processor

Automatically process raw ATAC-seq files for analysis

ATACseq-Processor

This script processes raw ATAC-seq files so that they are ready for analysis. Processed files can be opened in the IGV genome browser to visualise chromatin accessibility.

Prerequisites

| Command | Installation |
| --- | --- |
| fastqc, bowtie2, samtools, bioawk, bedtools | Install via brew by running: brew install fastqc bowtie2 samtools bioawk bedtools |
| macs3 | Install by running: pip3 install macs3 |
| bedClip | Install from the same directory by running: curl http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.x86_64/bedClip --output bedClip && chmod +x bedClip |
| bedGraphToBigWig | Install from the same directory by running: curl http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.x86_64/bedGraphToBigWig --output bedGraphToBigWig && chmod +x bedGraphToBigWig |
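
Before running the script, it can help to confirm that every prerequisite is reachable. The following is a minimal sketch and not part of processor.sh: the function name check_tools is hypothetical, and the usage comment assumes that bedClip and bedGraphToBigWig were downloaded into the current directory as described above.

```shell
# Hypothetical helper: print each named command that cannot be found.
# Returns non-zero if at least one command is missing.
check_tools() (
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)

# Example usage (run from the ATACseq-Processor directory):
#   check_tools fastqc bowtie2 samtools bioawk bedtools macs3 ./bedClip ./bedGraphToBigWig
```

Any name that is printed still needs to be installed or made executable.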

Usage

Download with:

git clone https://github.com/phenotypic/ATACseq-Processor.git

Before running, ensure that the reference genome and the executable files are located in the same directory as the script, and that the two ATAC-seq files are located in the ATAC_paired subdirectory. See below:

ATACseq-Processor
├── ATAC_paired
│   ├── 30fish-0hpa_S1_L001_R1_00_1.fastq.gz
│   └── 30fish-0hpa_S1_L001_R1_00_2.fastq.gz
├── Danio_rerio.GRCz11.dna.toplevel.fa.gz
├── bedClip
├── bedGraphToBigWig
└── processor.sh
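
The layout above can be checked before starting a long run. This is a rough sketch under stated assumptions rather than anything processor.sh is known to do: the helper names count_existing and check_layout are hypothetical, and the check assumes the reference genome is the only file ending in .fa.gz next to the script, as in the example tree.

```shell
# Hypothetical helpers, not part of processor.sh.

# Count how many of the given paths exist
# (an unmatched glob stays literal and is therefore skipped).
count_existing() (
    n=0
    for path in "$@"; do
        if [ -e "$path" ]; then n=$((n + 1)); fi
    done
    echo "$n"
)

# Check the layout under the given directory (default: current directory).
# Prints one message per problem and returns non-zero if any are found.
check_layout() (
    dir="${1:-.}"
    problems=0

    reads=$(count_existing "$dir"/ATAC_paired/*.fastq.gz)
    if [ "$reads" -ne 2 ]; then
        echo "expected 2 .fastq.gz files in ATAC_paired, found $reads"
        problems=1
    fi

    # Assumption: exactly one *.fa.gz reference genome sits next to the script.
    refs=$(count_existing "$dir"/*.fa.gz)
    if [ "$refs" -ne 1 ]; then
        echo "expected 1 reference genome (*.fa.gz), found $refs"
        problems=1
    fi

    for tool in bedClip bedGraphToBigWig; do
        if [ ! -x "$dir/$tool" ]; then
            echo "$tool is missing or not executable"
            problems=1
        fi
    done

    if [ ! -f "$dir/processor.sh" ]; then
        echo "processor.sh is missing"
        problems=1
    fi

    [ "$problems" -eq 0 ]
)

# Example usage (run from the ATACseq-Processor directory):
#   check_layout . && echo "layout looks fine"
```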

Start processor.sh from the ATACseq-Processor directory by running:

bash processor.sh

The script first generates a quality control report for each of the two files in the ATAC_paired subdirectory and writes the reports to the Quality_ATAC folder. You can view the reports in a web browser. Use this guide to interpret the results.
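
For readers who want to rerun only the quality control step, a FastQC call along the following lines should produce equivalent reports. It is a sketch, not the exact command used by processor.sh: the function name run_qc and the thread-count argument are assumptions, while --outdir and --threads are standard FastQC options.

```shell
# Hypothetical sketch of the quality-control step; processor.sh may use different options.
# Usage: run_qc [threads]   (run from the ATACseq-Processor directory)
run_qc() (
    threads="${1:-1}"
    mkdir -p Quality_ATAC
    # Write one report per input file into Quality_ATAC.
    fastqc --outdir Quality_ATAC --threads "$threads" ATAC_paired/*.fastq.gz
)

# Example usage:
#   run_qc 4
```

FastQC writes an HTML report per input file, which is what you open in the browser.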

Once the script has finished running, open the SPECIES.clipped.sorted.bw file in the IGV genome browser and load a reference genome to view chromatin accessibility:

[Image: igv_snapshot_ATAC]

Notes

  • The pipeline used in this script is adapted from this excellent tutorial
  • The script should automatically detect the reference genome and ATAC-seq files, as long as they are located in the correct directories. The script will also automatically detect the species shorthand name and the number of CPU cores available
  • Building the genome index (step 2) is likely to take a long time as the process is computationally intensive
  • Once the script has finished and you have saved the SPECIES.clipped.sorted.bw file, you can delete all of the other generated files, as they are no longer needed
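
The automatic detection mentioned in the notes could look roughly like the sketch below. Both functions are hypothetical and are not taken from processor.sh. In particular, deriving the species shorthand from the part of the reference filename before the first dot is an assumption made for illustration, and the real script may use a different rule.

```shell
# Hypothetical sketches; processor.sh may detect these values differently.

# Print the number of available CPU cores, trying several queries in turn
# and falling back to 1 if none of them works.
detect_cores() {
    getconf _NPROCESSORS_ONLN 2>/dev/null \
        || sysctl -n hw.ncpu 2>/dev/null \
        || nproc 2>/dev/null \
        || echo 1
}

# Assumption: the species shorthand is the reference filename up to the first dot.
species_from_reference() (
    name="${1##*/}"      # strip any leading directory
    echo "${name%%.*}"   # drop everything from the first dot onward
)

# Example usage:
#   species_from_reference Danio_rerio.GRCz11.dna.toplevel.fa.gz
#   detect_cores
```

Under that assumption, the example reference genome would give the shorthand Danio_rerio.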

License: MIT License