4dn-dcic / repli-seq-pipeline

Scripts for the High-throughput Analysis of Replication Timing

  • This repo is a 4DN-DCIC fork of the original repli-seq pipeline, Shart, without Docker.
  • Currently under development.

what

This repository contains scripts for generating replication timing profiles from raw sequencing reads. The reads can come either from early- and late-replicating DNA, or from DNA extracted from cells sorted for S or G1 DNA content.

how

define your input files

# download example data
wget -cbre robots=off -np -nH --cut-dirs=3 -A 'g*' http://www.bio.fsu.edu/~dvera/share/repliseq/
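
The commands in the workflow refer to shell variables such as `$fastq`, `$fastq1`, `$fastq2` and `$index`, so these need to be set first. A minimal sketch, assuming hypothetical file names (replace them with the files you actually downloaded; only the `index` value is taken from this README):

```shell
# Hypothetical file names: substitute your own inputs.
fastq=early_R1.fastq             # single-end reads (assumed name)
fastq1=early_R1.fastq            # paired-end mate 1 (assumed name)
fastq2=early_R2.fastq            # paired-end mate 2 (assumed name)
index=bwaIndex_hg38/genome       # BWA index prefix, as used in the align step

# Warn early about any input that does not exist yet
for f in "$fastq1" "$fastq2"; do
  [ -e "$f" ] || echo "missing input: $f" >&2
done
```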

execute workflow step by step

# clip adapters from reads
clip $fastq

# align reads to genome
index=bwaIndex_hg38/genome
 # paired-end
 align_pe $fastq1 $fastq2 $index
 # single-end
 align_se $fastq $index

# check stats
samstats $bam

# filter bams by alignment quality and sort by position
filtersort $bam

# check stats
samstats $sbam

# remove duplicate reads
dedup $sbam

# calculate RPKM bedGraphs for each set of alignments
count $rbam

# make a BED file for filtering, based on the sum of scores across all count bedGraph files
make_filteredbed $bg1 $bg2 ...

# filter out windows with a low average RPKM
filter $bg $filteredbed

# calculate log2 ratios between early and late
log2ratio $fbg

# quantile-normalize replication timing profiles to the example reference bedGraph
normalize $l2r

# loess-smooth profiles using a 300kb span size
smooth -l 300000 $l2rn
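
The arithmetic behind the RPKM and log2-ratio steps can be illustrated with a small awk sketch. This is not the repository's `count` or `log2ratio` implementation. The function names `rpkm_bg` and `log2ratio_bg` are hypothetical, the input is assumed to be a tab-separated bedGraph (chrom, start, end, value), and the total read count is assumed to be the sum of the per-window counts.

```shell
# rpkm_bg COUNTS.bg
# Convert per-window read counts to RPKM:
#   RPKM = count * 1e9 / (window_length_bp * total_reads)
# The file is read twice: pass 1 sums the counts, pass 2 prints RPKM.
rpkm_bg() {
  awk 'BEGIN { OFS = "\t" }
       NR == FNR { total += $4; next }
       total > 0 && $3 > $2 { print $1, $2, $3, $4 * 1e9 / (($3 - $2) * total) }' "$1" "$1"
}

# log2ratio_bg EARLY.bg LATE.bg
# Print log2(early / late) per window.
# Assumes both files list the same windows in the same order.
# Windows where either value is not positive are skipped (assumed policy).
log2ratio_bg() {
  paste "$1" "$2" | awk 'BEGIN { OFS = "\t" }
       $4 > 0 && $8 > 0 { print $1, $2, $3, log($4 / $8) / log(2) }'
}
```

With this sign convention, positive values mark windows enriched in the early fraction and negative values mark windows enriched in the late fraction.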
