cory-weller / atac-peak-merge

atac-peak-merge

Requirements

List of all sample atac_peaks.bed files

The file all_peak_bed_files.txt should list the path to every atac_peaks.bed file generated by cellranger-arc, one path per line. You can build it with find, appending the output for each batch as shown below, or with any other method.

find /data/CARD_singlecell/Brain_atlas/NABEC_multiome/batch1/Multiome/ -name atac_peaks.bed > all_peak_bed_files.txt
find /data/CARD_singlecell/Brain_atlas/NABEC_multiome/batch2/Multiome/ -name atac_peaks.bed >> all_peak_bed_files.txt
find /data/CARD_singlecell/Brain_atlas/NABEC_multiome/batch3/Multiome/ -name atac_peaks.bed >> all_peak_bed_files.txt
find /data/CARD_singlecell/Brain_atlas/NABEC_multiome/batch4/Multiome/ -name atac_peaks.bed >> all_peak_bed_files.txt
find /data/CARD_singlecell/Brain_atlas/NABEC_multiome/batch5/Multiome/ -name atac_peaks.bed >> all_peak_bed_files.txt
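The five commands above differ only in the batch name, so they could be folded into a loop. The following is a minimal sketch, assuming bash; the function name collect_peak_beds and its arguments are hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): gather every
# atac_peaks.bed under ROOT/<batch>/Multiome into a single list file.
collect_peak_beds() {
    local root="$1" out="$2"
    shift 2
    : > "$out"                      # start from an empty list
    local batch dir
    for batch in "$@"; do
        dir="${root}/${batch}/Multiome"
        # Skip batches that do not exist instead of letting find fail
        [ -d "$dir" ] || { echo "skipping missing ${dir}" >&2; continue; }
        find "$dir" -name atac_peaks.bed >> "$out"
    done
}

# Example call mirroring the commands above:
# collect_peak_beds /data/CARD_singlecell/Brain_atlas/NABEC_multiome \
#     all_peak_bed_files.txt batch1 batch2 batch3 batch4 batch5
```

Truncating the output file once and then appending keeps the result reproducible when the helper is re-run.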

Define contigs

all_contigs.txt should contain one contig name per line. The names can be extracted from any of the atac_peaks.bed files. For example, using the first file listed in all_peak_bed_files.txt:

awk '!count[$1]++' < $(head -n 1 all_peak_bed_files.txt) | awk '/^[^#]/ {print $1}' > all_contigs.txt
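The command above reads contigs from the first listed file only. If some samples might carry contigs that the first one lacks (an assumption, not something this README states), the same idea extends to every listed file. A minimal sketch, assuming bash; the function name list_contigs is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print each contig
# seen in any listed BED file once, in order of first appearance.
list_contigs() {
    local list="$1"
    local bed
    while IFS= read -r bed; do
        [ -n "$bed" ] || continue                 # ignore blank lines in the list
        # Drop '#' header lines and empty lines, keep column 1 (the contig)
        awk '!/^#/ && NF { print $1 }' "$bed"
    done < "$list" | awk '!seen[$1]++'            # keep first occurrence only
}

# Example:
# list_contigs all_peak_bed_files.txt > all_contigs.txt
```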

Get merged peak list

This step builds a global set of shared ATAC windows across all samples listed in all_peak_bed_files.txt. One TSV file is generated per contig.

module load R/4.3
Rscript merge_peaks.R
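merge_peaks.R is not reproduced here, so the following is not necessarily what it does. It is only a generic illustration of how overlapping peaks from several samples can be collapsed into shared windows for one contig. The function name merge_contig_peaks and the headerless two-column output are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical illustration (not the repository's merge_peaks.R):
# collapse overlapping or touching intervals on one contig into windows.
merge_contig_peaks() {
    local contig="$1" list="$2"
    local bed
    while IFS= read -r bed; do
        [ -n "$bed" ] || continue
        # Keep start and end for the requested contig, skipping '#' headers
        awk -v c="$contig" '!/^#/ && $1 == c { print $2 "\t" $3 }' "$bed"
    done < "$list" |
    sort -k1,1n -k2,2n |
    awk 'BEGIN { OFS = "\t" }
         NR == 1 { s = $1 + 0; e = $2 + 0; next }
         # Interval starts inside or at the end of the window: extend it
         $1 + 0 <= e { if ($2 + 0 > e) e = $2 + 0; next }
         # Gap before this interval: emit the window and start a new one
         { print s, e; s = $1 + 0; e = $2 + 0 }
         END { if (NR > 0) print s, e }'
}

# Example:
# merge_contig_peaks chr1 all_peak_bed_files.txt
```

Sorting by start coordinate first means a single pass is enough, because any interval that overlaps the current window must begin before that window ends.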

Reassign peaks

This step converts the original peak counts (per cell barcode) so that they match the merged peak set.

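library(data.table)
library(ggplot2)

# Inspect the distribution of peak start/end coordinates below 2 Mb on chr1
dat <- fread('chr1_peaks.tsv')
dat.long <- melt(dat, measure.vars=c('start','end'))
ggplot(dat.long[value < 2e6], aes(x=variable, y=value)) + geom_violin()
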
sbatch reassign_peaks.sh 11 205
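
reassign_peaks.sh is not reproduced here, so the sketch below is not its implementation. It only illustrates the underlying idea: each original peak is looked up in the merged set and labelled with the window that contains it. The function name map_peaks_to_windows and both input layouts (headerless, tab-separated start and end) are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical illustration (not the repository's reassign_peaks.sh):
# label each original peak with the merged window that contains it.
# Both inputs are assumed to be headerless "start<TAB>end" files for one contig.
map_peaks_to_windows() {
    local windows="$1" peaks="$2"
    awk 'BEGIN { OFS = "\t" }
         # First file: remember every merged window
         NR == FNR { ws[NR] = $1 + 0; we[NR] = $2 + 0; n = NR; next }
         # Second file: find the window enclosing this peak (linear scan)
         {
             for (i = 1; i <= n; i++)
                 if ($1 + 0 >= ws[i] && $2 + 0 <= we[i]) {
                     print $1, $2, ws[i], we[i]
                     break
                 }
         }' "$windows" "$peaks"
}

# Example:
# map_peaks_to_windows merged_chr1.tsv sample_chr1_peaks.tsv
```

A linear scan keeps the sketch short; with many windows per contig, a binary search or an interval-join tool would scale better.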

peaks.R all_contigs.txt

reassign_peaks.sh

awk file

About

License: MIT License
