dekkerlab / MC-3C_scripts

Scripts used to process and analyse the data in the MC-3C paper

# Cwalk analysis pipeline README

Author: Filipe Tavares-Cadete

## Introduction

The pipeline to analyse the Dekker lab Cwalk data consists of several steps:

1) Processing of raw PacBio data into fastq files;
2) Processing of fastq files to separate into interaction fragments;
3) Mapping of interaction fragments;
4) Assembly of alignments into walks;
5) Preparing data frames with detailed walk information;
6) Preparing walk permutations;
7) Scripts for plotting.

## Step requirements

All steps can be run in a Unix environment on a standard workstation, unless specifically noted.

### 1) Processing of raw PacBio data into fastq files

This step requires the SMRT Analysis software by Pacific Biosciences running in a Unix environment.

### 2) Processing of fastq files to separate into interaction fragments

This step uses the 'digest_roi.py' script and requires Python 2.7 with Biopython (the 'Bio' package) installed.
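To illustrate what separating a read into interaction fragments can look like, here is a minimal sketch. It is not the code in 'digest_roi.py'. It assumes, as the script name suggests, that reads are split at restriction sites. The function name `digest_read`, the default site `GATC` and the `cut_offset` parameter are all assumptions made for this example, and it uses plain Python strings rather than Biopython so that it runs on its own.

```python
def digest_read(sequence, site="GATC", cut_offset=0):
    """Split a read at every occurrence of a restriction site.

    Hypothetical helper, not taken from digest_roi.py.
    cut_offset is the cut position relative to the start of the site;
    0 cuts immediately before the site. The default site is only an
    example; check the script for the enzyme actually used.
    Returns a list of (start_in_read, fragment_sequence) tuples.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:
            # Emit the piece between the previous cut and this one.
            fragments.append((start, sequence[start:cut]))
            start = cut
        pos = sequence.find(site, pos + 1)
    if start < len(sequence):
        # Emit whatever remains after the last cut.
        fragments.append((start, sequence[start:]))
    return fragments
```

Keeping the start offset of each fragment alongside its sequence makes it possible to recover the order of fragments along the read after mapping.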

### 3) Mapping of interaction fragments

This step requires BWA version 0.7.12 (for 'bwa mem') and samtools version 1.3 to be installed. The exact parameters are in 'launch_bwa_mem.sh'. For faster run time, a machine with a large number of cores (32 or more) and a large amount of memory (32 GB or more) is recommended.
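Before launching the mapping, it can help to confirm that both tools are on the `PATH`. The following is a small sketch of such a check. The function names `find_tools` and `missing_tools` are made up for this example, and the check only looks for the executables; it does not verify the versions listed above.

```python
import shutil


def find_tools(names=("bwa", "samtools")):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=("bwa", "samtools")):
    """Return the executable names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]
```

An empty list from `missing_tools()` means both executables were found.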

### 4) Assembly of alignments into walks

This step is done with the 'reduce_frag_mappings.R' script, which requires R 3.5.0 or later with the Bioconductor GenomicRanges package installed.
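The idea of assembling alignments into walks can be sketched as grouping the mapped fragments by the read they came from and ordering them along that read. The example below is an illustration in Python, not a port of 'reduce_frag_mappings.R'. The function name `assemble_walks` and the tuple layout `(read_id, read_offset, chrom, pos)` are assumptions made for this sketch.

```python
from collections import defaultdict


def assemble_walks(alignments):
    """Group fragment alignments by read and order them along the read.

    alignments: iterable of (read_id, read_offset, chrom, pos) tuples,
    a hypothetical layout chosen for this example.
    Returns a dict mapping read_id to an ordered list of (chrom, pos) steps.
    """
    by_read = defaultdict(list)
    for read_id, read_offset, chrom, pos in alignments:
        by_read[read_id].append((read_offset, chrom, pos))

    walks = {}
    for read_id, steps in by_read.items():
        # Order fragments by where they start within the original read.
        steps.sort(key=lambda step: step[0])
        walks[read_id] = [(chrom, pos) for _, chrom, pos in steps]
    return walks
```

For example, two fragments of read `r1` given out of order come back in read order:

```python
walks = assemble_walks([
    ("r1", 300, "chr2", 500),
    ("r1", 0, "chr1", 100),
])
# walks["r1"] is [("chr1", 100), ("chr2", 500)]
```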

### 5) Preparing data frames with detailed walk information

This step is done with the 'interactions_to_usable_frame_stricter.R' and 'interactions_to_usable_frame_keep_NAs.R' scripts. They require R 3.5.0 or later, with the GenomicRanges, rtracklayer, and tidyverse packages installed.

### 6) Preparing walk permutations

This step is done through the 'launch_permutations.sh' script. For faster results, a machine with 32 cores and 64 GB of RAM is recommended.

### 7) Scripts for plotting

Plotting was done in R, version 3.5.0 or later, with the tidyverse, cowplot and gridExtra packages installed.
