JackCurragh / riboseq_data_processing

Repository for the automated processing of Ribo-Seq (and associated RNA-seq) data.

Ribo-Seq Data Processing

Introduction

This pipeline automates the processing of publicly available Ribo-Seq (and associated RNA-Seq) data for the riboseq.org resources, GWIPS-Viz and Trips-Viz.

Requirements

This pipeline can be run using any of the following container methods:

Method        Instructions
Singularity   docs.sylabs.io
Docker        docs.docker.com
Conda         docs.conda.io
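
Each method corresponds to a profile in nextflow.config. A minimal sketch of how such profiles might be defined (illustrative only; the container names are taken from the Setup section below, the rest is assumed):

```nextflow
// Illustrative nextflow.config profile block (not the repo's actual file)
profiles {
    singularity {
        singularity.enabled = true
        process.container = 'singularity/pipeline'
    }
    docker {
        docker.enabled = true
        process.container = 'pipeline-image:latest'
    }
    conda {
        // path to the conda definition YAML is an assumption
        process.conda = 'environment.yml'
    }
}
```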

Setup

Singularity
sudo singularity build singularity/pipeline Singularity

Then, as the singularity profile specifies container = 'singularity/pipeline', use the following to execute:

nextflow run main.nf -profile singularity
Docker
docker build . -t pipeline-image

Then, as the docker profile specifies container = 'pipeline-image:latest', use the following to execute:

nextflow run main.nf -profile docker
Conda

Create a conda environment definition YAML file, eg. here
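
A minimal sketch of such an environment file (package names are illustrative; the actual dependency list lives in the repo):

```yaml
# Illustrative conda environment definition, not the repo's actual file
name: riboseq-pipeline
channels:
  - bioconda
  - conda-forge
dependencies:
  - nextflow
  - python>=3.8
```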

nextflow run main.nf -profile conda

Usage

Call the pipeline directly

nextflow run main.nf

Run with all the frills

bash scripts/run-w-frills <params-file> <profile name from nextflow.config>

Example

bash scripts/run-w-frills example_parameters.yml standard

Data Processing For RiboSeq.org

Automated processing of Ribo-Seq (and associated RNA-Seq) data for GWIPS-Viz and TRIPS-Viz

About Riboseq.org

This is a set of resources for the analysis and visualisation of publicly available ribosome profiling data, produced and maintained by various members of the LAPTI lab in the School of Biochemistry and Cell Biology at University College Cork. These resources are well documented in their respective publications.


Outline

  1. Produce Database Of All Available Ribosome Profiling Studies
  2. Gather Metadata
  3. Fetch Files and Infer Gaps in Metadata
  4. Run Pipeline
  5. Upload to GWIPS & TRIPS

1. Produce Database Of All Available Ribosome Profiling Studies

In recent years the rate at which ribosome profiling studies are published has steadily increased. When the riboseq.org resources were initially developed, the number of available Ribo-Seq datasets was manageable via manual inclusion. Here we put in place a method that records the details of relevant ribosome profiling data deposited in GEO.

Initially, manual searching of GEO and SRA was used along with ARGEOS. The outputs of each of these methods were collated to find the set of unique datasets.
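
The collation step above amounts to a set union over the accession lists returned by the different search methods. A minimal sketch (function name and accession formats are illustrative):

```python
def collate_unique_datasets(*accession_lists):
    """Merge search outputs (e.g. manual GEO, SRA, ARGEOS) into one
    sorted, de-duplicated list of study accessions such as GSE IDs."""
    unique = set()
    for accessions in accession_lists:
        # normalise whitespace and case so duplicates collapse
        unique.update(a.strip().upper() for a in accessions if a.strip())
    return sorted(unique)

# Example: overlapping results from three search methods
geo    = ["GSE12345", "GSE67890"]
sra    = ["GSE67890", "GSE11111"]
argeos = ["gse11111 ", "GSE22222"]
print(collate_unique_datasets(geo, sra, argeos))
# → ['GSE11111', 'GSE12345', 'GSE22222', 'GSE67890']
```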

2. Gather Metadata

GEO and SRA run tables contain valuable metadata that may be important for the processing and cataloguing of the datasets. In this step we use Python scripts to glean what we can from the information available.
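
As an illustration of the kind of script involved (column names vary between run tables, so the fields below are assumptions), metadata can be pulled from an SRA run table with the standard csv module:

```python
import csv
import io

# Hypothetical SRA run table snippet; real tables carry many more columns
RUN_TABLE = """Run,LibraryStrategy,Organism,source_name
SRR000001,RNA-Seq,Homo sapiens,HEK293
SRR000002,OTHER,Homo sapiens,HEK293 ribosome profiling
"""

def gather_metadata(text):
    """Extract the fields of interest from an SRA run table in CSV form."""
    wanted = ("Run", "LibraryStrategy", "Organism", "source_name")
    reader = csv.DictReader(io.StringIO(text))
    return [{key: row.get(key, "") for key in wanted} for row in reader]

for record in gather_metadata(RUN_TABLE):
    print(record)
```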

3. Fetch Files and Infer Gaps in Metadata

A common problem with reprocessing data for these resources is that the data are deposited in GEO and SRA with inconsistent metadata. In this stage of the process we carry out a number of steps to check for the relevant information in the provided metadata and, where it is absent, we infer it from the data itself. This relates to information such as cell type and treatment, but also UMI position and adapter position/sequence.
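
Adapter inference, for instance, can be approximated by looking for a 3' k-mer shared across many reads; a toy sketch of that idea (the real inference is more involved than this):

```python
from collections import Counter

def infer_adapter_prefix(reads, k=8, threshold=0.5):
    """Guess the first k bases of the 3' adapter: if the same k-mer
    ends a majority of reads, it is likely adapter, not insert."""
    tails = Counter(read[-k:] for read in reads if len(read) >= k)
    kmer, count = tails.most_common(1)[0]
    return kmer if count / len(reads) >= threshold else None

# Toy reads whose inserts differ but whose 3' ends share an adapter start
reads = [
    "ACGTACGTAGATCGGA",
    "TTTTGGGGAGATCGGA",
    "CCCCAAAAAGATCGGA",
    "GATTACAGGGGGGGGG",  # read without the adapter
]
print(infer_adapter_prefix(reads))  # → 'AGATCGGA'
```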

4. Run pipeline

In this stage we use Nextflow to process the fetched reads following the schema below.

[Depiction of the data processing pipeline]

5. Upload to GWIPS and TRIPS

This stage uses the metadata to upload the processed files to the web resources in an automated fashion.
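
A sketch of how metadata might drive the upload destinations (the paths, file extensions, and field names here are hypothetical; the actual upload mechanics depend on the GWIPS-Viz and Trips-Viz backends):

```python
def destination_paths(record):
    """Map a processed sample's metadata to hypothetical upload
    destinations on the GWIPS-Viz and Trips-Viz servers."""
    study, run, organism = record["study"], record["run"], record["organism"]
    org = organism.lower().replace(" ", "_")
    return {
        "gwips": f"/data/gwips/{org}/{study}/{run}.bw",
        "trips": f"/data/trips/{org}/{study}/{run}.sqlite",
    }

record = {"study": "GSE12345", "run": "SRR000001", "organism": "Homo sapiens"}
print(destination_paths(record)["gwips"])
# → /data/gwips/homo_sapiens/GSE12345/SRR000001.bw
```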

About


License: MIT License

