LTLA / CRUKtools

Assorted scripts for running server jobs at CRUK CI

CRUK tools

This repository provides some tools for processing genomics data on the CRUK Cambridge Institute SLURM server.

Alignment

  • solo_align.sh aligns a single library (single-end or paired-end).
  • multi_align.sh is a convenience wrapper that submits alignment jobs for many libraries in a data set.
  • guess_encoding.py guesses the Phred quality encoding of the reads so that it can be passed to the aligner.
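
To illustrate what guessing the Phred encoding involves, here is a minimal sketch in Python. It is not the code in guess_encoding.py: the function name `guess_phred_offset` and the `max_reads` parameter are hypothetical, and it assumes four-line FASTQ records (no wrapped sequence lines). The idea is that any quality character below ASCII 59 (`;`) can only occur in Phred+33 data, so seeing one settles the question; otherwise the sketch falls back to Phred+64.

```python
import gzip
from itertools import islice


def guess_phred_offset(fastq_path, max_reads=10000):
    """Return 33 or 64, guessed from the lowest quality character seen.

    Hypothetical helper for illustration; assumes 4-line FASTQ records.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lowest = None
    with opener(fastq_path, "rt") as handle:
        # Quality strings are every 4th line, starting at line index 3.
        for qual in islice(handle, 3, 4 * max_reads, 4):
            qual = qual.rstrip("\n")
            if qual:
                low = min(map(ord, qual))
                lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        raise ValueError("no quality strings found")
    # Characters below ';' (ASCII 59) cannot occur in +64 encodings.
    return 33 if lowest < 59 else 64
```

The fallback is a heuristic: a Phred+33 file in which every sampled base has a quality of 26 or more would be misreported as Phred+64, so scanning more reads makes the guess more reliable.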

Alignment is performed using the subread aligner. The alignment scripts also require samtools and MarkDuplicates.

Read counting

counter.R provides a template for read counting that produces a gene-by-sample count matrix. You need to specify the BAM files to count and a set of GTF annotation files. Counting is done with the featureCounts function in the Rsubread package.

Data mangling

  • cram2fastq.sh converts a CRAM file into FASTQ for input to the alignment scripts above.
  • sanger_dump.sh converts an entire folder of CRAM files into FASTQs.
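
As a rough illustration of converting a folder of CRAM files, the Python sketch below builds one `samtools fastq` command per file. It does not reproduce what cram2fastq.sh or sanger_dump.sh actually do: the function name `cram_to_fastq_commands` and the output naming scheme are hypothetical, and it assumes paired-end data in which mates are already adjacent (name-collated) in each CRAM.

```python
from pathlib import Path


def cram_to_fastq_commands(cram_dir, out_dir):
    """Build one `samtools fastq` command per CRAM file in cram_dir.

    Hypothetical helper for illustration; assumes paired-end,
    name-collated CRAM files.
    """
    out_dir = Path(out_dir)
    commands = []
    for cram in sorted(Path(cram_dir).glob("*.cram")):
        prefix = out_dir / cram.stem
        commands.append([
            "samtools", "fastq",
            "-1", f"{prefix}_1.fastq",  # first mates
            "-2", f"{prefix}_2.fastq",  # second mates
            "-0", "/dev/null",          # discard reads flagged as both or neither mate
            "-s", "/dev/null",          # discard singletons
            "-n",                       # leave read names unchanged (no /1 or /2 suffix)
            str(cram),
        ])
    return commands
```

Each command can then be run with `subprocess.run(cmd, check=True)` or written into a job script. Returning the commands rather than running them makes it easy to inspect what will be executed before submitting anything to the cluster.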

Other

cell_ranger.sh calls the CellRanger pipeline to create a count matrix for single-cell transcriptomics data from the 10X Genomics platform.

Languages

Shell 81.2%, Python 11.5%, R 7.3%