bam_to_clusters

This program reads coordinate-sorted BAM files containing strand-specific single-end or paired-end RNA-Seq reads and prints clusters of mapped reads that have a minimum depth and a minimum distance to other clusters.

The output is a tab-separated table with 5 columns:

chrName start stop strand height

Coordinates are 1-based.
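
For downstream processing, the table can be read with a few lines of Python. The following is only a minimal sketch: the names Cluster, read_clusters and to_bed_interval are hypothetical, the example rows are made up for illustration, and it assumes that the output has no header line, that stop is inclusive, and that height may be a non-integer number (for example with fractional counts or RPM).

```python
import csv
import io
from typing import NamedTuple


class Cluster(NamedTuple):
    chrom: str
    start: int    # 1-based start, as printed by the program
    stop: int     # 1-based stop; assumed to be inclusive
    strand: str
    height: float


def read_clusters(handle):
    """Parse the five-column tab-separated table (assumes no header line)."""
    clusters = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip empty lines
        chrom, start, stop, strand, height = row
        clusters.append(Cluster(chrom, int(start), int(stop), strand, float(height)))
    return clusters


def to_bed_interval(cluster):
    """Convert 1-based inclusive coordinates to a 0-based half-open interval."""
    return (cluster.chrom, cluster.start - 1, cluster.stop)


# Made-up rows in the documented column order, for illustration only.
example = "chr1\t101\t150\t+\t12\nchr1\t400\t460\t-\t25.5\n"
clusters = read_clusters(io.StringIO(example))
first_interval = to_bed_interval(clusters[0])
```

Subtracting 1 from start is the usual conversion when a tool that expects 0-based, half-open intervals (such as the BED format) consumes the clusters.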

The implementation is based on stranded-coverage.

Installation

The program depends on htslib, which should be cloned into ../htslib, i.e. next to the bam_to_clusters directory.

Example:

git clone https://github.com/samtools/htslib.git
git clone https://github.com/pmenzel/bam_to_clusters.git
cd bam_to_clusters
make

This will produce the executable file bam_to_clusters.

Usage

Example using the BAM file Aligned.bam, which needs to be sorted by coordinates:

bam_to_clusters -o output.tsv -s 1 -y 10 -l 20 Aligned.bam

Option -y denotes the minimum height (either in reads or in RPM when option -n is used).

Option -l denotes the minimum distance between clusters.

MAPQ filtering

Reads can be filtered by their mapping quality (MAPQ) using option -m.

Multimapping reads

By default, each alignment belonging to a multimapping read is counted the same as alignments from uniquely mapped reads, i.e. as 1.

The option -f enables fractional counts. In this mode, the tag NH:i:N needs to be set to the number of alignments for a multimapping read, and each alignment will be counted as 1/N.
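
The difference between the two counting modes can be illustrated with a small Python sketch. It is not the program's code: alignment_weight is a hypothetical helper and the NH values are made up.

```python
def alignment_weight(nh, fractional):
    """Weight of one alignment, given the value N of its NH:i:N tag.

    Without fractional counting every alignment counts as 1;
    with fractional counting it counts as 1/N.
    """
    if not fractional:
        return 1.0
    if nh < 1:
        raise ValueError("NH must be a positive integer")
    return 1.0 / nh


# Made-up NH values of five alignments overlapping one position:
# two unique reads, two alignments of 4-fold multimappers, one 2-fold.
nh_tags = [1, 1, 4, 4, 2]

default_depth = sum(alignment_weight(nh, False) for nh in nh_tags)
fractional_depth = sum(alignment_weight(nh, True) for nh in nh_tags)
```

In this example the default depth is 5, whereas the fractional depth is 1 + 1 + 1/4 + 1/4 + 1/2 = 3.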

RPM Normalisation

The option -n enables normalisation of the coverage to reads per million mapped reads (RPM).
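
As a rough illustration of the scaling, RPM is conventionally computed as the raw value multiplied by one million and divided by the number of mapped reads. The sketch below uses a hypothetical function to_rpm and made-up numbers; how the program itself determines the number of mapped reads (for example in combination with -m or -f) is not stated here and is left as an assumption.

```python
def to_rpm(height, total_mapped_reads):
    """Scale a raw height to reads per million mapped reads (RPM)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return height * 1_000_000 / total_mapped_reads


# Made-up example: a cluster of height 50 in a library with 20 million mapped reads.
rpm_height = to_rpm(50, 20_000_000)
```

With these numbers the cluster has a height of 2.5 RPM, so a threshold given with -y would have to be chosen on that scale when -n is used.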

License

See the file LICENSE.
