Wy2160640/FastDemultiplexer

This is a read demultiplexer.


Pros:

* Can manage a lot of mismatches


== Motivation ==

The Illumina HiSeq 1000 dumps .cif (intensity) files, .bcl (base calls) files and 
.clocs (probably a summary of the intensities?)files,
among others.

Basically, for each cluster on the flow cell, there will be 4 sequences:
R1, R2, R3, R4. R2 and R3 are the indexes, or bar-codes, and R1 and R4 are the 
sequences.

There also may be only R1, R2, and R3 (RNA-Seq and ChIP-Seq)

CASAVA 1.8.2 only allows 1 mismatch per index (2 for dual-indexes).

Furthermore CASAVA 1.8.2 has been released many years ago.

This demultiplexer allows more data to be retrieved and is maintained too !

This demultiplexer is implemented in Python as a single executable.

This is GPL work.

== Input files ==

The files can be in fastq or in fastq.gz.

== Output files ==

fastq.gz

== Conversion from BCL/CLOCS to FASTQ ==

#!/bin/bash
#$ -N convert-HiSeq1000-BCL-2011-12-21.8
#$ -P nne-790-ab
#$ -l h_rt=48:00:00
#$ -pe default 8
#$ -cwd

sequenceWorld=/rap/nne-790-ab/Instruments/Illumina_HiSeq_1000_Hellbound
run=111207_SNL131_0065_AC0947ACXX
NSLOTS=8

source /rap/nne-790-ab/software/CASAVA_v1.8.2/module-load.sh

configureBclToFastq.pl \
--input-dir $sequenceWorld/$run/Data/Intensities/BaseCalls \
--output-dir  $sequenceWorld/$run/FastQ-Sequences/no-demul-Unaligned \
--use-bases-mask Y*,Y*,Y*,Y*

cd $sequenceWorld/$run/FastQ-Sequences/no-demul-Unaligned

make -j $NSLOTS



== Demultiplexing FASTQ files ==

By now, you should have one directory per lane.

[@colosse1 FastQ-Sequences]$ ls no-demul-Unaligned/Project_C0947ACXX/ -1
Sample_lane1
Sample_lane2
Sample_lane3
Sample_lane4
Sample_lane5
Sample_lane6
Sample_lane7
Sample_lane8

To demultiplex lane 7, run this:

	 FastDemultiplexer.py SampleSheet.csv 7 Project_C0947ACXX/Sample_lane7 Demultiplexed > stat.txt

This will generate 

Demultiplexed/Project_A/Sample_X
Demultiplexed/Project_A/Sample_Y
Demultiplexed/Project_A/Sample_Z
...

and

Demultiplexed/Undetermined_indices/Sample_lane7


== Author ==

Sébastien Boisvert
Wy2160640 / FastDemultiplexer

About

Languages