FNaveed786/hlaforest

===TABLE OF CONTENTS===
1. Installation
    1.0 Prerequisites
    1.1 Set up the environment
        1.1.1 config.sh
        1.1.2 CallSimulation.sh
        1.1.3 CallHaplotypesPE.sh
    1.2 Testing
        1.1.1 Testing haplotype calling pipeline
        1.1.2 Testing simulation
2. Running
    2.1 Running HLAforest on RNA-seq data
    2.2 Running HLAforest simulations
3. Utilities
4. FAQs


=== 1. Installation ===
== 1.0 Prerequisites ==
HLAforest depends upon Bioperl, bowtie and (for simulations) Math::Random). Instructions for installing bioperl can be found on the web (http://www.bioperl.org/wiki/Installing_BioPerl). Precompiled bowtie binaries are available 
via their website (http://www.bioperl.org/wiki/Installing_BioPerl). Math::Random can be installed via CPAN. For instructions on installing modules from cpan, visit their official website (http://www.cpan.org/modules/INSTALL.html). If you do not intend to run simulations, it is not necessary to install Math::Random

Once you have installed all the prerequisites, you should add bowtie to your PATH. In bash you can do so with the following command:

%> export PATH=/path/to/bowtie/directory:$PATH

== 1.1 Set up the environment == 
In order to get HLAforest running, you will have to modify a few of the scripts to reflect your local environment. All these scripts can be found in the scripts directory.

= 1.1.1 config.sh = 
(Required)
You must modify the HLAFOREST_HOME variable to reflect the directory it resides on your local system.

(Optional)
You can modify NUM_THREADS variable to reflect the number of processors avaiable on your system.

= 1.1.2 CallSimulation.sh = 
(Required)
You must modfiy the CONFIG_PATH variable to reflect the path of the config.sh file that you previously modified

= 1.1.3 CallHaplotypesPE.sh = 
(Required)
You must modfiy the CONFIG_PATH variable to reflect the path of the config.sh file that you previously modified

= 1.1.4 Adding HLAforest to your PATH =
You can add the scripts directory to your PATH by issuing the following command (and by modifying to reflect your local installation). 

%> export PATH=/path/to/your/hlaforest/scripts:$PATH

== 1.2 Testing ==
You can test your installation of HLA forest by running the following two tests.

= 1.1.1 Testing haplotype calling pipeline = 
After you have set up your environment and modified the source files to reflect your local environment, you can test the installation by calling HLA haplotypes on a selected subset of RNA-seq data (gm12878).

From your HLAforest home directory, issue the following command:

%> CallHaplotypesPE.sh test2/ test2/gm12878_short_1.fastq test2/gm12878_short_2.fastq

After the script has completed, you should see the file test2/haplotypes.txt which contains your predicted haplotypes.

= 1.1.2 Testing simulation = 
Alternatively, you can test your installation by running a simulation. Simulations are run with the following syntax:

%> ~/hla/hlaforest/scripts/CallSimulation.sh <OUTDIR> <READ_LENGTH> <NUM_READS> <INSERT_SIZE> <ERROR_RATE>

For example, you can generate a simulation with 100 2x100bp long reads with a 250 insert size and 0% substitution rate with the following command:

%> ~/hla/hlaforest/scripts/CallSimulation.sh sim_2x100_100reads_250insert_0subrate 100 100 250 0

Doing so will generate a new folder called sim_2x100_100reads_250insert_0subrate which contains the simulated haplotypes (sim_chosen_haplotypes.txt), predicted haplotypes (haplotypes.txt) and a file that compares the predictions against the true haplotypes (sim-score.txt)

=== 2. Running ===

== 2.1 Running HLAforest on RNA-seq data ==
You can run HLAforest by issuing the following command. 

%> CallHaplotypesPE.sh <output_directory> </path/to/read1.fastq> </path/to/read2.fastq>

You can input multiple fastq files as long as they are comma delimited. for example:

%> CallHaplotypesPE.sh out_dir read_R1_001.fastq,read_R1_002.fastq read_R2_001.fastq,read_R2_002.fastq

After the script has completed, you should see the file outdir/haplotypes.txt which contains your predicted haplotypes.


== 2.2 Running HLAforest simulations ==

%> ~/hla/hlaforest/scripts/CallSimulation.sh <OUTDIR> <READ_LENGTH> <NUM_READS> <INSERT_SIZE> <ERROR_RATE>

=== 3. Utilities ===

=== 4. FAQs ===
Q: Why is your program such a ram eater?
A: HLAforest is implemented in perl and stores a lot of meta data for each read tree. Each read tree contains the original alignment along with other deprecated fields. Future implementations will reduce the metadata associated with each tree, thus reducing the memory footprint. Second is perl itself. Once it grabs a hold of memory, it will never let it go until it runs out of ram.
FNaveed786 / hlaforest

About

Languages