elhb / singleFatCellExomeAnalysis

#singleFatCellExomeAnalysis

Intro

Scripts used for the analysis of single cells and recipient and donor whole blood samples in the study STUDYNAMEHERE

###The scripts

The analysisScripts folder contains five scripts:

  1. startAnalysis
    This script generates bash scripts for every part of the analysis (trimming the raw data, mapping the trimmed reads, variant calling, QC, etc.) and submits them to the SLURM workload manager.
  2. filterVariants
    This script is called by the last bash script that startAnalysis generates. It takes the final VCF files, extracts the variants that pass the defined filters, and creates the graphics that were used for the figures in the paper. The filter values are hardcoded but should be easy to change in the script.
  3. removeEmptyReads.py
    This script runs during trimming and removes any read that has been trimmed to length zero or has only N's left in its sequence.
  4. wgaAdapterTrimmer.py
    This script also runs during trimming. It uses a Hamming distance function to identify reads that start with the WGA adapter and trims the first 30 bases of any such read.
  5. mappingStatsForExcel.py
    This script is called once all mapping and gVCF file creation has completed. It collects information such as mapping rates and on-target percentages for all samples and produces a summary. (It stopped working properly after the upgrade to GATK3; manual curation was used instead.)
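
The two trimming helpers can be illustrated with a minimal sketch. This is not the code from wgaAdapterTrimmer.py or removeEmptyReads.py: the adapter sequence is a made-up placeholder, the mismatch tolerance is an assumed value, and comparing exactly the first `len(adapter)` bases is an assumption about how the match is made. Only the 30-base trim and the "length zero or only N's" rule come from the descriptions above.

```python
# Hypothetical placeholder: the real WGA adapter sequence lives in wgaAdapterTrimmer.py.
WGA_ADAPTER = "ACGT" * 7 + "AC"   # 30 bases, for illustration only
TRIM_LENGTH = 30                  # number of bases removed, as described above
MAX_MISMATCHES = 3                # assumed tolerance, not taken from the script


def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def trim_wga_adapter(seq, qual, adapter=WGA_ADAPTER,
                     max_mismatches=MAX_MISMATCHES, trim_length=TRIM_LENGTH):
    """Return (seq, qual) with the first trim_length bases removed
    if the read starts with something close to the adapter."""
    prefix = seq[:len(adapter)]
    if len(prefix) == len(adapter) and \
            hamming_distance(prefix, adapter) <= max_mismatches:
        return seq[trim_length:], qual[trim_length:]
    return seq, qual


def is_empty_read(seq):
    """True if the read has length zero or consists only of N's."""
    return len(seq) == 0 or set(seq.upper()) <= {"N"}
```

A read for which `is_empty_read` returns True would be dropped from the FASTQ output. For paired-end data the mate would presumably need to be handled as well, which this sketch does not do.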

###Reproducing the analysis

####Note on Uppmax and Slurm

All scripts were run at Uppmax, Uppsala University's high-performance computing resource in Uppsala, Sweden (http://www.uppmax.uu.se/). Uppmax uses the SLURM workload manager (http://slurm.schedmd.com/), and the scripts are therefore configured for this system. They also load binaries such as bowtie2 and fastqc into the PATH through "module load" commands. If these systems are not present on your machine, you should be able to run the automatically generated sbatch scripts using bash (possibly after some manual editing).
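
As a rough illustration of the bash fallback, the sketch below picks a launcher depending on whether `sbatch` is on the PATH. The function name `launcher_command` is hypothetical and not part of this repository. It relies on the fact that `#SBATCH` directive lines are ordinary comments to bash, so a generated script can be passed to bash directly. Any "module load" lines would still need manual editing on a machine without a module system.

```python
import shutil


def launcher_command(script_path, have_sbatch=None):
    """Build the command used to start one generated script.

    Uses sbatch when SLURM is available, otherwise plain bash.
    """
    if have_sbatch is None:
        # shutil.which returns None when the executable is not on the PATH.
        have_sbatch = shutil.which("sbatch") is not None
    launcher = "sbatch" if have_sbatch else "bash"
    return [launcher, script_path]
```

The returned list can be passed to `subprocess.run(..., check=True)`. Without SLURM there is no job scheduling, so the scripts would have to be started one at a time in the order the analysis requires.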

####Dependencies

To run the analysis, some additional software is required:

  1. TrimBWAstyle.pl (from http://wiki.bioinformatics.ucdavis.edu/index.php/TrimBWAstyle.pl)
  2. samtools (http://samtools.sourceforge.net/)
  3. picard tools (http://picard.sourceforge.net/)
  4. bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
  5. cutadapt (https://code.google.com/p/cutadapt/)
  6. fastqc (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
  7. GATK (https://www.broadinstitute.org/gatk/)
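
Before starting a run, it can help to check which of the command-line tools are reachable. The sketch below is a hypothetical helper, not part of the repository. It only covers the tools that are normally installed as executables on the PATH. Picard and GATK are typically distributed as jar files, and TrimBWAstyle.pl is a Perl script, so their locations have to be checked separately.

```python
import shutil

# Tools from the list above that are normally found as executables on the PATH.
PATH_TOOLS = ("samtools", "bowtie2", "cutadapt", "fastqc")


def missing_commands(commands=PATH_TOOLS, which=shutil.which):
    """Return the names in `commands` that cannot be found on the PATH."""
    return [name for name in commands if which(name) is None]
```

Calling `missing_commands()` after the relevant "module load" commands (or after installing the tools locally) returns an empty list when everything was found.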

####Hardcoded variables

Some paths, filenames, etc. are hardcoded. To reproduce the analysis, you need to either change the paths in the scripts, or rename the files downloaded from the SRA to match the scripts and change the locations of the executables on your machine.

####Do not hesitate to contact me if you need assistance in running the scripts.
