cjprybol / Neurospora_BSA_mutation_mapper

Neurospora bulk-segregant-analysis (BSA) mutation mapper


about

This project is an analysis pipeline for locating causative suppressor mutations in the model organism Neurospora crassa.

setup project

To use this pipeline on your own data, or to recreate the findings of our study, start by cloning this repository into a directory called ESSENTIAL:

mkdir {desired working directory}
cd {desired working directory}
git clone https://github.com/cprybol/DIM5_suppressor_mapping.git ESSENTIAL

required software (all must be in user path)
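
Since every required program must be discoverable on the user path, a quick pre-flight check can save a failed run. The following is only a sketch: `check_tools` is a hypothetical helper that is not part of this repository, and the tool names in the commented example are placeholders, not the pipeline's actual requirements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are not on PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing from PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call with placeholder names; substitute the tools the pipeline needs.
# check_tools gzip git
```

A non-zero return status means at least one name could not be resolved, and the missing names are printed to standard error.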

prepare environment

  1. add fastq files to the directory {path to working directory}/ESSENTIAL/FASTQ
    • follow naming conventions specified in {path to working directory}/ESSENTIAL/FASTQ/naming_conventions.txt
    • fastq files must be gzip-ed
  2. in the {path to working directory}/ESSENTIAL/config.txt file, specify the reference parent and divergent parent filenames, the read type (single- or paired-end), the number of available cores, and the minimum and maximum library fragment sizes
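
Because the fastq files must be gzip-ed, it may help to test them before starting a run. The sketch below uses `gzip -t`, which checks archive integrity without decompressing to disk. `check_fastq_gz` is a hypothetical helper, not one of the pipeline's scripts, and it assumes the files end in `.gz`; the actual naming rules are the ones given in naming_conventions.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that every *.gz file in a directory
# is a readable gzip archive. Returns 1 if none are found or any is invalid.
check_fastq_gz() {
    local dir="$1"
    local status=0
    local f
    for f in "$dir"/*.gz; do
        # An unmatched glob stays literal, so the file will not exist.
        if [ ! -e "$f" ]; then
            echo "no .gz files found in $dir" >&2
            return 1
        fi
        if ! gzip -t "$f" 2>/dev/null; then
            echo "not a valid gzip file: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call, using the directory layout described above:
# check_fastq_gz "{path to working directory}/ESSENTIAL/FASTQ"
```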

steps to run

  1. run {path to working directory}/ESSENTIAL/SCRIPTS/001.master.sh
  2. evaluate scatterplots in {path to working directory}/SNP_MAPPING/PARSED_SNP_INFO/GRAPHS and locate regions that satisfy desired similarity thresholds
  3. list your regions of interest in .bed format files in the {path to working directory}/ESSENTIAL/FILTER_SITES directory
    • follow guidelines listed in the {path to working directory}/ESSENTIAL/FILTER_SITES/readme.txt file
  4. run {path to working directory}/ESSENTIAL/SCRIPTS/002.master.sh
    • obtain output in {path to working directory}/GFF_OVERLAP and {path to working directory}/GFF_OVERLAP/TRANSLATE_CDS folders
      • {path to working directory}/GFF_OVERLAP/*.all lists all GFF features that overlap high-quality SNPs
        • format: {full GFF entry}{full VCF entry for SNP}
      • {path to working directory}/GFF_OVERLAP/*.snps_not_in_genes lists all SNPs that fall in euchromatic regions but do not overlap any GFF feature
        • these may hit promoters or other elements outside the gene body, and may be of interest if no mutation is found in the GFF features
      • {path to working directory}/GFF_OVERLAP/TRANSLATE_CDS/*.translated_CDS contains the translated amino acid sequence for every SNP that falls in a coding sequence, and shows whether the SNP produces a non-synonymous change
      • {path to working directory}/SV_DETECTION contains BreakDancer and Pindel output to detect possible structural variants
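
The regions of interest from step 3 are given as .bed files. In the standard BED format each line holds at least three tab-separated fields: sequence name, 0-based start, and end (exclusive). The sketch below checks a file against that generic layout before it is placed in FILTER_SITES. `validate_bed` is a hypothetical helper, the chromosome name and coordinates in the example are made up, and any additional rules in FILTER_SITES/readme.txt take precedence.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that each non-blank line of a BED file has
# a sequence name followed by integer start and end fields with start < end.
validate_bed() {
    awk -F'\t' '
        /^[[:space:]]*$/ { next }   # ignore blank lines
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d is not a valid BED interval: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example with a made-up region:
# printf 'chr1\t100000\t250000\n' > my_region.bed
# validate_bed my_region.bed
```

The function returns a non-zero status and names the offending lines if a field is missing, a coordinate is not an integer, or the fields are separated by spaces instead of tabs.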

license

MIT License

