natforsdick / Weta_GBS

Repo containing all scripts associated with Mahoenui giant wētā population genomic analysis from GBS data

Wētā genotyping by sequencing analysis

Developed by Nat Forsdick, 2021. This project is led and funded by Manaaki Whenua - Landcare Research.

This repo contains scripts used to analyse single- and paired-end genotyping-by-sequencing (GBS) data from giant wētā species, including Deinacrida heteracantha, D. fallai, and D. mahoenui.

This work is associated with Forsdick et al., "Population genomic analysis of Mahoenui giant wētā (Deinacrida mahoenui) reveals no reduction in genomic diversity following translocation" (in progress). The analysis focusses on D. mahoenui and uses a reference genome from D. fallai.

Scripts were originally run on the NeSI platform via the SLURM workload manager, except for the R scripts, which were run locally.
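As a rough illustration of how per-sample work is often laid out under SLURM, the sketch below uses a job array in which each task picks one sample from a plain-text list. The #SBATCH values, the samples.txt file, and the functions sample_for_task and current_sample are hypothetical and are not taken from this repo's scripts. Outside SLURM the #SBATCH lines are ordinary comments, so the file also runs under plain bash.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=gbs_per_sample
#SBATCH --array=1-3
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
# Hypothetical SLURM settings for illustration only; not the values used here.

# Print line number $2 of the sample list $1 (array IDs start at 1 here
# because of --array=1-3, so task N maps to line N).
sample_for_task() {
  local list="$1" task="$2"
  sed -n "${task}p" "$list"
}

# Resolve the current array task's sample; fail clearly outside an array job.
current_sample() {
  local list="$1"
  if [[ -z "${SLURM_ARRAY_TASK_ID:-}" ]]; then
    echo "SLURM_ARRAY_TASK_ID is not set (not an array task?)" >&2
    return 1
  fi
  sample_for_task "$list" "$SLURM_ARRAY_TASK_ID"
}

# Threads for multithreaded tools; defaults to 1 when run outside SLURM.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Example use inside the job body (samples.txt is a hypothetical file):
#   sample="$(current_sample samples.txt)" || exit 1
#   echo "Processing ${sample} with ${threads} threads"
```

Keeping the sample list in a file means the same script serves every sample; only the --array range needs to match the number of lines.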

The workflow proceeds through demultiplexing, quality control, and read mapping, then runs the Stacks ref_map and populations pipelines. The resulting data are output in formats suitable for analysis with R genetics packages such as adegenet and SNPRelate, and with STRUCTURE.
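Because each stage consumes the previous stage's output, the SLURM stages could be queued in one go using job dependencies, so that each job starts only after the one before it finishes successfully. The driver below is a hypothetical sketch, not part of this repo. It relies on sbatch --parsable, which prints the job ID (optionally followed by ";cluster"), and on --dependency=afterok:<jobid>. Setting the SBATCH variable to a stub command gives a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical driver (not part of this repo): submit pipeline stages so that
# each one starts only after the previous job exits successfully.

submit_chain() {
  local sb="${SBATCH:-sbatch}"   # override SBATCH with a stub for a dry run
  local prev="" out script
  for script in "$@"; do
    if [[ -z "$prev" ]]; then
      out="$($sb --parsable "$script")" || return 1
    else
      out="$($sb --parsable --dependency=afterok:"$prev" "$script")" || return 1
    fi
    prev="${out%%;*}"            # keep "jobid" from "jobid" or "jobid;cluster"
    printf '%s\t%s\n' "$script" "$prev"
  done
}

# Example (run from the directory holding the scripts; the R analyses were
# run locally, so only the SLURM stages are chained here):
#   submit_chain stacks_process_radtags.sl run_trimgalore_B2.sl \
#     run_bowtie2_index.sl 02_bowtie_B2.sl 03_ref_map.sl \
#     04_stacks_populations_B2.sl 05_vcf2adegenet.sl
```

A strict linear chain is the simplest choice; indexing the reference does not actually depend on trimming, so those two jobs could also be submitted independently.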

Software

Pipeline

  1. stacks_process_radtags.sl - Demultiplex raw paired-end GBS data with Stacks process_radtags
  2. run_trimgalore_B2.sl - Quality and adapter trimming with Trim Galore
  3. run_bowtie2_index.sl - Index the reference genome with Bowtie2
  4. 02_bowtie_B2.sl - Map each individual's reads to the reference genome and collect mapping statistics
  5. 03_ref_map.sl - Run Stacks ref_map.pl
  6. 04_stacks_populations_B2.sl - Call and filter variants, allowing either 30% or 0% missing data, collect preliminary statistics, and output in VCF and PLINK formats
  7. 05_vcf2adegenet.sl - Convert VCF to PLINK format as an intermediate for conversion to other formats used in downstream analysis
  8. Analysis of final SNP sets in R:
    • Discriminant analysis of principal components and more with adegenet
    • Principal component analysis, Fst, and more with SNPRelate
  9. 06_structure.sl - Analysis of final SNP sets with STRUCTURE
  10. Visualisation of combined STRUCTURE outputs in R with
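Step 4 collects mapping statistics. One simple way to summarise them is to pull the final "overall alignment rate" line that bowtie2 prints to standard error at the end of each run. The sketch below assumes each sample's bowtie2 stderr was saved as <sample>.bowtie2.log in a single directory; that naming and the alignment_rates function are hypothetical, and 02_bowtie_B2.sl may gather its statistics differently.

```shell
#!/usr/bin/env bash
# Sketch: tabulate bowtie2's "overall alignment rate" per sample.
# Assumes logs are named <sample>.bowtie2.log (hypothetical naming scheme).

alignment_rates() {
  local dir="$1" log sample rate
  printf 'sample\toverall_alignment_rate\n'
  for log in "$dir"/*.bowtie2.log; do
    [[ -e "$log" ]] || continue          # glob matched nothing
    sample="$(basename "$log" .bowtie2.log)"
    # "93.12% overall alignment rate" -> "93.12"
    rate="$(awk '/overall alignment rate/ { sub(/%.*/, ""); print; exit }' "$log")"
    printf '%s\t%s\n' "$sample" "${rate:-NA}"   # NA if the line is missing
  done
}

# Example:
#   alignment_rates logs/ > mapping_summary.tsv
```

Samples reported as NA, or with rates well below the rest, are worth checking before they go into ref_map.pl.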
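Step 6 filters loci by missing data. In Stacks populations, one common way to express this is the -r option, the minimum fraction of individuals in a population that must be genotyped at a locus. Allowing 30% missing data then corresponds to -r 0.70, and allowing none to -r 1.00. The sketch below assumes that mapping (04_stacks_populations_B2.sl may instead use -R, which applies across all samples, or additional filters) and only prints the commands for review. The input directory stacks_refmap, popmap.txt, the populations_miss<NN> output names, and both functions are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn an allowed missing-data fraction into a Stacks populations
# command line. Paths and file names are hypothetical placeholders.

# Fraction of individuals required per population = 1 - allowed missing.
min_frac_present() {
  awk -v m="$1" 'BEGIN { printf "%.2f\n", 1 - m }'
}

# Print (do not run) a populations command for one missing-data threshold.
populations_cmd() {
  local missing="$1" r pct
  r="$(min_frac_present "$missing")"
  pct="$(awk -v m="$missing" 'BEGIN { printf "%.0f\n", m * 100 }')"
  printf 'populations -P %s -M %s -O %s -r %s --vcf --plink\n' \
    stacks_refmap popmap.txt "populations_miss${pct}" "$r"
}

# Print commands for both thresholds named in the pipeline (30% and 0%).
for missing in 0.3 0; do
  populations_cmd "$missing"
done
```

Printing rather than executing keeps the sketch safe to try; after creating the output directories, the printed lines could be piped to bash.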


Languages

Shell 84.3%, Perl 15.7%