AliciaMstt / Berberis_phylogeo

scripts for pop. genetics, structure and phylogeo with the Berberis ddRAD data

README

Contains scripts for paralogous loci filtering, output data from the populations program of Stacks, as well as R scripts used for analyses and plotting.

Scripts and custom functions

The directory 3Berberis_phylogeo/bin contains the scripts (numbered) and R functions (not numbered, called from within the scripts) used for data analysis and plotting.

1.PopSamples_PostCleaning.r: filters the data to keep only samples with more than 50% of the mean number of loci per sample, and only loci present in at least 80% of the barcoded samples.
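The two thresholds above can be illustrated with a minimal Python sketch. It is not the repository's R script: the input layout (a mapping from locus ID to per-sample read depth, with 0 meaning the locus is absent), the function name `filter_coverage`, and the choice to filter samples first and then loci against the retained samples are all assumptions made for illustration.

```python
def filter_coverage(cov, min_frac_mean_loci=0.5, min_frac_samples=0.8):
    """Return (kept_samples, kept_loci) for a coverage table.

    cov: dict mapping locus ID -> dict mapping sample name -> read depth.
         A depth of 0 (or a missing sample key) means the locus is absent.
         This layout is hypothetical, not the export_sql.pl format.
    """
    samples = sorted({s for depths in cov.values() for s in depths})

    # Number of loci recovered in each sample.
    loci_per_sample = {
        s: sum(1 for depths in cov.values() if depths.get(s, 0) > 0)
        for s in samples
    }
    mean_loci = sum(loci_per_sample.values()) / len(samples)

    # Keep samples with MORE than 50% of the mean number of loci per sample.
    kept_samples = [
        s for s in samples
        if loci_per_sample[s] > min_frac_mean_loci * mean_loci
    ]

    # Keep loci present in AT LEAST 80% of the retained samples
    # (assumption: the 80% is computed after the sample filter).
    min_present = min_frac_samples * len(kept_samples)
    kept_loci = [
        locus for locus, depths in cov.items()
        if sum(1 for s in kept_samples if depths.get(s, 0) > 0) >= min_present
    ]
    return kept_samples, kept_loci
```

Calling `filter_coverage(cov)` on a small table returns the sample names and locus IDs that pass both filters; the thresholds are exposed as parameters so the effect of changing them is easy to inspect.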

2.PopSamples_Whitelists-StacksPopulations.script: produces whitelists and population maps to run the Stacks populations program including all loci (no paralog filtering).

3.PopSamples_excluding_paralogs.r: uses the summary statistics output of Stacks populations to identify potentially paralogous loci. The output is a list of all potentially paralogous loci (./docs/lociP05) and a list of potential paralogs within Berberis alpina (./docs/potentialparalogs).
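As a rough illustration of the p=0.5 criterion, the Python sketch below flags every locus that has at least one SNP with an allele frequency of 0.5. The record layout `(locus_id, pop_id, p)`, the function name `loci_with_p05`, and the decision to flag a locus if the condition holds in any single population are assumptions; the actual R script works on the Stacks summary statistics file and may apply further criteria.

```python
def loci_with_p05(records, tol=1e-9):
    """Return the sorted IDs of loci with at least one SNP where p = 0.5.

    records: iterable of (locus_id, pop_id, p) tuples, one per SNP per
             population (hypothetical layout, not the Stacks file format).
    tol:     tolerance for the floating-point comparison with 0.5.
    """
    flagged = set()
    for locus_id, _pop_id, p in records:
        if abs(p - 0.5) <= tol:
            flagged.add(locus_id)
    return sorted(flagged)
```

The returned IDs could then be written one per line to a file and used as a blacklist, which is the role ./docs/lociP05 appears to play here.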

4.StacksPopulations_AllLoci.script: creates a whitelist file of loci and population maps for the subset of samples to analyze, then runs the populations program of Stacks using the lists of putatively paralogous loci and any loci where p=0.5 as blacklists. Output is in data.out/PopSamples_m3.

4.StacksPopulations_EQsampsize.script: creates a population map for a subset of samples with equal sample sizes for B. alpina, Zamorano and B. moranensis, and runs the populations program of Stacks. Output is in data.out/PopSamples_m3/IncludingParalogs/AllLoci/BerEQsz.
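One way to build such a population map is sketched below in Python. A Stacks population map is a tab-separated text file with one sample name and its population ID per line. How the script actually chooses the samples is not stated here, so the random draw with a fixed seed, the function name `equal_size_popmap`, and the sample names used when calling it are assumptions.

```python
import random


def equal_size_popmap(samples_by_group, seed=0):
    """Build population-map text with the same number of samples per group.

    samples_by_group: dict mapping group ID -> list of sample names.
    The per-group sample size is the size of the smallest group; larger
    groups are down-sampled at random (assumption: the original script may
    select samples differently).
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    n = min(len(names) for names in samples_by_group.values())

    lines = []
    for group in sorted(samples_by_group):
        chosen = rng.sample(sorted(samples_by_group[group]), n)
        for sample in sorted(chosen):
            # One "sample<TAB>population" pair per line.
            lines.append(f"{sample}\t{group}")
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file and passed to populations as its population map.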

4.bsub.StacksPopulations.job: used to run the two previous scripts on the UEA cluster (Westmere dual 6-core Intel X5650 2.66GHz processor systems of 12 cores with 48GB of RAM).

5.Berberisphylogeo_examaning_popsoutput.r:

A knitr HTML file is provided for 3.PopSamples_excluding_paralogs and 5.Berberisphylogeo_examaning_popsoutput.

Input/output data

Input data

The directory data.in/PopSamples_m3 contains the coverage and SNP matrices (output of Stacks export_sql.pl). bin/1.PopSamples_PostCleaning.r filters these matrices to keep loci present in enough samples and samples with enough loci.

Output data

The directories within data.out/PopSamples_m3 contain the output from the populations program of Stacks according to the following subsets of loci. They were generated by the script bin/4.StacksPopulations_AllLoci.script:

Excluding_P05: excluding all loci with at least one SNP where p=0.5 (corresponding to Putative orthologs in the manuscript)

ExcludingParalogs: keeping only loci presumed orthologous for B. alpina, i.e. excluding potential paralogs shared among B. alpina populations and other spp. (corresponding to Putative orthologs within B. alpina in the manuscript).

IncludingParalogs: all loci, including all potential paralogs
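The relation between the three loci subsets and the lists in ./docs can be sketched as a set difference. In the Python sketch below, the pairing of each output directory with a list file is inferred from the descriptions in this README rather than read from the scripts, and `retained_loci` is a hypothetical helper name.

```python
# Output directory -> list of loci excluded for it. The paths are those
# named in this README; the pairing is an assumption inferred from the text.
BLACKLIST_FOR_SUBSET = {
    "Excluding_P05": "docs/lociP05",
    "ExcludingParalogs": "docs/potentialparalogs",
    "IncludingParalogs": None,  # nothing is excluded
}


def retained_loci(all_loci, blacklist):
    """Return the loci in all_loci that are not in blacklist, keeping order."""
    blocked = set(blacklist)
    return [locus for locus in all_loci if locus not in blocked]
```

For IncludingParalogs the blacklist is empty, so every locus is retained.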

Within each directory the data is divided according to the following subsets of samples:

  • BerAll: all populations from Berberis alpina (including Za), Berberis moranensis (An population), Berberis trifolia (outgroup).
  • BerwoOut: all populations from Berberis alpina (including Za), Berberis moranensis (An population) but EXCLUDING outgroup (B. trifolia)
  • woZaOut: excluding samples from El Zamorano population (Za) and Berberis trifolia (outgroup)
  • BerSS: Berberis alpina sensu stricto (B. alpina ingroup in the ms) populations (Aj, Iz, Ma, Pe, Tl, To), i.e. BerAll excluding Za, Out and An.
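The four sample subsets can be written down as sets of population codes. The Python sketch below uses only the codes mentioned in the list above (Aj, Iz, Ma, Pe, Tl, To, Za, An, Out); the dictionary `SUBSETS`, the helper `select_samples` and the sample names used when calling it are hypothetical and are not taken from the repository.

```python
# Berberis alpina sensu stricto populations, as listed for BerSS.
BER_SS = {"Aj", "Iz", "Ma", "Pe", "Tl", "To"}

# Population codes included in each subset of samples.
SUBSETS = {
    "BerAll": BER_SS | {"Za", "An", "Out"},  # everything, outgroup included
    "BerwoOut": BER_SS | {"Za", "An"},       # BerAll without the outgroup
    "woZaOut": BER_SS | {"An"},              # without El Zamorano and outgroup
    "BerSS": BER_SS,                         # B. alpina sensu stricto only
}


def select_samples(sample_to_pop, subset):
    """Return the sorted sample names whose population belongs to subset."""
    wanted = SUBSETS[subset]
    return sorted(s for s, pop in sample_to_pop.items() if pop in wanted)
```

With a mapping from sample name to population code, `select_samples(mapping, "woZaOut")` returns the samples to write into the population map for that subset.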

The directory ./docs contains the list of all potentially paralogous loci (./docs/lociP05) and the list of potential paralogs within Berberis alpina (./docs/potentialparalogs).

Metadata

The file docs/Ber_06oct13.info contains sample popID, barcode and sequencing library data.

Figures

The file 3Berberis_phylogeo/bin/Figures_Berberis_paralogs.Rmd is an R Markdown document detailing how the figures from the main text and the supplementary materials were made.

License: GNU General Public License v3.0

