NickJeff13 / Eelgrass_Poolseq

Scripts and data for Eelgrass Poolseq analyses

Eelgrass Poolseq to Assess Population Structure and Gene Flow

Scripts for bioinformatics processing and analysis for Zostera marina poolseq project.

Contact: nick.jeffery@dfo-mpo.gc.ca

Citation: Jeffery NW, Vercaemer B, Stanley RRE, Kess T, Dufresne F, Noisette F, O'Connor M, Wong M. 2024. Variation in genomic vulnerability to climate change across temperate populations of eelgrass (Zostera marina). Published online in Evolutionary Applications.

We used a pooled-sequencing approach for 23 eelgrass populations (geographic locations) and aligned reads generated by a NovaSeq platform to a publicly available Zostera marina genome assembly (Ma et al. 2021).
Raw sequence reads for each pool are deposited in the NCBI Sequence Read Archive at https://www.ncbi.nlm.nih.gov/sra/PRJNA891275.

The scripts here cover read trimming with fastp, alignment to the reference genome with bwa-mem2, duplicate removal and indel realignment with GATK, conversion of the binary alignment (BAM) files to mpileup format, and conversion of the mpileup output to the sync format used by Popoolation2.
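
For readers unfamiliar with the sync format, the sketch below parses one line and computes the reference-allele frequency in each pool. It assumes the standard Popoolation2 layout (chromosome, position, reference base, then one `A:T:C:G:N:del` count field per pool). The function names and the example line are hypothetical and are not taken from the scripts in this repository.

```python
# Order of the allele counts inside each per-pool field of a sync file.
SYNC_ALLELES = ("A", "T", "C", "G", "N", "del")


def parse_sync_line(line):
    """Split one sync line into (chrom, pos, ref, list of count dicts per pool)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    pools = [dict(zip(SYNC_ALLELES, map(int, f.split(":")))) for f in fields[3:]]
    return chrom, pos, ref, pools


def ref_allele_freq(ref, counts):
    """Reference-allele frequency in one pool, ignoring N and deletion counts."""
    depth = sum(counts[base] for base in "ATCG")
    return counts[ref.upper()] / depth if depth else float("nan")


# Hypothetical line with two pools (values invented for illustration only).
example = "Chr01\t1042\tA\t45:0:5:0:0:0\t30:0:30:0:0:0"
chrom, pos, ref, pools = parse_sync_line(example)
freqs = [ref_allele_freq(ref, p) for p in pools]
print(chrom, pos, freqs)
```

Pools with zero depth at a site return NaN rather than raising an error, so such sites can be filtered afterwards.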

Post-bioinformatics analyses were conducted in BayPass (Gautier 2015), Treemix (Pickrell and Pritchard 2012), and the R package poolfstat (Gautier et al. 2021).

Sample sites

Example per-chromosome coverage output for Pool 10:

#rname	startpos	endpos	numreads	covbases	coverage	meandepth	meanbaseq	meanmapq
Chr01	1	42612672	25740331	42520710	99.7842	87.0496	35.6	49.7
Chr02	1	40099171	21866617	39942582	99.6095	78.4374	35.6	40.2
Chr03	1	39935026	25496738	39351073	98.5377	91.5507	35.6	42
Chr04	1	34618966	22617769	34451221	99.5155	93.9099	35.6	41
Chr05	1	32617008	19391682	32516701	99.6925	85.7323	35.6	48.8
Chr06	1	29411071	17365610	29250779	99.455	85.1597	35.6	49.5
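
A table like the one above can be summarised in a few lines. The sketch below assumes the columns follow the `samtools coverage` layout, which the header appears to match, although this README does not name the tool. It computes a length-weighted mean depth across the six chromosomes shown; the variable names are illustrative.

```python
# Pool 10 rows copied from the table above (whitespace-separated).
COVERAGE_TEXT = """\
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
Chr01 1 42612672 25740331 42520710 99.7842 87.0496 35.6 49.7
Chr02 1 40099171 21866617 39942582 99.6095 78.4374 35.6 40.2
Chr03 1 39935026 25496738 39351073 98.5377 91.5507 35.6 42
Chr04 1 34618966 22617769 34451221 99.5155 93.9099 35.6 41
Chr05 1 32617008 19391682 32516701 99.6925 85.7323 35.6 48.8
Chr06 1 29411071 17365610 29250779 99.455 85.1597 35.6 49.5
"""

lines = COVERAGE_TEXT.strip().splitlines()
header = lines[0].lstrip("#").split()

rows = []
for line in lines[1:]:
    rec = dict(zip(header, line.split()))
    rows.append({
        "rname": rec["rname"],
        # Region length in bases (positions are 1-based and inclusive).
        "length": int(rec["endpos"]) - int(rec["startpos"]) + 1,
        "covbases": int(rec["covbases"]),
        "meandepth": float(rec["meandepth"]),
    })

# Weight each chromosome's mean depth by its length.
total_len = sum(r["length"] for r in rows)
weighted_depth = sum(r["meandepth"] * r["length"] for r in rows) / total_len

print(f"{len(rows)} chromosomes, length-weighted mean depth: {weighted_depth:.2f}x")
```

Weighting by length keeps shorter chromosomes from counting as much as longer ones in the overall depth estimate.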

A PCA on ~500,000 SNPs across all 23 sites separates Atlantic and Pacific sites along PC1, while PC2 shows a latitudinal gradient in population structure among Atlantic and subarctic sites.

Principal Components Analysis of Allele Frequencies
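
As a rough illustration of the idea, the sketch below runs a PCA on a populations-by-SNPs allele-frequency matrix using a singular value decomposition of the column-centred data. It is a minimal example on simulated toy data, not the code used to produce the figure. The function name, the toy dimensions and the simulated frequencies are all assumptions made for illustration.

```python
import numpy as np


def pca_allele_freqs(freqs, n_components=2):
    """PCA of a (populations x SNPs) allele-frequency matrix via SVD."""
    centered = freqs - freqs.mean(axis=0)           # centre each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # population coordinates
    explained = s**2 / np.sum(s**2)                  # variance share per PC
    return scores, explained[:n_components]


# Toy data (invented): 6 populations x 200 SNPs in two diverged groups.
rng = np.random.default_rng(0)
base = rng.uniform(0.1, 0.9, size=200)       # shared baseline frequencies
shift = rng.normal(0.0, 0.2, size=200)       # divergence of the second group
group = np.array([0, 0, 0, 1, 1, 1])
noise = rng.normal(0.0, 0.02, size=(6, 200))
freqs = np.clip(base + np.outer(group, shift) + noise, 0.0, 1.0)

scores, explained = pca_allele_freqs(freqs)
print(scores[:, 0])   # PC1 coordinates: the two groups fall on opposite sides
print(explained)      # proportion of variance explained by PC1 and PC2
```

In this toy case the two simulated groups separate along PC1, which is analogous to, but far simpler than, the Atlantic and Pacific split described above.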

Acknowledgements

The authors wish to thank numerous collaborators who helped collect or provided eelgrass samples, including Tim Bernard, Isabelle Berube, Renee Bernier, Veronika Brzeski, Mike Coffin, Phil Colarusso, Chantal Coomber, Tessa Craig, France Dufresne, Coady Fitzpatrick, Robert Gregory, Javier Guijarro-Sabaniel, Cynthia Hays, Frederica Jacks, Kira Krumhansl, Andre Nadeau, John O’Brien, Shawn Roach, Stephanie-Robertson Kempton, Nathalie Simard, Sandrine Tousignant and Erica Watson, as well as Zeliang Wang and Dave Brickman for providing outputs from the BNAM oceanographic model. Kara Layton, Sarah Lehnert, Brenna Forester, and Thibaut Capblancq provided advice on conducting genomic offset analyses. We also thank the staff and technical personnel at Génome Québec for DNA extraction and sequencing.
