Grelot / aker--beetGenomeEnvironmentAssociation

Code I wrote for the scientific paper "Predicting genotype environmental range from genome–environment associations"

Home Page: https://onlinelibrary.wiley.com/doi/full/10.1111/mec.14723


Code for the paper: Predicting genotype environmental range from genome–environment associations

Stéphanie Manel, Marco Andrello, Karine Henry, Daphné Verdelet, Aude Darracq, Pierre‐Edouard Guerin, Bruno Desprez, Pierre Devaux

Molecular Ecology, May 2017


This repository contains all the scripts used to calculate metrics (nucleotide diversity, Tajima's D, ...) on the beet genome from SNP data.

These metrics were needed to validate our approach, which predicts the environmental range of a species' genotypes from the genetic markers significantly associated with environmental variables in an independent set of individuals.

We applied this approach to predict aridity in a dataset of 950 wild beet individuals and 299 cultivated beet individuals genotyped at 14,409 random single nucleotide polymorphisms (SNPs).

This study was funded by the French Government, under the management of the French National Research Agency (ANR‐11‐BTBR‐0007), through the AKER programme in collaboration with the Florimond Desprez company.


Prerequisites

Software

You must install the following software:

Data Files

The included data files are:

Source code

Scripts used to calculate statistics on the genome from SNP data

BASH scripts

Python scripts

  • get_col.py: Selects columns of a CSV file according to a list of column names.
  • convert_data2vcf.py: Creates a VCF file for each chromosome, listing every SNP with the genotype of each individual.
  • get_id_snp.py: Takes the chromosome and position of each SNP in a table and finds its ID in a VCF file.
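
The lookup that get_id_snp.py is described as doing can be sketched as follows. This is only an illustration under assumptions, not the script shipped in this repository: the function names `index_vcf_ids` and `find_snp_ids` and the toy VCF content are hypothetical, and the code relies only on the standard VCF layout, in which the first three tab-separated columns are CHROM, POS and ID.

```python
import io


def index_vcf_ids(vcf_lines):
    """Map (chromosome, position) to SNP ID from the lines of a VCF file."""
    ids = {}
    for line in vcf_lines:
        if line.startswith("#"):
            # Skip meta-information lines (##...) and the column header (#CHROM...).
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        chrom, pos, snp_id = fields[0], int(fields[1]), fields[2]
        ids[(chrom, pos)] = snp_id
    return ids


def find_snp_ids(snps, vcf_lines):
    """Return the ID of each (chromosome, position) pair, or None if absent."""
    index = index_vcf_ids(vcf_lines)
    return [index.get((chrom, pos)) for chrom, pos in snps]


if __name__ == "__main__":
    # Toy VCF content (hypothetical SNP names), standing in for a real file.
    example_vcf = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t1200\tsnp_a\tA\tG\t.\t.\t.\n"
        "chr1\t5300\tsnp_b\tC\tT\t.\t.\t.\n"
    )
    print(find_snp_ids([("chr1", 5300), ("chr2", 10)], example_vcf))
```

Building the dictionary once keeps each lookup constant-time, which matters when thousands of SNPs are matched against the same VCF file.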

R scripts

Workflow

Calculate metrics (nucleotide diversity, Tajima's D, ...) on outlier and non-outlier SNPs, then analyse them

1. Create VCF files for all SNPs and the selected individuals

bash vcf4PopGenome_protocole.sh

2. Get the outlier SNPs and their positions

bash fabrique_outlier.sh

3. Generate figures and tables of metrics over 20 kbp sliding windows along the genome

Rscript generate_fig_tab.R

4. Analyse statistics on the tables

bash add_ID_to_tables.sh
Rscript analysis_tables.R

Results

Beet genome, chromosome 1: SNP metrics and positions of SNPs significantly associated with aridity

chr1 metrics

At every 5 kbp step along the genome, we calculated the following metrics over a 20 kbp window:

  • SNP density: number of SNPs in the window
  • pi: nucleotide diversity
  • D: Tajima's D
  • D*: Fu and Li's D*
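
The windowing scheme and the first three metrics can be sketched in Python as below. This is an illustrative sketch, not the repository's R code: the function names, the input layout (a list of `(position, alt_allele_count)` pairs for biallelic SNPs) and the choice to drop incomplete trailing windows are all assumptions. Here pi is the sum over SNPs in the window of the unbiased per-site heterozygosity, and D uses the standard Tajima (1989) constants.

```python
from math import sqrt


def sliding_windows(chrom_length, window=20_000, step=5_000):
    """Yield 1-based inclusive (start, end) windows that fit on the chromosome."""
    start = 1
    while start + window - 1 <= chrom_length:
        yield start, start + window - 1
        start += step


def site_diversity(alt_count, n):
    """Unbiased heterozygosity at a biallelic site among n sampled chromosomes."""
    p = alt_count / n
    return 2 * p * (1 - p) * n / (n - 1)


def tajimas_d(pi, s, n):
    """Tajima's D from summed diversity pi, s segregating sites, n chromosomes."""
    if s == 0:
        return float("nan")  # undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def window_metrics(snps, chrom_length, n, window=20_000, step=5_000):
    """Compute SNP density, pi and Tajima's D for each sliding window.

    snps: list of (position, alt_allele_count) pairs; n: sampled chromosomes.
    """
    rows = []
    for start, end in sliding_windows(chrom_length, window, step):
        counts = [alt for pos, alt in snps if start <= pos <= end]
        segregating = [alt for alt in counts if 0 < alt < n]
        pi = sum(site_diversity(alt, n) for alt in segregating)
        rows.append({
            "start": start,
            "end": end,
            "snp_density": len(counts),
            "pi": pi,
            "tajima_d": tajimas_d(pi, len(segregating), n),
        })
    return rows


if __name__ == "__main__":
    # Toy data: three SNPs on a 30 kbp chromosome, 10 sampled chromosomes.
    toy_snps = [(1_000, 5), (6_000, 1), (26_000, 5)]
    for row in window_metrics(toy_snps, chrom_length=30_000, n=10):
        print(row)
```

Because the step (5 kbp) is smaller than the window (20 kbp), consecutive windows overlap and a single SNP contributes to up to four of them, which smooths the curves plotted along the chromosome.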

The positions of SNP markers significantly associated with the aridity variable are indicated by black dots.
