startrekor / GS-SAP_GrainYield

Data, scripts, and manuscript for the effect of population structure on the accuracy of the GBLUP model for sorghum grain yield components

Documentation for Sapkota et al. 2019

Data

BLUEs_pheno_all.csv - Best linear unbiased estimate (BLUE) phenotype data for all accessions in the panel.

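| PI | Subpopulation | Cluster | Race | Origin | DTA | PH | GN | GW | GY | FLH | PL | BL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PI152651 | Caudatum | 4 | 0 | NA | 66 | 146.61 | 1286 | 27.32 | 43.95 | 97 | 14.33333333 | 57.95 |
| PI17548 | Kafir | 2 | 0 | NA | 66 | 214.06 | 1167 | 15.62 | 26.66 | 156.8333333 | 22.83333333 | 83.41666667 |
| PI24969 | Durra | 3 | 0 | NA | 80 | 182.06 | 1319 | 29.92 | 50.86 | 162 | 13.33333333 | 41.83333333 |
| PI329435 | Mixed | 1 | 0 | NA | 80 | 95.5 | 1388 | 15.68 | 30.47 | 65.83333333 | 26 | 72.58333333 |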

Qmatrix_admixture_k5.csv

Ancestry coefficients for five subpopulations, calculated using ADMIXTURE (Alexander et al. 2009).

SAP_geno-pheno.zip

A zip file containing both the genotype matrix (-1, 0, 1 format) and the phenotypic BLUEs.

Manuscript

Manuscript for the publication, along with the supplementary file. The tables are embedded in the manuscript. The figures and equations (MathType) are in a separate folder inside this folder.

Notebooks

This folder contains the IPython notebooks used for the computations in this study.

Scripts

Package versions used

lme4_1.1-21 MCMCglmm_2.29 ape_5.3 coda_0.19-2 Matrix_1.2-17 BGLR_1.0.8 rrBLUP_4.6

VCF_to_genotype_matrix.R

This script contains the function my.read.vcf, which reads SNP files in VCF format, and the function parse.vcf, which creates a genotype matrix (in -1, 0, 1 format) that can be used with the rrBLUP package.
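As a rough illustration of the -1, 0, 1 coding, the following Python sketch converts VCF genotype calls to that format. It is not a port of my.read.vcf or parse.vcf: the names gt_to_code and parse_vcf_lines are hypothetical, and it assumes biallelic sites with -1 for homozygous reference, 0 for heterozygous, and 1 for homozygous alternate, an orientation the repository does not state.

```python
def gt_to_code(gt):
    """Map one VCF sample field (e.g. '0/1' or '1|1:35') to -1, 0 or 1.

    Assumption: -1 = homozygous reference, 0 = heterozygous,
    1 = homozygous alternate. Missing or non-diploid calls return None.
    """
    alleles = gt.split(":")[0].replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    alt_count = sum(1 for allele in alleles if allele != "0")
    return alt_count - 1


def parse_vcf_lines(lines):
    """Return (samples, markers, matrix) with one matrix row per marker."""
    samples, markers, matrix = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        markers.append(f"{fields[0]}_{fields[1]}")  # label as CHROM_POS
        matrix.append([gt_to_code(g) for g in fields[9:]])
    return samples, markers, matrix


# Toy input with made-up sample names and positions.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
    "1\t250\t.\tC\tT\t.\tPASS\t.\tGT\t1|1\t./.\t0|0",
]
samples, markers, matrix = parse_vcf_lines(toy_vcf)
```

Note that rrBLUP expects individuals in rows and markers in columns, so a matrix built this way would need to be transposed before use.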

Stratified_Sampling_SAP_cluster.R

R script to create a cross-validation file with individuals proportionally divided from each cluster into five equal folds.
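The idea of proportional fold assignment can be sketched in Python as follows. This is an illustrative outline, not the logic of the R script: the function name stratified_folds, the seed, and the toy accession IDs are all assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(cluster_by_id, k=5, seed=1):
    """Assign each accession to a fold (1..k), balancing folds within clusters.

    cluster_by_id maps accession ID -> cluster label. Members of each
    cluster are shuffled and dealt to folds in turn, so every fold gets a
    near-equal share of every cluster.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for accession, cluster in sorted(cluster_by_id.items()):
        by_cluster[cluster].append(accession)

    folds = {}
    offset = 0  # carried across clusters so leftovers spread over all folds
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        rng.shuffle(members)
        for i, accession in enumerate(members):
            folds[accession] = (i + offset) % k + 1
        offset += len(members)
    return folds


# Toy example: 4 clusters of 10 hypothetical accessions each.
toy_clusters = {f"ACC{c}_{i}": c for c in range(1, 5) for i in range(10)}
toy_folds = stratified_folds(toy_clusters, k=5)
```

Dealing shuffled members round-robin keeps fold sizes within one individual of each other, both within each cluster and overall.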

Cross-validation scripts

R scripts used to implement GBLUP and cross-validation using the kin.blup function in the R package rrBLUP.

  • CV1_prediction.R # proportional sampling from races
  • CV2_AR_prediction.R # sampling from across race
  • CV2_WR_prediction.R # sampling from within race

MCMCglmm_CV1_vCov.R

Fits a multi-response model in MCMCglmm to calculate variance-covariance components due to conditional expectations from race.

Heterozygosity_Pegas.R

R script to calculate heterozygosity per site using the R package pegas.

LD_Calculation_HillandWeir.R

Calculates the expected values of r^2 under drift equilibrium following Hill and Weir (1988), as implemented in Remington et al. (2001).
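For reference, the Hill and Weir expectation in the form used by Remington et al. (2001) can be written as a small Python function. This is a sketch of the formula only, not a translation of the R script: the function name expected_r2 is hypothetical, C denotes the population recombination parameter (4Nc), and n denotes the sample size.

```python
def expected_r2(C, n):
    """Expected r^2 between two loci (Hill and Weir 1988).

    C: population recombination parameter, C = 4 * N * c
    n: sample size
    """
    if C < 0 or n <= 0:
        raise ValueError("C must be non-negative and n must be positive")
    base = (10 + C) / ((2 + C) * (11 + C))
    correction = 1 + ((3 + C) * (12 + 12 * C + C ** 2)) / (n * (2 + C) * (11 + C))
    return base * correction


# Expected r^2 declines as the recombination parameter grows.
decay = [expected_r2(C, n=100) for C in (0, 1, 10, 100)]
```

In practice this curve is typically fitted to observed r^2 values against physical distance by nonlinear regression; that fitting step is not shown here.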

GenomicHeritability.R

Calculates variance components and genomic heritability.
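As a minimal illustration, genomic heritability is commonly computed as the share of total variance explained by the genomic (additive) variance component. The Python sketch below assumes that definition and a model with only genomic and residual variance; the repository does not state the formula it uses, and the function name is hypothetical.

```python
def genomic_heritability(var_g, var_e):
    """Return var_g / (var_g + var_e).

    Assumption: the model has one genomic variance component (var_g)
    and one residual variance component (var_e).
    """
    if var_g < 0 or var_e < 0:
        raise ValueError("variance components must be non-negative")
    total = var_g + var_e
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_g / total


# Made-up variance components, for illustration only.
h2 = genomic_heritability(3.0, 1.0)
```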
