laura-budurlean / PCA-Ethnicity-Determination-from-WGS-Data

A pipeline utilizing PCA on 1000 Genomes and WGS data from your own samples to determine or validate the ancestry of an individual.

PCA-Ethnicity-Determination-from-WGS-Data

A pipeline utilizing 1000 Genomes data and WGS data from your own samples to determine or validate the ethnicity of an individual.

The goal of this pipeline is to determine the ancestry of individuals from sequencing data (SNPs), starting from variant call format (VCF) files called against hg38. The cohort data is merged with 1000 Genomes data and PCA is performed on the combined set. The PCA scores of your samples are then plotted alongside the 1000 Genomes samples, giving a visual representation of where each individual falls on the overall PCA plot of ancestry.
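The flow described above (VCF to genotypes, merge with 1000 Genomes, PCA) can be sketched roughly as follows. This is not the repository's 1-determine-ancestry-by-PCA script, only an illustration under assumptions: PLINK 1.9 is installed as `plink`, the 1000 Genomes hg38 genotypes are already available as a PLINK binary fileset, and variant IDs and alleles match between the two datasets. The function name, file names, and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the repository's 1-determine-ancestry-by-PCA script.
set -eu

# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

ancestry_pca() {
  local cohort_vcf=$1   # assumed: hg38 VCF containing your own samples
  local kg_prefix=$2    # assumed: PLINK prefix (.bed/.bim/.fam) of 1000 Genomes hg38 data
  local out=$3          # assumed: prefix for all output files

  # 1. Convert the cohort VCF to PLINK binary format.
  run plink --vcf "$cohort_vcf" --make-bed --out "${out}_cohort"

  # 2. Merge the cohort with the 1000 Genomes reference panel.
  run plink --bfile "${out}_cohort" \
    --bmerge "${kg_prefix}.bed" "${kg_prefix}.bim" "${kg_prefix}.fam" \
    --make-bed --out "${out}_merged"

  # 3. Compute the first 10 principal components on the merged data.
  #    PLINK writes the per-sample scores to ${out}_pca.eigenvec.
  run plink --bfile "${out}_merged" --pca 10 --out "${out}_pca"
}

# Dry run: only prints the three PLINK commands, nothing is executed.
DRY_RUN=1 ancestry_pca cohort.hg38.vcf.gz 1000G_hg38 run1
```

In practice the merge step often needs extra preparation (restricting both datasets to shared variants, resolving strand or multi-allelic conflicts), which this sketch leaves out.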

Some requirements for this pipeline:

Instructions:

  1. Perform the steps outlined in the bash script 1-determine-ancestry-by-PCA
  2. In R, perform the steps outlined in 2-plot.R

The output of this ancestry-calling pipeline is a plot of the 1000 Genomes super populations with your own samples overlaid, so that each sample appears on top of the super population it most closely resembles based on the SNV data.
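The plot is read visually, but "most closely resembles" can also be checked numerically. The sketch below is not part of this repository. It assigns each cohort sample to the super population whose centroid (mean PC1, PC2) is nearest in Euclidean distance. The input layout, the `COHORT` placeholder label, and the function name are assumptions, and the values in the usage example are invented toy numbers, not real PCA scores.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.
set -eu

# Input (whitespace-separated, assumed layout): sample_id  label  PC1  PC2
# Reference rows carry a 1000 Genomes super population code (AFR, AMR, EAS, EUR, SAS).
# Rows from your own samples carry the placeholder label "COHORT".
nearest_superpop() {
  awk '
    # Reference rows: accumulate per-population sums for the centroids.
    $2 != "COHORT" { n[$2]++; sx[$2] += $3; sy[$2] += $4; next }
    # Cohort rows: remember them in input order.
    { id[++m] = $1; x[m] = $3; y[m] = $4 }
    END {
      for (i = 1; i <= m; i++) {
        best = ""; bestd = -1
        for (p in n) {
          dx = x[i] - sx[p] / n[p]
          dy = y[i] - sy[p] / n[p]
          d = dx * dx + dy * dy          # squared Euclidean distance to centroid
          if (bestd < 0 || d < bestd) { bestd = d; best = p }
        }
        print id[i] "\t" best
      }
    }
  ' "$1"
}

# Usage with invented toy values (for illustration only).
tmp=$(mktemp)
printf '%s\n' \
  'ref1 EUR 0.010 0.030' \
  'ref2 EUR 0.012 0.028' \
  'ref3 AFR -0.040 0.001' \
  'ref4 AFR -0.042 -0.001' \
  'mySample1 COHORT 0.011 0.027' \
  'mySample2 COHORT -0.039 0.002' > "$tmp"
nearest_superpop "$tmp"
rm -f "$tmp"
```

The function prints one tab-separated line per cohort sample: its ID and the nearest super population label. Using only two components keeps the check aligned with a 2D plot. An admixed sample can sit between clusters, so the plot remains the primary output to inspect.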

[Figure: example PCA plot (example_PCA_for_github)]
