REEU Summer program

This repository is intended to accompany the summer REEU internship. The main objective of the internship is to learn how to run genome-wide association studies (GWAS), using as a model the phenotypic and genotypic data of poplar (Populus trichocarpa) tree populations challenged with the pathogen Sphaerulina musiva.

To begin these analyses, the student needs to become familiar with Unix/Linux commands, with navigating the CQLS computer cluster, and with version control and software development using Git.

Week 1

Start using the terminal
1.1. Install a terminal for Windows
1.2. Create, move and delete files using the terminal
1.3. Edit files using nano, vi and other text editors
1.4. Create a GitHub account
1.4.1. Create a repository
1.4.2. Use git add, git commit, git push and git pull
Accessing the CQLS
2.1 Working in the CQLS cluster
2.2 Data structure
2.3 Accessing data
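
The file-handling steps in 1.2 can be practiced safely in a throwaway directory. The sketch below is only an illustration: the directory and file names are made up, and the Git cycle from 1.4.2 is shown as comments because it needs a cloned repository with a GitHub remote.

```shell
# Work inside a temporary directory so nothing important is touched.
workdir="$(mktemp -d)"

# Create a directory and an empty file, then write one line into the file.
mkdir -p "$workdir/notes"
touch "$workdir/notes/todo.txt"
echo "learn nano and vi" > "$workdir/notes/todo.txt"

# Move (rename) the file, make a copy, then delete the copy.
mv "$workdir/notes/todo.txt" "$workdir/notes/week1.txt"
cp "$workdir/notes/week1.txt" "$workdir/notes/scratch.txt"
rm "$workdir/notes/scratch.txt"

# List what is left.
ls -l "$workdir/notes"

# Typical Git cycle, run from inside a cloned repository:
#   git add README.md
#   git commit -m "Describe week 1"
#   git push
#   git pull
```

After running it, only `week1.txt` should remain in the `notes` directory.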

28 Jun:

Access the CQLS
Create a directory to organize your GitHub repositories
Create a repository for the poplar/septoria GWAS on GitHub
3.1 Clone your repository to the CQLS cluster
3.2 Create the folder structure following the book "Guide to reproducible code"
3.3 Create the folders with mkdir and populate the directories with "mock" files and scripts (comment the files with your name)
Clone your GWAS repository to your local machine (your Ubuntu machine)
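
Steps 3.2 and 3.3 can be sketched as two short loops. The folder names below are an assumption loosely modeled on the kind of layout the guide recommends, and the repository path and author name are placeholders to replace with your own.

```shell
# Hypothetical values: point repo at your cloned repository and use your own name.
repo="$(mktemp -d)/poplar-septoria-gwas"
author="Your Name"

# Create the folder structure (folder names are an assumption, adjust as needed).
for dir in data/raw data/processed doc figs output scripts; do
    mkdir -p "$repo/$dir"
done

# Populate the data and document folders with a mock file commented with your name.
for dir in data/raw data/processed doc figs output; do
    printf '# Author: %s\n# Mock file so this folder is tracked by Git\n' \
        "$author" > "$repo/$dir/README.txt"
done

# Add a mock script, also commented with your name.
printf '#!/usr/bin/env bash\n# Author: %s\n# Mock script\necho "nothing to run yet"\n' \
    "$author" > "$repo/scripts/00_mock.sh"

# Show the files that were created.
find "$repo" -type f | sort
```

Git does not track empty directories, which is one reason to place a mock file in every folder before committing.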

29 Jun

Install cutadapt
1.1 Read the manual
Find the adapter
Cut your primers and trim by quality, using a Phred score threshold of 30
Put together a bash script to process ALL your samples with one script

Hint: You can find the adapter in one of your outputs from FastQC
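
One way to process all samples with a single script is to generate one cutadapt command per file, inspect the list, and then run it. This is a minimal sketch under several assumptions: single-end reads named `*.fastq.gz`, a 3' adapter (use `-g` instead of `-a` for a 5' primer), and a placeholder adapter string that must be replaced with the sequence reported by FastQC. Note that `-q 30` trims low-quality read ends rather than discarding whole reads.

```shell
# Print one cutadapt command per FASTQ file found in a directory.
# Usage: cutadapt_commands RAW_DIR OUT_DIR ADAPTER
cutadapt_commands() {
    raw_dir="$1"; out_dir="$2"; adapter="$3"
    for fq in "$raw_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue    # skip when the pattern matched no file
        sample="$(basename "$fq" .fastq.gz)"
        # -a: 3' adapter to remove; -q 30: trim read ends below Phred 30
        echo "cutadapt -a $adapter -q 30 -o $out_dir/${sample}_trimmed.fastq.gz $fq"
    done
}

# Demo with two empty mock files; file names and adapter are placeholders.
raw="$(mktemp -d)"
touch "$raw/isolateA.fastq.gz" "$raw/isolateB.fastq.gz"
trim_cmds="$(cutadapt_commands "$raw" trimmed ADAPTER_FROM_FASTQC)"
echo "$trim_cmds"
```

Once cutadapt is installed and the output directory exists, the printed commands can be executed by piping them to bash, for example `cutadapt_commands raw trimmed "$ADAPTER" | bash`.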

30 Jun

Work with bash commands
- sed
- cut
- awk
Practice loops in bash
Retrieve from NCBI or any other database the genomes of:
- Populus trichocarpa
- Septoria musiva
  (Find the reference genomes information on the manuscripts)
Understand the formats and concepts behind a genome assembly
Summarize the genome data for P. trichocarpa and S. musiva
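
The three commands and a loop can be tried together on a small table. Everything in the table below is mock data invented for the exercise; only the commands matter.

```shell
# Mock tab-separated table: isolate, poplar genotype, score (made-up values).
table="$(mktemp)"
printf 'isolate\tgenotype\tscore\n' > "$table"
printf 'iso1\tgenoA\t3\niso1\tgenoB\t5\niso2\tgenoA\t2\niso2\tgenoB\t4\n' >> "$table"

# cut: keep column 1, drop the header line, list each isolate once.
isolates="$(cut -f 1 "$table" | tail -n +2 | sort -u)"

# sed: rewrite the isolate prefix on data lines only (line 2 to the end).
sed '2,$ s/^iso/isolate_/' "$table"

# awk: mean score per isolate, skipping the header line.
means="$(awk -F '\t' 'NR > 1 { sum[$1] += $3; n[$1]++ }
    END { for (id in sum) printf "%s\t%.1f\n", id, sum[id] / n[id] }' "$table" | sort)"
echo "$means"

# Loop: count the rows that belong to each isolate.
for iso in $isolates; do
    rows="$(awk -F '\t' -v id="$iso" '$1 == id { n++ } END { print n + 0 }' "$table")"
    echo "$iso has $rows rows"
done
```

With this mock table the awk step should report a mean of 4.0 for iso1 and 3.0 for iso2.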

1 Jul

Summary of the week

Week 2

5 Jul

Cluster environment and submissions
Running cutadapt in the CQLS cluster
Download genomes

6 Jul

Analyze results
Check BWA and GATK
2.1 BWA commands example
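
As a starting point for the BWA example, the sketch below prints the two basic steps: indexing the reference once, then mapping each sample with `bwa mem`. It assumes single-end reads and the hypothetical file naming `*_trimmed.fastq.gz`; the reference name `REF.fna` is a placeholder.

```shell
# Print the BWA commands for one reference and a directory of trimmed reads.
# Usage: bwa_commands REFERENCE READS_DIR OUT_DIR
bwa_commands() {
    ref="$1"; reads_dir="$2"; out_dir="$3"
    # The index is built once per reference genome.
    echo "bwa index $ref"
    for fq in "$reads_dir"/*_trimmed.fastq.gz; do
        [ -e "$fq" ] || continue    # skip when the pattern matched no file
        sample="$(basename "$fq" _trimmed.fastq.gz)"
        # bwa mem writes SAM to standard output, so it is redirected to a file.
        echo "bwa mem $ref $fq > $out_dir/$sample.sam"
    done
}

# Demo with two empty mock files.
reads="$(mktemp -d)"
touch "$reads/isolateA_trimmed.fastq.gz" "$reads/isolateB_trimmed.fastq.gz"
bwa_cmds="$(bwa_commands REF.fna "$reads" mapped)"
echo "$bwa_cmds"
```

For paired-end data, `bwa mem` takes both read files after the reference, so the loop would need to pair the files first.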

7 Jul

Finish mapping the reads against the reference genome
Learn the GATK example commands
Index your Reference Sequence
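
The sketch below prints one possible sequence for preparing the reference and calling variants on one sample. It assumes the GATK 3 style of invocation (`java -jar gatk.jar -T ...`), and it assumes that samtools is available for the FASTA index and that the input BAM is already coordinate-sorted, indexed and has read groups. File names are placeholders.

```shell
# Print reference-preparation and variant-calling commands for one BAM file.
# Usage: gatk_prep_commands REFERENCE BAM
gatk_prep_commands() {
    ref="$1"; bam="$2"
    dict="${ref%.*}.dict"             # REF.fna -> REF.dict
    sample="$(basename "$bam" .bam)"
    # FASTA index (.fai) and sequence dictionary (.dict) next to the reference
    echo "samtools faidx $ref"
    echo "java -jar picard.jar CreateSequenceDictionary R=$ref O=$dict"
    # Per-sample calling in GVCF mode, to be joined later
    echo "java -jar gatk.jar -T HaplotypeCaller -R $ref -I $bam --emitRefConfidence GVCF -o $sample.g.vcf"
}

prep_cmds="$(gatk_prep_commands REF.fna isolateA.bam)"
echo "$prep_cmds"
```

Check the options against the GATK version installed on the cluster before running them, since newer GATK releases use a different command syntax.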

Week 3

11 Jul

Wrap up last week
1.2. Edit the README.md in your poplar-septoria-GWA repository to document all the steps we have done so far, from FastQC and cutadapt to the bwa and gatk commands.
Install picard.jar
After bwa mem:
2.1 Search how to assign groups with picard.jar AddOrReplaceReadGroups
2.2 Search how to sort with picard.jar SortSam
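
As a hint for both searches, the sketch below prints the two Picard commands for one sample, in the order listed above. The read-group values (`lib1`, `ILLUMINA`, `unit1`) are placeholders and should reflect your own sequencing run; using the sample name for both RGID and RGSM is an assumption that fits one library per isolate.

```shell
# Print the Picard commands that follow bwa mem for one SAM file.
# Usage: picard_commands SAM_FILE
picard_commands() {
    sam="$1"
    sample="$(basename "$sam" .sam)"
    # Assign read groups; RGSM becomes the sample name in the VCF.
    echo "java -jar picard.jar AddOrReplaceReadGroups I=$sam O=$sample.rg.bam RGID=$sample RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=$sample"
    # Sort by coordinate and write a BAM index alongside the output.
    echo "java -jar picard.jar SortSam I=$sample.rg.bam O=$sample.sorted.bam SORT_ORDER=coordinate CREATE_INDEX=true"
}

picard_cmds="$(picard_commands isolateA.sam)"
echo "$picard_cmds"
```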

12 Jul

Debug gatk VCF calling.
Execute the bash scripts for all septoria genome files

13 Jul
OFF

14 Jul

Summarize Reference Genomes for Septoria and Poplar

Hints:
- Genome size:
- No. of contigs:
- N50:
- No. of genes:
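
Genome size, number of contigs and N50 can all be computed from the FASTA file with awk. The sketch below uses a tiny mock assembly with invented sequences; N50 is taken here as the length of the contig at which the running total, from longest to shortest, first reaches half of the genome size. The number of genes needs the annotation file rather than the FASTA, so it is not covered.

```shell
# Mock assembly with contigs of length 8, 6, 4 and 2 (invented sequences).
fasta="$(mktemp)"
printf '>contig1\nACGTACGT\n>contig2\nACG\nTAC\n>contig3\nACGT\n>contig4\nAC\n' > "$fasta"

# Print the length of every sequence, joining sequences split across lines.
fasta_lengths() {
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1"
}

# Summarize genome size, number of contigs and N50.
summarize_fasta() {
    fasta_lengths "$1" | sort -nr | awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                cum += lens[i]
                if (cum >= half) { n50 = lens[i]; break }
            }
            printf "genome_size=%d contigs=%d N50=%d\n", total, NR, n50
        }'
}

summarize_fasta "$fasta"
```

For the mock file the total is 20 bases in 4 contigs, and the running total reaches half of that at the second-longest contig, so the N50 is 6.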

15 Jul

Install GEMMA
Copy the VCF file and the phenotype data

$ /nfs1/BPP/LeBoldus_Lab/user_folders/Shared_projects/data_REEU/PopGWAS2016.vcf.tar.gz

Week 4

Jul 18

Test the GEMMA software
1.1 Use the example data and understand the file structure
Generate the VCF

java -jar gatk.jar -T GenotypeGVCFs -R REF.fna -o file_jointcalls.vcf -V 1.vcf -V 2.vcf -V n.vcf
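
With many samples, typing every `-V` argument by hand is error-prone. A small loop can build the same command from whatever VCF files are in a directory. This is a sketch that only prints the command; the directory and file names are mock placeholders.

```shell
# Print a GenotypeGVCFs command with one -V argument per VCF in a directory.
# Usage: genotype_gvcfs_command REFERENCE VCF_DIR OUTPUT_VCF
genotype_gvcfs_command() {
    ref="$1"; vcf_dir="$2"; out="$3"
    variants=""
    for vcf in "$vcf_dir"/*.vcf; do
        [ -e "$vcf" ] || continue    # skip when the pattern matched no file
        variants="$variants -V $vcf"
    done
    echo "java -jar gatk.jar -T GenotypeGVCFs -R $ref -o $out$variants"
}

# Demo with three empty mock VCF files.
calls="$(mktemp -d)"
touch "$calls/1.vcf" "$calls/2.vcf" "$calls/3.vcf"
joint_cmd="$(genotype_gvcfs_command REF.fna "$calls" file_jointcalls.vcf)"
echo "$joint_cmd"
```

Keep the joint output outside the input directory, otherwise a second run would pick it up as one more `-V` input.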

Jul 19

Submit an array in the CQLS cluster to generate VCFs for the second part of septoria reads
Parse the phenotype data by septoria isolate and poplar genotype
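
In an array job, every task usually runs the same script and uses its task number to pick one sample from a list. The sketch below shows only that selection step. Which scheduler variable applies is an assumption: it reads `SGE_TASK_ID` or `SLURM_ARRAY_TASK_ID` if one is set and otherwise falls back to task 1 so it can be tested outside the cluster. The sample names are mock values.

```shell
# List of samples, one per line (mock names).
sample_list="$(mktemp)"
printf 'isolate_01\nisolate_02\nisolate_03\n' > "$sample_list"

# Task number from the scheduler, if any.
task_id="${SGE_TASK_ID:-${SLURM_ARRAY_TASK_ID:-1}}"
# Fall back to 1 when the value is empty or not a plain number.
case "$task_id" in
    ''|*[!0-9]*) task_id=1 ;;
esac

# Print line N of the sample list.
# Usage: sample_for_task LIST_FILE N
sample_for_task() {
    sed -n "${2}p" "$1"
}

sample="$(sample_for_task "$sample_list" "$task_id")"
echo "task $task_id will process $sample"
```

The per-sample commands for that task then go after the selection, using `$sample` to build the input and output file names.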

Jul 20

Finalize the parsing of the phenotype data
1.1 Generate a .txt with the input in GEMMA structure.
Parse the phenotypes based on the All_septoria.vcf
2.1 Get the sample names of the septoria .vcf
2.2 Install vcfR

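library(vcfR)
# Read the poplar VCF (the file name must be a quoted string)
poplar <- read.vcfR("poplar.vcf")
# Sample IDs are the column names of the genotype matrix; the first column is FORMAT
poplar.IDs <- colnames(poplar@gt)[-1]
write.csv(poplar.IDs, "poplar_names_vcf.csv")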
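
The sample names can also be read directly in the terminal, because they are columns 10 onward of the `#CHROM` header line of a VCF. The sketch below extracts them and then writes the phenotypes in the same order, with `NA` for samples that have no phenotype. The VCF, the sample names and the phenotype values are all mock data, and a two-column phenotype table (sample, value) is an assumed layout.

```shell
# Mock VCF with three samples (invented content).
vcf="$(mktemp)"
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tiso2\tiso1\tiso3\n' >> "$vcf"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0\t1\t0\n' >> "$vcf"

# Mock phenotype table: sample name, phenotype value (iso3 is missing on purpose).
pheno="$(mktemp)"
printf 'iso1\t3.5\niso2\t2.0\n' > "$pheno"

# Print the sample names of a VCF, one per line, in file order.
vcf_samples() {
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Print one phenotype per sample, following the order of the names file.
# Usage: order_phenotypes NAMES_FILE PHENOTYPE_TABLE
order_phenotypes() {
    awk -F '\t' 'NR == FNR { pheno[$1] = $2; next }
                 { print (($1 in pheno) ? pheno[$1] : "NA") }' "$2" "$1"
}

names="$(mktemp)"
vcf_samples "$vcf" > "$names"
order_phenotypes "$names" "$pheno"
```

Matching the phenotype order to the VCF sample order matters because the genotype and phenotype inputs are paired by position, not by name.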

Jul 21

Generate VCF with all individual .vcf files
Finalize the inputs for GEMMA

Jul 22

Produce the .bim, .bed and .fam
Try to run GEMMA software
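
The .bed, .bim and .fam files are the PLINK binary format, so one common route is to convert the VCF with PLINK and then run GEMMA on the result. The sketch below only prints the commands. Using PLINK 1.9 is an assumption, as are the file names; `--allow-extra-chr` is included because contig names in these genomes are not plain chromosome numbers.

```shell
# Print the conversion and GEMMA commands for one VCF.
# Usage: gemma_commands VCF PREFIX
gemma_commands() {
    vcf="$1"; prefix="$2"
    # VCF -> PREFIX.bed / PREFIX.bim / PREFIX.fam
    echo "plink --vcf $vcf --allow-extra-chr --make-bed --out $prefix"
    # Centered relatedness matrix, written to output/PREFIX_kinship.cXX.txt
    echo "gemma -bfile $prefix -gk 1 -o ${prefix}_kinship"
    # Univariate linear mixed model using that matrix
    echo "gemma -bfile $prefix -k output/${prefix}_kinship.cXX.txt -lmm 4 -o ${prefix}_lmm"
}

gemma_cmds="$(gemma_commands All_septoria.vcf septoria)"
echo "$gemma_cmds"
```

GEMMA reads the phenotype from the sixth column of the .fam file, which PLINK fills with a missing-value code when the VCF carries no phenotypes, so that column has to be replaced with the parsed phenotype values before the last two commands are useful.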

Week 5

**This week**

Debugging and polishing

Readings:

Tutorials:
Linux: https://www.guru99.com/unix-linux-tutorial.html
How to install Linux: https://www.guru99.com/install-linux.html
Bash cheat sheet: https://devhints.io/bash
Reproducible code: https://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf

| Manuscripts | Link |
| --- | --- |
| Poplar GWAS | https://www.pnas.org/doi/10.1073/pnas.1804428115 |
| Poplar VCF info | https://doi.ccs.ornl.gov/ui/doi/55 |
| GWAS precedent | https://www.nature.com/articles/ng.3075#Sec10 |
| GWAS methodology | https://www.nature.com/articles/ng.548 |
| GEMMA Software | https://github.com/genetics-statistics/GEMMA |
| GWAS Robustness | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3025716/ |
| GWAS Doc man | https://vcru.wisc.edu/simonlab/bioinformatics/programs/fcgene/fcgene-1.0.7.pdf |
| BIMBAM | http://www.haplotype.org/software.html |
| Septoria PopGen | https://apsjournals.apsnet.org/doi/10.1094/MPMI-05-19-0131-R |
