Hands-on training with R
Introduction
Welcome to this hands-on workshop dedicated to R. I am Yedomon.
I am going to share some scripts for data analysis with R.
For reproducibility, the datasets are here. A recording of this tutorial is available on YouTube.
Note: Please follow the hyperlinks (in blue) for installation and settings.
Pre-requisites:
A tutorial for package installation in RStudio is here.
We will use four packages in this hands-on session. Here is how to install them from the command line:
- package agricolae:
install.packages("agricolae", dependencies = TRUE)
- package ggplot2:
install.packages("ggplot2", dependencies = TRUE)
- package patchwork:
install.packages("patchwork", dependencies = TRUE)
- package ggsignif:
install.packages("ggsignif", dependencies = TRUE)
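If some of these packages are already installed, it can be convenient to check first and install only what is missing. The snippet below is a minimal sketch of that idea; the helper name missing_packages is my own and is not part of any package.

```r
# Packages used in this session
pkgs <- c("agricolae", "ggplot2", "patchwork", "ggsignif")

# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment the next line to install whatever is missing:
# install.packages(to_install, dependencies = TRUE)
```

The install call is left commented out so that the snippet can be run safely without an internet connection.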
Context
As part of a project on wilt disease of sesame caused by Fusarium oxysporum f.sp. sesami, fungal samples were collected at four locations in Korea (see Figure 1).
Figure 1: Geographic positions of the sample collection sites.
Once in the laboratory, routine observation of the fungus was carried out (see Figure 2).
Figure 2: Symptoms of sesame wilt disease (a, b and c), growth of the fungus in Petri dishes (d, e), microscopic view of micro- and macroconidia (f, g, h).
A genetic characterization using ITS markers was done via PCR (see Figure 3) in order to check whether the isolates belong to the genus Fusarium.
Figure 3: PCR result based on a genus-specific marker, showing visible bands for all isolates except the outgroups.
Then, with the ITS markers available in NCBI, we inferred the phylogenetic position of the collected isolates (see Figure 4).
Figure 4: Phylogenetic tree showing a clear monophyletic branch for Fusarium oxysporum isolates. Maximum bootstrap values are shown on each node.
Afterwards, a pathogenicity test of the isolates was scheduled.
The purpose here is to screen Korean cultivars and observe how they react to the fungal isolates.
Some data were recorded. Here we are going to perform an analysis of variance on each recorded variable for the tested cultivars.
Before we start, we should set our working directory.
Check your working directory
getwd()
Set your own working directory
setwd("C:/Users/ange_/Downloads/r_tuto")
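Once the working directory is set, it may help to confirm that the data files are really there before importing them. Here is a small sketch; check_data_file is a hypothetical helper of mine, and the demo uses a temporary folder rather than the tutorial folder.

```r
# Hypothetical helper: stop early with a readable message if a file is missing
check_data_file <- function(file, dir = getwd()) {
  path <- file.path(dir, file)
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  invisible(path)
}

# Demo in a temporary folder (stand-in for the real working directory)
demo_dir <- tempdir()
writeLines("Type,Height", file.path(demo_dir, "demo.csv"))
check_data_file("demo.csv", dir = demo_dir)

# List the CSV files that R can see in that folder
list.files(demo_dir, pattern = "\\.csv$")
```

In the tutorial folder the call would simply be check_data_file("YS3386.csv").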
Analysis of variance and means comparison
Package loading
library(agricolae)
library(ggplot2)
library(ggsignif)
Data importation
data_y = read.csv("YS3386.csv", sep = ",", header = TRUE)
Get an overview of the variable names
names(data_y)
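Beyond the variable names, it is often worth looking at the type of each column and at the number of observations per group before running an ANOVA. The sketch below uses a small made-up data frame as a stand-in for data_y; the values and the level names "Control" and "Infected" are assumptions for illustration only.

```r
# Synthetic stand-in for data_y (values are invented for illustration)
demo <- data.frame(
  Type   = rep(c("Control", "Infected"), each = 3),
  Height = c(9.1, 8.7, 9.4, 6.2, 5.9, 6.5)
)

# Make sure the grouping variable is a factor
demo$Type <- factor(demo$Type)

str(demo)           # column types
levels(demo$Type)   # group labels
table(demo$Type)    # number of observations per group
```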
ANOVA and LSD test of means comparison for plant height
model <- aov(Height ~ Type, data = data_y)
anova(model)
LSD.test(model, "Type", group = FALSE, p.adj = "bonferroni", console = TRUE)
ANOVA and LSD test of means comparison for stem diameter
model <- aov(Diameter ~ Type, data = data_y)
anova(model)
LSD.test(model, "Type", group = FALSE, p.adj = "bonferroni", console = TRUE)
ANOVA and LSD test of means comparison for leaf length
model <- aov(Leaf_length ~ Type, data = data_y)
anova(model)
LSD.test(model, "Type", group = FALSE, p.adj = "bonferroni", console = TRUE)
ANOVA and LSD test of means comparison for leaf width
model <- aov(Leaf_wide ~ Type, data = data_y)
anova(model)
LSD.test(model, "Type", group = FALSE, p.adj = "bonferroni", console = TRUE)
ANOVA and LSD test of means comparison for chlorophyll content
model <- aov(Chlorophyll_content ~ Type, data = data_y)
anova(model)
LSD.test(model, "Type", group = FALSE, p.adj = "bonferroni", console = TRUE)
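The same three lines are repeated for every trait, so the ANOVA step could also be written as a loop. Below is a minimal sketch with base R only, run on simulated data (the numbers are invented and only two traits are included); with the real data, traits would hold the column names of data_y.

```r
set.seed(1)

# Simulated stand-in for data_y (invented values)
demo <- data.frame(
  Type     = factor(rep(c("Control", "Infected"), each = 5)),
  Height   = c(rnorm(5, 9, 0.5), rnorm(5, 6, 0.5)),
  Diameter = c(rnorm(5, 3, 0.2), rnorm(5, 2, 0.2))
)

traits <- c("Height", "Diameter")

# Fit one model per trait; reformulate() builds e.g. Height ~ Type
anova_tables <- lapply(traits, function(trait) {
  model <- aov(reformulate("Type", response = trait), data = demo)
  anova(model)
})
names(anova_tables) <- traits

# Extract the p-value of the Type effect for each trait
p_values <- vapply(anova_tables, function(tab) tab[["Pr(>F)"]][1], numeric(1))
print(p_values)
```

The LSD.test call could be added inside the same function if the comparison tables are also needed.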
Using the ANOVA and comparison results, we built a data table for each trait (columns Type, Mean and sd) in order to plot bar graphs.
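For readers who prefer to build such a table in R, here is one possible way with base R. It is only a sketch on made-up numbers; the column names Type, Mean and sd follow the plotting code of this tutorial, while the object names demo and summary_h are mine.

```r
# Made-up raw data: three plants per treatment
demo <- data.frame(
  Type   = rep(c("Control", "Infected"), each = 3),
  Height = c(9, 10, 11, 5, 6, 7)
)

# Mean and standard deviation of Height for each Type
means <- aggregate(Height ~ Type, data = demo, FUN = mean)
sds   <- aggregate(Height ~ Type, data = demo, FUN = sd)

summary_h <- data.frame(Type = means$Type, Mean = means$Height, sd = sds$Height)
print(summary_h)

# Uncomment to save the table for plotting:
# write.csv(summary_h, "height.csv", row.names = FALSE)
```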
Drawing plot
Plant height
data_h = read.csv("height.csv", sep = ",", header = TRUE)
h = ggplot(data_h, aes(x = Type, y = Mean, fill = Type)) +
  geom_bar(stat = "identity", width = 0.8, show.legend = FALSE) +
  labs(x = "Treatment", y = "Plant height (cm)") +
  geom_errorbar(aes(ymin = Mean - sd, ymax = Mean + sd), width = 0.2) +
  geom_signif(comparisons = list(c("Control", "Infected")), annotations = "***", y_position = 10, tip_length = 0.03) +
  theme_bw() +
  theme(axis.text = element_text(colour = "black", face = "bold"),
        axis.title = element_text(colour = "black", face = "bold")) +
  ggtitle("A") +
  scale_fill_manual(values = c("#0E6251", "#FF5733"))
h
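In the plot above, the stars and the bracket height are typed by hand. As a sketch, both could be derived from the results instead. The two helpers below are hypothetical, and the thresholds (0.001, 0.01, 0.05) are the usual convention, which I assume here rather than take from the analysis.

```r
# Hypothetical helper: convert a p-value into the usual star notation
p_to_stars <- function(p) {
  if (p < 0.001) "***" else if (p < 0.01) "**" else if (p < 0.05) "*" else "ns"
}

# Hypothetical helper: place the bracket slightly above the tallest error bar
bracket_height <- function(mean, sd, margin = 0.05) {
  max(mean + sd) * (1 + margin)
}

p_to_stars(0.0004)
bracket_height(mean = c(9, 6), sd = c(0.5, 0.4))
```

The returned values could then be passed to the annotations and y_position arguments of geom_signif.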
Diameter
data_d = read.csv("diameter.csv", sep = ",", header = TRUE)
d = ggplot(data_d, aes(x = Type, y = Mean, fill = Type)) +
  geom_bar(stat = "identity", width = 0.8, show.legend = FALSE) +
  labs(x = "Treatment", y = "Stem diameter (mm)") +
  geom_errorbar(aes(ymin = Mean - sd, ymax = Mean + sd), width = 0.2) +
  geom_signif(comparisons = list(c("Control", "Infected")), annotations = "***", y_position = 3.5, tip_length = 0.03) +
  theme_bw() +
  theme(axis.text = element_text(colour = "black", face = "bold"),
        axis.title = element_text(colour = "black", face = "bold")) +
  ggtitle("B") +
  scale_fill_manual(values = c("#0E6251", "#FF5733"))
d
Leaf length
data_ll = read.csv("leaf_length.csv", sep = ",", header = TRUE)
ll = ggplot(data_ll, aes(x = Type, y = Mean, fill = Type)) +
  geom_bar(stat = "identity", width = 0.8, show.legend = FALSE) +
  labs(x = "Treatment", y = "Leaf length (cm)") +
  geom_errorbar(aes(ymin = Mean - sd, ymax = Mean + sd), width = 0.2) +
  geom_signif(comparisons = list(c("Control", "Infected")), annotations = "***", y_position = 5.7, tip_length = 0.03) +
  theme_bw() +
  theme(axis.text = element_text(colour = "black", face = "bold"),
        axis.title = element_text(colour = "black", face = "bold")) +
  ggtitle("C") +
  scale_fill_manual(values = c("#0E6251", "#FF5733"))
ll
Leaf width
data_lw = read.csv("leaf_width.csv", sep = ",", header = TRUE)
lw = ggplot(data_lw, aes(x = Type, y = Mean, fill = Type)) +
  geom_bar(stat = "identity", width = 0.8, show.legend = FALSE) +
  labs(x = "Treatment", y = "Leaf width (cm)") +
  geom_errorbar(aes(ymin = Mean - sd, ymax = Mean + sd), width = 0.2) +
  geom_signif(comparisons = list(c("Control", "Infected")), annotations = "***", y_position = 4, tip_length = 0.03) +
  theme_bw() +
  theme(axis.text = element_text(colour = "black", face = "bold"),
        axis.title = element_text(colour = "black", face = "bold")) +
  ggtitle("D") +
  scale_fill_manual(values = c("#0E6251", "#FF5733"))
lw
Chlorophyll content
data_cc = read.csv("chloro_content.csv", sep = ",", header = TRUE)
cc = ggplot(data_cc, aes(x = Type, y = Mean, fill = Type)) +
  geom_bar(stat = "identity", width = 0.8, show.legend = FALSE) +
  labs(x = "Treatment", y = "Chlorophyll content") +
  geom_errorbar(aes(ymin = Mean - sd, ymax = Mean + sd), width = 0.2) +
  geom_signif(comparisons = list(c("Control", "Infected")), annotations = "***", y_position = 35, tip_length = 0.03) +
  theme_bw() +
  theme(axis.text = element_text(colour = "black", face = "bold"),
        axis.title = element_text(colour = "black", face = "bold")) +
  ggtitle("E") +
  scale_fill_manual(values = c("#0E6251", "#FF5733"))
cc
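The five plots differ only in their data, y-axis label, panel letter and bracket height, so the shared code could be wrapped in a function. The sketch below shows one way to do it; plot_trait is a hypothetical helper of mine, and the demo table contains invented values with the same columns (Type, Mean, sd) as the CSV files used above.

```r
library(ggplot2)
library(ggsignif)

# Hypothetical helper wrapping the plotting code used for each trait
plot_trait <- function(data, y_label, panel, y_position, stars = "***") {
  ggplot(data, aes(x = Type, y = Mean, fill = Type)) +
    geom_bar(stat = "identity", width = 0.8, show.legend = FALSE) +
    labs(x = "Treatment", y = y_label) +
    geom_errorbar(aes(ymin = Mean - sd, ymax = Mean + sd), width = 0.2) +
    geom_signif(comparisons = list(c("Control", "Infected")),
                annotations = stars, y_position = y_position, tip_length = 0.03) +
    theme_bw() +
    theme(axis.text = element_text(colour = "black", face = "bold"),
          axis.title = element_text(colour = "black", face = "bold")) +
    ggtitle(panel) +
    scale_fill_manual(values = c("#0E6251", "#FF5733"))
}

# Invented summary table, same layout as height.csv
demo <- data.frame(Type = c("Control", "Infected"), Mean = c(9, 6), sd = c(0.5, 0.4))
p <- plot_trait(demo, y_label = "Plant height (cm)", panel = "A", y_position = 10)
```

With the real tables, a call such as plot_trait(data_d, "Stem diameter (mm)", "B", 3.5) would give the same kind of panel as the code written out in full.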
Combine all graphs into one figure
library(patchwork)
# Note: cc appears twice here and fills the third cell of the second row
figure = (h | d | ll) /
  (lw | cc | cc)
Save a high quality figure for publication
ggsave(filename = "bar_plot.png", plot = figure, width = 10, height = 9.5, dpi = 500, type = "cairo-png", limitsize = FALSE)
The final output should be:
Have a question? Drop it here or send me an e-mail at yedomon@jbnu.ac.kr