This repository contains the data and scripts used to reproduce the analyses in the manuscript "Metabolomic Selection for Enhanced Fruit Flavor", available on bioRxiv:
"Consumers often regard heirloom fruit varieties grown in the garden as more flavorful than commercial varieties purchased at the grocery store. While plant breeders have historically focused on improving producer-orientated traits such as yield, consumer-oriented traits such as flavor have regularly been neglected. This is, in part, due to the difficulty associated with measuring the sensory perceptions of flavor. Here, we combine fruit chemical and consumer sensory panel information to train machine learning models that can predict how flavorful a fruit will be from its chemistry. By increasing the throughput of flavor evaluations, these models will help plant breeders to integrate flavor earlier in the breeding pipeline and aid in the design of varieties with exceptional flavor profiles."
Here we go through the figures and the scripts used to generate the underlying analyses. We often run the analysis in one script and design the figure component in another, then combine the figure components in Inkscape.
To generate this figure, we start by preprocessing the data from the supplemental files with default choices for imputation and scaling:
- [0.preprocessing.R]
Next we create the metabolite network using the WGCNA package:
- [1.a.wgcna_tomato.R]
Then we plot the tomato volatile concentration violin plots in panel b:
- [1.b.metabolite_histograms.R]
Additionally, the Cytoscape session file below was used to visualize the results from 1.a.wgcna_tomato.R and 2.a.wgcna_blueberry.R and to compute betweenness centrality statistics:
- [./results/fig1/asPublished_metabolite_networks.cys]
- [2.a.wgcna_blueberry.R]
- [2.b.metabolite_histograms.R]
The blueberry Cytoscape visualizations are included in the Cytoscape network file above.
We calculate the contributions of volatile classes to variance in flavor ratings using linear mixed modeling:
- [3.a.variance_decomposition.R]
- [4.a.1.metabolomic_selection_tomato.R]
These models were run on our HiPerGator cluster. In general, the first bash script launches the jobs for cross-validation and replication, the second bash script sets up an environment for running the jobs in R, and the R script does the computation.
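The launcher stage of this three-script pattern can be sketched roughly as follows. This is a minimal illustration, not the contents of the repository's scripts: it assumes a SLURM scheduler (`sbatch`), and the replicate and fold counts, the job names, and the wrapper script name `wrapper.sh` are all hypothetical. By default it only prints the commands it would submit.

```shell
#!/bin/sh
# Sketch of the launcher stage only. All names and counts below are
# illustrative assumptions, not values taken from the repository scripts.
set -eu

N_REPS=${N_REPS:-2}            # hypothetical number of replications
N_FOLDS=${N_FOLDS:-3}          # hypothetical number of cross-validation folds
WRAPPER=${WRAPPER:-wrapper.sh} # hypothetical second-stage script that sets up R
DRY_RUN=${DRY_RUN:-1}          # 1 = print the commands instead of submitting them

launch_jobs() {
  for rep in $(seq 1 "$N_REPS"); do
    for fold in $(seq 1 "$N_FOLDS"); do
      # One scheduler job per (replicate, fold) pair. The wrapper would
      # prepare the R environment and pass both values to the R script.
      if [ "$DRY_RUN" = "1" ]; then
        echo "sbatch --job-name=rep${rep}_fold${fold} $WRAPPER $rep $fold"
      else
        sbatch --job-name="rep${rep}_fold${fold}" "$WRAPPER" "$rep" "$fold"
      fi
    done
  done
}

launch_jobs
```

Keeping the loop in its own script means the cross-validation layout can change without touching the environment setup or the R code. Setting `DRY_RUN=0` would submit the jobs for real, which only makes sense on a machine where `sbatch` and the wrapper script exist.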
- [4.b.1.genomic_selection_tomato.sh]
- [4.b.2.genomic_selection_tomato.sh]
- [4.b.3.genomic_selection_tomato.R]
- [4.b.4.metabolomic_selection_tomato.sh]
- [4.b.5.metabolomic_selection_tomato.sh]
- [4.b.6.metabolomic_selection_tomato.R]
- [4.b.7.gblup_plots.R]
- [4.c.1.subsampling.sh]
- [4.c.2.subsampling.sh]
- [4.c.3.subsampling.R]
- [4.c.4.subsamplingPlots.R]
- [5.a.1.calculate_final_weights.Rmd]
- [5.a.2.plot_tomato_weights.R]
- [5.a.3.plot_blueberry_weights.R]