"Localized coevolution between microbial predator and prey alters community-wide gene expression and ecosystem function"
Data and code here are provided under the MIT License. Feel free to use or remix as you see fit.
This project contains data from the HAMBI species gene expression project. 30 HAMBI bacterial species were grown with either a low trait-diversity, ancestral Pseudomonas fluorescens SBW25 or a coevolved population of SBW25 that had been co-cultured with a Tetrahymena ciliate. These two treatments were conducted in the presence and absence of the coevolved ciliate. The experiment duration was 55 days. No fresh media was added, so the bacteria and ciliate were growing in a closed system without new nutrient inputs. RNAseq samples were collected on days 4 and 45. 16S amplicon samples were collected on days 4, 41, and 45. Ciliate counts, bacterial CFUs, and community ATP concentrations were measured every ~4 days.
/R
contains R scripts
/data
contains data that has been processed in some way for later use
/data_raw
contains unprocessed data scraped from compute cluster
/figs
contains figures generated from R scripts
/sh
contains shell scripts. Mostly from running analysis on the puhti compute cluster
/tables
contains summary tables generated from R scripts
Sequencing data is available from the NCBI Sequence Read Archive under Bioproject PRJNA818876 which can be viewed on the NCBI SRA run selector.
Install the NCBI SRA Toolkit. You can download a prebuilt binary from here.
Access SRA data following the instructions here.
You will need to set up your configuration first, but afterwards you can basically do:
prefetch SRR18441242
fasterq-dump SRR18441242
You will need to do this for all the SRA accessions associated with BioProject: PRJNA818876. For example:
cut -f1 tables/SraRunTable.tsv | tail -n +2 | while read -r ID; do
  prefetch "$ID"
  fasterq-dump "$ID"
done
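If the download is interrupted partway through, the loop above starts again from the first accession. A minimal sketch of a resumable variant follows. It assumes the run accessions are in the first column of the run table (as in the loop above) and that FASTQ files are written to a single output directory; the function names `list_accessions` and `download_runs` are made up for illustration and are not part of this repository.

```shell
# Print run accessions from the first column of a tab-separated run table,
# skipping the header row and any blank lines.
list_accessions() {
  cut -f1 "$1" | tail -n +2 | grep -v '^$'
}

# Download every accession in the run table, skipping runs whose FASTQ
# output already exists in the output directory (default: current directory).
download_runs() {
  local table="$1"
  local outdir="${2:-.}"
  local id
  mkdir -p "$outdir"
  list_accessions "$table" | while read -r id; do
    # fasterq-dump writes ID.fastq (unpaired) or ID_1.fastq/ID_2.fastq (paired)
    if [ -e "$outdir/$id.fastq" ] || [ -e "$outdir/${id}_1.fastq" ]; then
      echo "skipping $id (already downloaded)" >&2
      continue
    fi
    prefetch "$id" && fasterq-dump --outdir "$outdir" "$id"
  done
}
```

Usage would look like `download_runs tables/SraRunTable.tsv fastq`, where `fastq` is a hypothetical output directory name.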
This sequencing data can be preprocessed and mapped using the scripts in the /sh/rnaseq, /sh/amplicon, and /sh/variant directories.
Note that this analysis borrows heavily from the excellent paper by BH Good and the code that he released publicly. Most of this analysis I translated from his Python code; some I wrote myself.
Shovill.sh
- Assemble progenitor reads using shovill
Bwa.sh
- Map reads against Shovill assembly
BwaVsASM922.sh
- Map reads against NCBI SBW25 assembly ASM922
OctopusIndividual.sh
- call variants on clonal samples
OctopusPolyclone.sh
- call variants on mixed population samples
SNPannCommands.sh
- kind of a hacky pipeline to convert Octopus output to tabular variant format. Uses SnpEff and snippy-vcf_to_tab from the Snippy software suite.
01_snp_functional_enrichment.R
-- Identify nucleotide sites and genes under parallel evolution, check for functional enrichment of mutated genes, plot Figure S1
02_plot_variant_freq.R
-- Produce Fig. 2 from the main text
submit_lgpr.sh
-- submit steps 2-5 below using this script
01_format_data.R
-- Formats data to be used in lgpr
02_puhti_lgpr_ATP.R
-- run lgpr on cluster for ATP
03_puhti_lgpr_ciliate.R
-- run lgpr on cluster for ciliates
04_puhti_lgpr_cfus.R
-- run lgpr on cluster for bacteria colony forming units
05_puhti_lgpr_opd.R
-- run lgpr on cluster for bacteria optical density
06_lgpr_process_plot.R
-- Process output from lgpr and plot Figure 3
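For readers without access to the cluster setup, the submission of steps 2-5 can be pictured with the dry-run sketch below. It is not the contents of submit_lgpr.sh: it assumes a Slurm scheduler, omits any account, partition, or resource options a real submission would need, and only prints the commands rather than running them. The function name `print_lgpr_submissions` is hypothetical.

```shell
# Dry run: print one Slurm submission command for each lgpr script
# numbered 02-05 in the given directory. Nothing is actually submitted.
print_lgpr_submissions() {
  local dir="${1:-.}"
  local script
  for script in "$dir"/0[2-5]_puhti_lgpr_*.R; do
    [ -e "$script" ] || continue   # the glob matched no files
    echo "sbatch --job-name=$(basename "$script" .R) --wrap=\"Rscript $script\""
  done
}
```

Calling the function with the directory that holds the lgpr scripts lists the four jobs in numeric order, which makes it easy to review them before submitting by hand.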
AmpliconQualityControl.sh
AmpliconMapping.sh
01_rpkm2tab.R
-- Prepare amplicon count tables. Count tables saved in data
02_make_phyloseq.R
-- Make phyloseq object
03_make_figS3.R
-- Make Fig. S3
04_shannon_diversity.R
-- Runs DivNet estimate and plots DivNet vs Plugin. Also runs breakaway for testing difference in diversity between samples. Generates results for Table S3
05_ordination.R
-- Run ordination analysis and PERMANOVA. Generates results for Table S3
- Run corncob beta binomial regression for differential abundance. Generates results for Table S3. Fitting corncob models with the bootstrap likelihood ratio test takes about an hour each, so this was done separately for each model in the scripts below. Each of these scripts fits the same full model
~ days + pseudomonas_hist * predation
but with different null models to test the effect of leaving out these different terms by way of parametric bootstrap likelihood ratio tests.
a. 06_corncob_SBW25.R
b. 07_corncob_evolution.R
c. 08_corncob_predation.R
d. 09_corncob_interaction.R
10_make_Fig3.R
-- compile results to make main text Fig. 4
process_metatranscriptomes.sh
- quality control RNAseq data
map_count_metaT.sh
- map reads against 30 HAMBI genomes using bbmap
01_read_filter_mrna.R
- read the feat counts and format
02_rnaseq_stats.R
- calculate general stats (% noncoding RNA, etc...)
03_species_rna_relative_abundance.R
- make supplementary Fig S3 showing proportion of bacterial species in the RNA dataset (excluding Tetrahymena).
04_prep_deseq_data.R
- prepare all the files necessary to run the rlog transform and the deseq procedure
05_rlog_transform.R
- perform the rlog transform necessary for distatis and to plot the ordination
- Implements the sum-taxon scaling + species amplicon abundance estimate normalization approach described in Zhang 2021.
- The idea to modify the internal normalization factors in DESeq2 is from Mike Love himself. There is additional information about the normalizationFactors approach in the DESeq2 vignette. It is important that the normalization matrix has row-wise geometric means of 1, so that the mean of normalized counts for a gene is close to the mean of the unnormalized counts. This is accomplished by dividing out the current row geometric means in the normalization matrix. Modifying the normalization factor matrix replaces the estimateSizeFactors step which occurs within the DESeq function. The DESeq function will look for pre-existing normalization factors and use these in the place of size factors (and a message will be printed confirming this). The sizeFactor estimation process is described in very simple terms here.
- This post introduces the estimateNormFactors function from DESeq2, but it is not exported. See also here. My implementation is just based on this function. Some additional useful information here, here, and here.
- As a final comment... it is not even clear in the field whether these approaches are necessary. At least there are many 'high profile' papers that don't appear to consider taxon sum scaling and just run DESeq2 the same way as for single-organism RNAseq experiments. For examples where the standard DESeq2 approach is applied to MTX, see here, here, here, and here.
06_distatis.R
- perform distatis analysis, clustering, plots Fig. 5 from main text, and performs PERMANOVA
07_deseq_tetrahymena.R
- perform simple DESeq2 analysis and functional enrichment for Tetrahymena using clusterProfiler
08_deseq.R
- perform DESeq2 analysis. Using the same normalization approach from step 5.
09_deseq_contrasts.R
- run contrasts between Evolved and Progenitor Pseudomonas. Requires the apeglm package to shrink estimated log-fold changes near 0 counts.
10_functional_enrichments.R
- perform functional enrichment analysis for the bacterial community. Produces Figure 6 and supplementary Figures S5 and S6
11_venndiagram.R
- Compare differentially expressed genes from different contrasts in step 9. Produces Figure S7
12_volcano_plot.R
- Plot Fig. S8
13_fraction_diff_expressed.R
- basic statistics about diff expressed genes.
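The row-wise rescaling of the normalization matrix described under 05_rlog_transform.R can be written out explicitly. The notation here is introduced only for illustration: let $N_{ij}$ be the normalization factor for gene $i$ in sample $j$, with $m$ samples in total. Dividing each row by its geometric mean gives

```latex
\tilde{N}_{ij}
  = \frac{N_{ij}}{\left(\prod_{k=1}^{m} N_{ik}\right)^{1/m}}
  = \frac{N_{ij}}{\exp\!\left(\frac{1}{m}\sum_{k=1}^{m} \ln N_{ik}\right)},
\qquad
\left(\prod_{j=1}^{m} \tilde{N}_{ij}\right)^{1/m} = 1 .
```

The second form is the one that is convenient to compute, since it only needs the row means of the log-transformed matrix. Because every row of $\tilde{N}$ has geometric mean 1, dividing the counts for a gene by $\tilde{N}_{ij}$ keeps them on roughly the same scale as the unnormalized counts.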