Cross-tissue omics analysis discovers adipose genes encoding secreted proteins in obesity-related non-alcoholic fatty liver disease
This repo contains the code used for computational data analysis in Darci-Maher et al. 2023, published in eBioMedicine in June 2023.
The scripts used for each section of the analysis are inside the directory scripts, each with their own sub-directory.
This README details the order in which to run each script, assuming that the user has access to a high-performance computing cluster and the data used in the project (not provided here).
Unfortunately, due to privacy laws, we are unable to provide the KOBS, METSIM, or UK Biobank cohort data on this repo. The GTEx, HPA, and WikiPathways data are freely available online as described in the manuscript data sharing statement.
The code used to generate each figure from the manuscript is available in the scripts listed here. These scripts are also integrated into the full analysis pipeline below.
This figure was drawn using Adobe Illustrator and does not include data.
cd scripts/
Rscript kobs_de/plot_kobs_de.R
This figure was drawn using Adobe Illustrator and does not include data.
This script produces the heatmap in Figure 4a. The diagram in Figure 4b was drawn using Adobe Illustrator.
Rscript kobs_de/runSBCcorrelation.R
Rscript sirna_knockdown_experiment_de/generate_knockdown_plots.R
Rscript hepg2_experiment_de/plot_hepg2_de.R
Rscript mendelian_randomization_ukb/plot_vegfb_coloc_and_MR_effectsizes.R
Rscript wgcna_crosstalk/correlate_modules_phenotypes.R
Rscript wgcna_crosstalk/arrange_sup_fig_heatmaps.R
Rscript wgcna_crosstalk/plot_pathway_enrichment_coolmodules.R
Rscript kobs_de/plot_kobs_de.R
Rscript mendelian_randomization_ukb/plot_vegfb_coloc_and_MR_effectsizes.R
cd ../
Follow the instructions below, in order, to reproduce all of the data analysis conducted in this project.
cd scripts/wgcna_crosstalk
Rscript get_adipose_liver_overlap_samples.R
Rscript prep_expression_data.R adipose
Rscript prep_expression_data.R liver
Rscript check_expr_genes_overlap.R
Rscript initial_wgcna_qc_checks.R adipose
Rscript initial_wgcna_qc_checks.R liver
Rscript construct_network.R adipose
Rscript construct_network.R liver
Rscript correlate_modules_phenotypes.R adipose
Rscript correlate_modules_phenotypes.R liver
Rscript correlate_modules_crosstissue.R
Rscript arrange_sup_fig_heatmaps.R
Plug module gene lists into WebGestalt with all expressed genes as background
Rscript explore_xtissue_modules.R
Rscript write_supplement_table_wgcna.R
cd ../../
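The WebGestalt step above is done in the browser, but the idea of testing a module gene list against a background of all expressed genes can be sketched in code. The example below is a minimal over-representation test based on the hypergeometric distribution. The gene names, set sizes, and the function name `hypergeom_enrichment_p` are hypothetical, and the sketch is not a reproduction of WebGestalt's own implementation or multiple-testing correction.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_pathway, n_module, n_overlap):
    """P(X >= n_overlap) when drawing n_module genes without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    max_overlap = min(n_pathway, n_module)
    total = comb(n_background, n_module)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_module - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical gene sets, for illustration only
background = {f"gene{i}" for i in range(1, 21)}          # all expressed genes
module = {"gene1", "gene2", "gene3", "gene4", "gene5"}   # one module gene list
pathway = {"gene1", "gene2", "gene3", "gene10"}          # one annotated pathway

# Restrict the pathway to the background before counting the overlap
pathway_in_background = pathway & background
overlap = module & pathway_in_background
p_value = hypergeom_enrichment_p(
    len(background), len(pathway_in_background), len(module), len(overlap)
)
print(len(overlap), p_value)
```

Restricting the pathway to expressed genes matters because the background defines which genes could have been drawn into a module in the first place.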
cd scripts/kobs_liver_data_prep
./gen_liver_sampleIDs.sh
qsub runsamtools.sh
qsub runpicard.sh
Rscript merge_picard.R
cd ../../
After all of this, copy picardRNAmetrics_merged_kobs_liver_noMT.txt to scripts/kobs_de/data_liver
cd scripts/kobs_de
Read in the GENCODE v26 .gtf file, convert it to a dataframe, and filter for our DE genes
Rscript extractAnnotations.R
Consolidate covariate data from various sources and define groups to use for DE (based on liver histology measurements)
Rscript mergeCovs_defGrps.R
Use LIMMA to run several DE experiments for different group definitions
Run identical analyses for adipose and liver
Rscript runLimmaVoom.R adipose
Rscript runLimmaVoom.R liver
Note: when downloading PANTHER results, a header was added manually and copy-pasted into each file
Rscript collectMetadata.R
Rscript test_de_ctm_enrich.R
Rscript plot_kobs_de.R
Rscript runSBCcorrelation.R
After generating the models, validate their significance with a permutation test
Rscript runBestSubsets.R
Rscript permuteBestSubsets.R
Rscript write_supplement_table_kobsde.R
Rscript write_supplement_table_bestsubsets.R
cd ../../
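The permutation test mentioned above for the best subsets models can be illustrated with a small sketch. The idea is to shuffle the outcome, recompute the fit statistic, and ask how often a shuffled fit is at least as good as the observed one. The example uses a one-predictor R^2 as the statistic and made-up data; the actual statistic, model form, and number of permutations used in permuteBestSubsets.R are not shown in this README, so all of those are assumptions here.

```python
import random

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def permutation_p_value(x, y, n_permutations=1000, seed=0):
    """Empirical p-value for the observed R^2 under shuffled outcomes."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if r_squared(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero
    return observed, (hits + 1) / (n_permutations + 1)

# Hypothetical values, for illustration only
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
phenotype = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 9.9]

observed_r2, p_value = permutation_p_value(expression, phenotype)
print(observed_r2, p_value)
```

Fixing the random seed keeps the result reproducible between runs, which is useful when the permutation p-value is reported in a table.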
cd scripts/sirna_knockdown_experiment_de
python3 get_seq_key.py /u/project/pajukant/nikodm/sbc_sirna/data/fastq/ raw
qsub trim_reads.sh
python3 get_seq_key.py /u/project/pajukant/nikodm/sbc_sirna/data/fastq/ trimmed
QC script will generate a MultiQC HTML report that can be viewed in browser
qsub qc_rawfastq.sh
Using main assembly files ("CHR") for GENCODE release 19/GRCh37
./download_refs.sh
Pass 1 maps to existing transcripts
qsub run_genomeGenerate.pass1.sh
qsub map.py 1
Pass 2 adds new splice junctions present in the data to the reference and re-maps to that updated reference
./collectSJ.sh
qsub run_genomeGenerate.pass2.sh
Only output uniquely mapped reads
qsub map.py 2
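The junction-collection step between the two passes can be sketched as follows. This is a minimal illustration that assumes the standard 9-column STAR SJ.out.tab layout and merges novel junctions across samples; the function name `collect_novel_junctions`, the read-support threshold, and the example rows are hypothetical, and the filters actually applied by collectSJ.sh are not documented here.

```python
def collect_novel_junctions(sj_tables, min_unique_reads=1):
    """Merge STAR SJ.out.tab rows from several samples into one junction set.

    Each row has 9 tab-separated columns: chromosome, intron start, intron end,
    strand (0/1/2), intron motif, annotated flag (0/1), uniquely mapped read
    count, multi-mapped read count, maximum overhang.
    """
    junctions = set()
    for table in sj_tables:
        for line in table.splitlines():
            if not line.strip():
                continue
            fields = line.split("\t")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            strand = fields[3]
            annotated = fields[5] == "1"
            unique_reads = int(fields[6])
            # Keep junctions absent from the annotation with enough unique support
            if not annotated and unique_reads >= min_unique_reads:
                junctions.add((chrom, start, end, strand))
    return sorted(junctions)

# Hypothetical pass 1 output from two samples, for illustration only
sample_a = "chr1\t100\t200\t1\t1\t0\t5\t0\t30\nchr1\t300\t400\t2\t2\t1\t12\t1\t40\n"
sample_b = "chr1\t100\t200\t1\t1\t0\t3\t0\t25\nchr2\t500\t600\t1\t1\t0\t0\t2\t15\n"

novel = collect_novel_junctions([sample_a, sample_b])
print(novel)
```

Using a set means a junction seen in several samples is added to the pass 2 reference only once.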
qsub countMT.sh
Rscript visualizeMT.R
Can run these simultaneously
qsub coordinate_sort.sh
qsub readName_sort.sh
qsub qc_qort.sh
qsub run_picard.sample.sh
qsub run_featureCounts.sh
./multiqc_veryend.sh
./extract_uniquely_mapped.sh
Rscript expression_sanity_checks.R
Download adipogenesis marker genes from WikiPathways: https://www.wikipathways.org/index.php/Pathway:WP236
Use biomaRt to convert Entrez IDs to Ensembl IDs
Rscript convert_markergenes_to_ensembl.R
Rscript convert_srebf1genes_to_ensembl.R
Rscript run_limmavoom_pertimepoint.R
Rscript explore_de_pertimepoint_results.R
Rscript write_supplement_table_knockdownde.R
Rscript generate_knockdown_plots.R
Rscript plot_oro_absorbance.R
cd ../../
This pipeline is almost identical to the one in the sirna_knockdown_experiment_de folder, with a few small changes documented below.
cd scripts/hepg2_experiment_de
Rscript collect_qPCR_targets.R
python ../sirna_knockdown_experiment_de/get_seq_key.py /u/project/pajukant/nikodm/hepatocyte_rnaseq/data/fastq/ raw
../sirna_knockdown_experiment_de/trim_reads.sh
python ../sirna_knockdown_experiment_de/get_seq_key.py /u/project/pajukant/nikodm/hepatocyte_rnaseq/data/fastq/ trimmed
QC script will generate a MultiQC HTML report that can be viewed in browser
qsub ../sirna_knockdown_experiment_de/qc_rawfastq.sh
Pass 1 maps to existing transcripts
qsub ../sirna_knockdown_experiment_de/run_genomeGenerate.pass1.sh
qsub ../sirna_knockdown_experiment_de/map.py 1
Pass 2 adds new splice junctions present in the data to the reference and re-maps to that updated reference
../sirna_knockdown_experiment_de/collectSJ.sh
qsub ../sirna_knockdown_experiment_de/run_genomeGenerate.pass2.sh
Only output uniquely mapped reads, and count MT reads
qsub ../sirna_knockdown_experiment_de/map.py 2
qsub ../sirna_knockdown_experiment_de/countMT.sh
Rscript visualizeMT_hepg2.R
Can run these simultaneously
qsub ../sirna_knockdown_experiment_de/coordinate_sort.sh
qsub ../sirna_knockdown_experiment_de/readName_sort.sh
qsub ../sirna_knockdown_experiment_de/run_picard.sample.sh
qsub ../sirna_knockdown_experiment_de/run_featureCounts.sh
../sirna_knockdown_experiment_de/multiqc_veryend.sh
../sirna_knockdown_experiment_de/extract_uniquely_mapped.sh
Rscript expression_sanity_checks_hepg2.R
Rscript collect_DE_targets.R
Rscript run_limmavoom_proteinamt.R
Rscript explore_de_results_hepg2.R
Rscript plot_hepg2_de.R
Rscript write_supplement_table_hepg2de.R
cd ../../
cd scripts/wgcna_crosstalk
Rscript correlate_sbcs_ligands.R
Rscript explore_ligand_corrs.R
Rscript get_ligand_cor_summarystats.R
Rscript explore_SBC_livermodule_corrs.R
Rscript write_supplement_table_sbc_livermodule_corr.R
Rscript plot_pathway_enrichment_coolmodules.R
cd ../../
Run Mendelian Randomization to test for a causal effect of cis-regulatory SNPs for adipose aware DE genes on NAFLD
To run this section:
- Download MAGENTA from the Broad Institute website
- Move the MAGENTA folder inside scripts
- Move run_magenta.sh into the MAGENTA folder with the MATLAB scripts
cd scripts/magenta
./prep_gwas_3cols.sh
Rscript format_magenta_geneset_entrez.R
Here, cd into the MAGENTA folder (it will have a unique name based on version)
qsub run_magenta.sh
Now, return to scripts/magenta
Rscript collect_magenta_results.R
cd ../../
cd scripts/mendelian_randomization_ukb
qsub generateGeneSpecificCisEQTLs.sh f
Rscript select_iv_snp_set.R f
qsub generate_iv_region_fpkm.sh f
qsub runPLINKldmatrix.sh f
qsub run_cond_coloc_iv_regions.sh f
Rscript generate_iv_nafldhit_snplist.R
./calculate_ld_ivs_nafldvars.sh
Rscript ld_prune_ivs.R
./extract_ivs_nafldgwas.sh
Rscript prep_mr_inputs.R
Rscript run_mrpresso.R
Rscript run_MendelianRandomization.R
Rscript plot_vegfb_coloc_and_MR_effectsizes.R
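To make the Mendelian randomization step more concrete, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate from per-variant summary statistics. It assumes uncorrelated instruments, and the effect sizes and standard errors are made up for illustration. The function name `ivw_estimate` is hypothetical; the R scripts above may use different estimators and settings, including the outlier handling that MR-PRESSO provides, which this sketch omits.

```python
from math import sqrt

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: variant effects on the exposure (e.g. gene expression)
    beta_outcome:  variant effects on the outcome
    se_outcome:    standard errors of the outcome effects
    """
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    standard_error = sqrt(1.0 / denominator)
    return estimate, standard_error

# Hypothetical summary statistics for three instruments, for illustration only
beta_exposure = [0.10, 0.20, 0.30]
beta_outcome = [0.05, 0.10, 0.15]
se_outcome = [0.02, 0.02, 0.02]

estimate, standard_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(estimate, standard_error)
```

Each variant contributes a ratio of outcome effect to exposure effect, and variants with stronger exposure effects and more precise outcome effects receive more weight.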
qsub generateGeneSpecificCisEQTLs.sh b
Rscript select_iv_snp_set.R b
qsub generate_iv_region_fpkm.sh b
qsub runPLINKldmatrix.sh b
qsub run_cond_coloc_iv_regions.sh b
No need to go any further because we observed 0 valid liver IVs at this point
Run a regression analysis to quantify the additional variance in NAFLD explained by VEGFB compared to TG alone
./get_vegfb_snp_textformat.sh
Rscript build_vegfb_tg_models.sh
Rscript write_supplement_table_vegfb_tg_models.R
cd ../../
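The comparison of a TG-only model against a model that also includes VEGFB can be sketched as a nested-model comparison. The example below fits both models by ordinary least squares and reports the gain in R^2. It uses a continuous outcome and made-up values purely for illustration; the actual outcome coding, covariates, and model type used in the scripts above are not shown in this README, and a binary NAFLD outcome would typically call for logistic models and a pseudo-R^2 instead. The helper names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x

def ols_r_squared(predictors, y):
    """R^2 of a least squares fit of y on the predictor columns plus intercept."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    n_coef = len(rows[0])
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_coef)]
           for i in range(n_coef)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(n_coef)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical values, for illustration only
tg = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
vegfb = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
outcome = [6.0, 5.0, 10.0, 9.0, 14.0, 13.0]

r2_tg = ols_r_squared([tg], outcome)
r2_full = ols_r_squared([tg, vegfb], outcome)
delta_r2 = r2_full - r2_tg
print(r2_tg, r2_full, delta_r2)
```

The difference `delta_r2` is the additional variance explained once the VEGFB predictor is added to the TG-only model.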