This is the code repository for our upcoming paper on propensity score (PS) trimming in settings with multiple (3 or more) treatment groups. In this paper, we propose definitions that extend three existing PS trimming methods (Crump et al. Biometrika 2009;96:187, Stürmer et al. Am J Epidemiol 2010;172:843, and Walker et al. Comp Eff Res 2013;3:11) to multinomial exposures.
- Paper: in press
- ICPE 2018 poster: https://github.com/kaz-yos/icpe-2018-org-mode-poster
- Web app: https://kaz-yos.shinyapps.io/shiny_trim_ternary/
- Web app source code: https://github.com/kaz-yos/shiny-trim-ternary
- *.R : Main R script files for generating simulation data, analyzing data, and reporting results. Executing each file generates a plain-text report file named *.R.txt under log/.
- *_o2.sh : Example shell scripts for the Linux SLURM batch job system. These are designed specifically for Harvard Medical School's O2 cluster and are not expected to work elsewhere without modification.
- data/ : Folder for simulation data. Due to file size issues, only the summary file needed for running 06_assess_results.R is kept.
- log/ : Folder for log files.
- out/ : Folder for figure PDFs.
Two custom R packages must be installed before running the R scripts provided in this repository.
## Install devtools (if you do not have it already)
install.packages("devtools")
## Install the data generation package
devtools::install_github(repo = "kaz-yos/datagen3")
## Install the simulation package
devtools::install_github(repo = "kaz-yos/trim3")
Additionally, the packages doParallel, doRNG, grid, gtable, and tidyverse are required by the scripts.
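The CRAN dependencies can be installed as follows (grid ships with base R and does not need separate installation); this is just one way to do it:

```r
## Install the CRAN dependencies (grid is part of base R and needs no installation).
install.packages(c("doParallel", "doRNG", "gtable", "tidyverse"))
```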
Running the following will generate raw data files under the data/ folder using 8 cores.
Rscript ./01_generate_data.R 8
If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script.
sh ./01_generate_data_o2.sh
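For orientation, here is a minimal sketch of how a script like 01_generate_data.R might consume the core-count argument (the trailing 8 above) and set up reproducible parallel execution with doParallel and doRNG. The seed and the loop body are placeholders, not the actual implementation.

```r
## Hypothetical sketch only; see 01_generate_data.R for the actual implementation.
library(doParallel)
library(doRNG)

## Read the number of cores from the command line, e.g., "Rscript ./01_generate_data.R 8".
args    <- commandArgs(trailingOnly = TRUE)
n_cores <- as.integer(args[1])

## Register a parallel backend and a reproducible parallel RNG stream.
cl <- makeCluster(n_cores)
registerDoParallel(cl)
registerDoRNG(seed = 12345)  # seed value is a placeholder

## ... simulation iterations would run here via foreach() ...

stopCluster(cl)
```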
Running the following will process the specified data file (PS estimation and trimming) and generate a new file with the same name except that raw is replaced with prepared.
Rscript ./02_prepare_data.R ./data/scenario_raw001_part001_r50.RData 8
If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script. This will dispatch a SLURM job for each file.
sh ./02_prepare_data_o2.sh ./data/scenario_raw*
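To make the naming convention concrete, the sketch below shows how the output file name could be derived from the input file name; the next step follows the same pattern (prepared becomes analyzed). This is an illustration, not code taken from the scripts.

```r
## Hypothetical illustration of the file naming convention.
in_file  <- "./data/scenario_raw001_part001_r50.RData"
out_file <- sub("raw", "prepared", in_file, fixed = TRUE)
## out_file is now "./data/scenario_prepared001_part001_r50.RData"
```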
Running the following will analyze the specified data file (outcome model estimation) and generate a new file with the same name except that prepared is replaced with analyzed.
Rscript ./03_analyze_data.R ./data/scenario_prepared001_part001_r50.RData 8
If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script. This will dispatch a SLURM job for each file.
sh ./03_analyze_data_o2.sh ./data/scenario_prepared*
Running the following will aggregate all the analyzed data files under data/ and generate a single new file named all_analysis_results.RData.
Rscript ./04_aggregate_results.R 1
If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script.
sh ./04_aggregate_results_o2.sh
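Conceptually, the aggregation step collects every analyzed file under data/ and combines the results into one object. Below is a minimal sketch of that idea, assuming each analyzed file contains a single data frame of results and that the file name pattern matches; it is not the actual contents of 04_aggregate_results.R.

```r
## Hypothetical sketch only; see 04_aggregate_results.R for the actual implementation.
library(tidyverse)

## File name pattern and object names are assumptions.
analyzed_files <- list.files("./data", pattern = "analyzed.*\\.RData$",
                             full.names = TRUE)

all_analysis_results <- map_dfr(analyzed_files, function(file) {
    ## load() returns the name(s) of the object(s) it created.
    obj_name <- load(file)
    get(obj_name[1])
})

save(all_analysis_results, file = "./data/all_analysis_results.RData")
```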
Running the following will summarize the results (scenario-level summaries) and generate a new file named all_analysis_summary.RData.
Rscript ./05_summarize_results.R ./data/all_analysis_results.RData 1
If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script.
sh ./05_summarize_results_o2.sh ./data/all_analysis_results.RData
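Conceptually, this step reduces the iteration-level results to summary statistics per scenario. A hedged sketch in dplyr terms, where the object and column names are assumptions rather than the actual variables used:

```r
## Hypothetical sketch only; see 05_summarize_results.R for the actual implementation.
library(tidyverse)

load("./data/all_analysis_results.RData")  # assumed to create all_analysis_results

all_analysis_summary <- all_analysis_results %>%
    ## "scenario" and the numeric result columns are assumed names.
    group_by(scenario) %>%
    summarize(across(where(is.numeric), mean), .groups = "drop")

save(all_analysis_summary, file = "./data/all_analysis_summary.RData")
```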
Running the following will create figures under out/ describing the summary statistics in all_analysis_summary.RData.
Rscript ./06_assess_results.R ./data/all_analysis_summary.RData 1
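As a final illustration, figure generation amounts to plotting the scenario-level summaries and saving PDFs under out/. The sketch below uses ggplot2 with made-up column names (scenario, bias); the actual figures produced by 06_assess_results.R are more elaborate.

```r
## Hypothetical sketch only; see 06_assess_results.R for the actual implementation.
library(tidyverse)

load("./data/all_analysis_summary.RData")  # assumed to create all_analysis_summary

## "scenario" and "bias" are placeholder column names.
p <- ggplot(all_analysis_summary, aes(x = scenario, y = bias)) +
    geom_point()

ggsave(filename = "./out/example_figure.pdf", plot = p, width = 7, height = 5)
```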
Kazuki Yoshida <kazukiyoshida@mail.harvard.edu>