SPH5104 Analytics for Better Health

This group project is a retrospective study of unfractionated heparin and enoxaparin prophylaxis in relation to the occurrence of venous thromboembolism (VTE) among patients in the surgical intensive care unit. For more context, see the written report uploaded to this repository.

The dataset was extracted from MIMIC-IV, a freely accessible electronic health record dataset: https://www.nature.com/articles/s41597-022-01899-x

Group Members

Chanel Koh Xue Mei
Huang Xining
Ivan Lim Zhengyu
Jin Qianyi
Luu Hoang Huong
Matthew Peh Wei Ern
Wang Zihong

Brief description of the code

Covariate_extraction.sql: SQL code that extracts the required covariates and outcomes from the MIMIC-IV database.
MIMIC-IV_dataprocessing_n_cleaning.ipynb: A Jupyter notebook that cleans the dataset and encodes the data to prepare it for model training and statistical analysis.
Baseline_and_propensity_score_matching.ipynb: A Jupyter notebook that calculates baseline characteristics for Table 1 and adjusts for confounding using propensity score matching.
Chi_Square_Fiosher.do: A STATA file that runs Chi-square and Fisher's exact tests for odds ratio calculations.
Conditional_log_regression: A Jupyter notebook that uses conditional logistic regression to calculate odds ratios for a propensity score matched cohort.
Survival_Analysis_km_cox.R: R code to perform survival analysis for the secondary outcome: In-hospital mortality rate
Subgroup_analysis.R: R code for subgroup analysis for the primary outcome: Risk of VTE
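
To illustrate the kind of calculation named for Chi_Square_Fiosher.do, here is a minimal Python sketch of an odds ratio with a Wald 95% confidence interval and a two-sided Fisher's exact test on a 2x2 table. It is not the repository's STATA code: the function names are hypothetical, and the counts in the example are invented for illustration, not study results.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: events / non-events in group 1
    c, d: events / non-events in group 2
    All four cells must be non-zero for this simple formula.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Invented counts, for illustration only.
or_, lo, hi = odds_ratio_with_ci(3, 1, 1, 3)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"OR={or_:.2f}, 95% CI=({lo:.2f}, {hi:.2f}), Fisher p={p_value:.4f}")
```

With cell counts this small, the Wald interval is very wide, which is one reason an exact test is often reported alongside it.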

A quick glance at the results and analysis

Data extraction

Data cleaning and processing

Propensity score matching with varying 1:N ratios. In our imbalanced dataset, the match quality between the treatment group and the control group improves as N is reduced.
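
The effect of the 1:N ratio can be sketched with a greedy nearest-neighbour matcher. This is an illustrative assumption, not the algorithm used in the notebook: the function name, the caliper value, and the propensity scores below are all hypothetical.

```python
def match_one_to_n(treated, controls, n=1, caliper=0.05):
    """Greedy 1:N nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a unit id to its propensity score.
    Returns a dict mapping each treated id to its matched control ids.
    Candidates farther than `caliper` from the treated unit are rejected.
    """
    available = dict(controls)  # controls that have not been used yet
    matches = {}
    for t_id, t_score in treated.items():
        # Rank the remaining controls by distance to this treated unit.
        ranked = sorted(available, key=lambda c: abs(available[c] - t_score))
        chosen = [c for c in ranked[:n] if abs(available[c] - t_score) <= caliper]
        for c in chosen:
            del available[c]
        matches[t_id] = chosen
    return matches


# Hypothetical propensity scores, for illustration only.
treated = {"t1": 0.30, "t2": 0.62}
controls = {"c1": 0.29, "c2": 0.33, "c3": 0.60, "c4": 0.90}

print(match_one_to_n(treated, controls, n=1))
print(match_one_to_n(treated, controls, n=2))
```

In this toy example, 1:1 matching pairs every treated unit with a close control, while 1:2 matching cannot find a second control within the caliper for t2. A larger N forces each treated unit to accept more distant controls or go under-matched, which is one plausible reason match quality drops as N grows.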

Better propensity score matching also yields small standardized mean differences (SMD < 0.25) across the confounding covariates.
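
As a minimal sketch of how an SMD can be checked against the 0.25 threshold, the function below computes the standardized mean difference for one continuous covariate, using the common pooled-standard-deviation definition. The function name and the sample values are hypothetical, and the notebook may use a different formula (for example, for binary covariates).

```python
import math
import statistics


def standardized_mean_difference(treated, control):
    """SMD for a continuous covariate.

    Difference in group means divided by the pooled standard deviation,
    sqrt((var_treated + var_control) / 2), using sample variances.
    """
    mean_diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return mean_diff / pooled_sd


# Hypothetical ages in a matched cohort, for illustration only.
age_treated = [60, 65, 70]
age_control = [61, 66, 71]

smd = standardized_mean_difference(age_treated, age_control)
print(f"SMD = {smd:.2f}, balanced = {abs(smd) < 0.25}")
```

The absolute value is compared with the threshold because the sign only reflects which group has the larger mean.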

Survival Analysis: Kaplan-Meier survival curves for in-hospital mortality rate
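
For readers unfamiliar with the method, the Kaplan-Meier estimate behind such curves can be sketched in a few lines of Python. The repository does this in R (Survival_Analysis_km_cox.R); the function below is a hypothetical stand-in, and the follow-up times are invented.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each patient.
    events: 1 if the patient died at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Invented follow-up data (days), for illustration only.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

for time, prob in kaplan_meier(durations, events):
    print(f"day {time}: S(t) = {prob:.2f}")
```

Censored patients contribute to the risk set up to their last follow-up time but never cause a drop in the curve, which is what distinguishes this estimate from a simple proportion of survivors.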

About

This repo was created for the group project submission for SPH5104, taken at NUS in AY22/23. In this project, we used real-world electronic medical records to perform causal inference studies. The dataset was extracted from MIMIC-IV using SQL, and data visualization and statistical analysis were done in Python, R and STATA.
