AlexZ33 / Retrospective-study-on-healthcare-dataset-MIMIC-IV

This repo was created for the group project submission for SPH5104, taken at NUS in AY22/23. In this project, we used real-world Electronic Medical Records to perform causal inference studies. The dataset was extracted from MIMIC-IV using SQL, with data visualization and statistical analysis done in Python, R and STATA.

SPH5104 Analytics for Better Health

In this group project, we conducted a retrospective study of Unfractionated Heparin and Enoxaparin prophylaxis in relation to the occurrence of Venous Thromboembolism (VTE) among patients in the surgical intensive care unit. For more context, the written report for this project is uploaded to this repository.

The dataset was extracted from MIMIC-IV, a freely accessible electronic health record dataset: https://www.nature.com/articles/s41597-022-01899-x

Group Members

  1. Chanel Koh Xue Mei
  2. Huang Xining
  3. Ivan Lim Zhengyu
  4. Jin Qianyi
  5. Luu Hoang Huong
  6. Matthew Peh Wei Ern
  7. Wang Zihong

Brief description of the code

  1. Covariate_extraction.sql: SQL code that extracts the required covariates and outcomes from the MIMIC-IV database.
  2. MIMIC-IV_dataprocessing_n_cleaning.ipynb: A Jupyter notebook that cleans the dataset and encodes the data to prepare it for model training and statistical analysis.
  3. Baseline_and_propensity_score_matching.ipynb: A Jupyter notebook that calculates baseline characteristics for Table 1 and adjusts for confounding using propensity score matching.
  4. Chi_Square_Fiosher.do: A STATA file for Chi-square and Fisher's exact tests, used for odds ratio calculations.
  5. Conditional_log_regression: A Jupyter notebook that uses conditional logistic regression to calculate odds ratios for a propensity score matched cohort.
  6. Survival_Analysis_km_cox.R: R code to perform survival analysis for the secondary outcome: in-hospital mortality rate.
  7. Subgroup_analysis.R: R code for subgroup analysis for the primary outcome: risk of VTE.
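
For readers unfamiliar with the odds ratio calculations mentioned in items 4 and 5, the following is a minimal sketch in plain Python. It is not the STATA or notebook code from this repository. The function names and the 2x2 counts are made up for illustration, and the confidence interval uses the standard log (Woolf) approximation, which may differ from what the project files compute.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a = treated with event,   b = treated without event
    c = control with event,   d = control without event
    All four cells must be non-zero for this approximation.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def chi_square_statistic(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Made-up counts, NOT results from this study.
or_, lo, hi = odds_ratio_with_ci(10, 90, 20, 80)
print(f"OR = {or_:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
print(f"chi-square = {chi_square_statistic(10, 90, 20, 80):.3f}")
```

An interval that spans 1 would indicate that the made-up table does not show a clear difference in odds between the two groups.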

A quick glance at the results and analysis


Data extraction



Data cleaning and processing



Propensity score matching with varying 1:N ratios. For our imbalanced dataset, the match between the treatment group and the control group improves as N is reduced.
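
As a rough illustration of 1:N matching, here is a minimal sketch of greedy nearest-neighbour matching without replacement. It assumes propensity scores have already been estimated (for example with logistic regression) and is not necessarily the procedure used in the notebook. The function name, the caliper value and the scores are hypothetical.

```python
def match_one_to_n(treated_scores, control_scores, n=1, caliper=0.05):
    """Greedy 1:N nearest-neighbour matching without replacement.

    treated_scores / control_scores: dicts mapping a unit id to its
    propensity score. Returns {treated id: [matched control ids]}, with
    at most n controls per treated unit, all within the caliper.
    """
    available = dict(control_scores)  # controls not yet used
    matches = {}
    for t_id, t_score in treated_scores.items():
        # Rank the remaining controls by distance to this treated unit.
        ranked = sorted(available, key=lambda c_id: abs(available[c_id] - t_score))
        chosen = [c for c in ranked[:n] if abs(available[c] - t_score) <= caliper]
        for c_id in chosen:
            del available[c_id]  # without replacement
        matches[t_id] = chosen
    return matches


# Hypothetical propensity scores, NOT taken from the study data.
treated = {"t1": 0.30, "t2": 0.62}
control = {"c1": 0.31, "c2": 0.28, "c3": 0.60, "c4": 0.90}

print(match_one_to_n(treated, control, n=1))  # {'t1': ['c1'], 't2': ['c3']}
print(match_one_to_n(treated, control, n=2))  # {'t1': ['c1', 'c2'], 't2': ['c3']}
```

In this toy example, asking for two controls per treated unit leaves t2 with only one match inside the caliper, which is one way a larger N can degrade match quality when suitable controls are scarce.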



Better propensity score matching also leads to small effect sizes (standardized mean difference, SMD < 0.25) on the confounding covariates.
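
The SMD for a continuous covariate can be sketched as below. This is an illustrative implementation of the common pooled-standard-deviation formula, not code copied from the notebook. The function name and the sample values are invented for the example.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2).

    Uses sample variances; each group needs at least two values.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0  # both groups are constant, so there is no spread to scale by
    return (mean(treated) - mean(control)) / pooled_sd


# Invented covariate values (e.g. age in years), NOT from the study data.
age_treated = [61, 64, 70, 58, 66]
age_control = [60, 65, 69, 59, 67]

smd = standardized_mean_difference(age_treated, age_control)
print(f"SMD = {smd:.3f}, balanced at the 0.25 threshold: {abs(smd) < 0.25}")
```

The absolute value is compared against the threshold because the sign only reflects which group has the larger mean.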



Survival Analysis: Kaplan-Meier survival curves for in-hospital mortality rate
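
The repository performs this analysis in R (Survival_Analysis_km_cox.R). The sketch below is not that script. It only shows the Kaplan-Meier product-limit estimator in plain Python for readers who want to see how the curve is built. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival function.

    durations: follow-up time for each patient.
    events: 1 if the event (e.g. in-hospital death) was observed at that
            time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # deaths + censored at t
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in days, NOT from the study data.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

for time, prob in kaplan_meier(durations, events):
    print(f"day {time}: S(t) = {prob:.2f}")
```

Censored patients do not cause a drop in the curve, but they are removed from the risk set afterwards, which is why later events produce larger steps.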

Languages

Jupyter Notebook 99.0%, R 0.6%, Stata 0.4%