Dynamic Measurement Scheduling for Event Forecasting using Deep RL (ICML 2019)

The code for Dynamic Measurement Scheduling for Event Forecasting using Deep RL (ICML 2019)

Python code for reproducing the paper's results. https://arxiv.org/abs/1901.09699

Citation

@article{chang2018dynamic,
  title={Dynamic measurement scheduling for adverse event forecasting using deep RL},
  author={Chang, Chun-Hao and Mai, Mingjie and Goldenberg, Anna},
  journal={arXiv preprint arXiv:1812.00268},
  year={2018}
}

Requirements

In general, we make liberal use of the following packages:

  • tensorflow (we use version 1.9)
  • numpy
  • pandas
  • scikit-learn

Dataset

Due to MIMIC privacy concerns, we cannot directly share the preprocessed data. We provide a mocked data file to illustrate the data format. If you want to use MIMIC, please use our preprocessing code here:

https://github.com/zzzace2000/mimic-preprocess

Mocked data

We provide mocked data under data/mocked_data.pkl. It is generated by the notebook data/gen_mocked_data.ipynb.

Data format

We do not discretize our dataset; instead we represent the measurements as a sparse array. This repo provides a dummy pickle file with the same format as the real preprocessed data under "data/dummy.pkl". Let N be the number of patients. Our pickle file consists of a Python dictionary with the following keys:

  • 'icustay_ids': A list of integers with length N, giving each patient's icustay_id. E.g. [1667, 221]
  • 'Ts': A list of N lists. Each inner list holds the actual event times for one patient, in increasing order. E.g. [[1., 1.5, 2.], [0.5, 0.8, 1.3, 1.5, 1.7]]
  • 'ind_kts': A list of N lists. Indices into the time array Ts, indicating when each measurement happens, sorted by increasing time. E.g. [[2], [0, 4]]
  • 'ind_kfs': A list of N lists. The feature index saying which feature each measurement is (for example, index 2 means anion gap). E.g. [[1], [35, 2]]
  • 'Ys': A list of N lists. The measurement values, already normalized to mean 0 and variance 1. E.g. [[0.1], [-0.5, 0.7]]
  • 'rel_end_time': A list of floats indicating at which hour each trajectory ends, relative to the patient's entry time to the ICU. E.g. [34.16, 24.15]
  • 'feature_names': A list of strings giving the feature name for each feature index. We include 39 time-series features, and ind_kfs indexes into this array. E.g. ['Albumin', 'Calcium (total)']
  • 'covs': A 2d numpy array with shape (N, X), where X is the number of covariates. We use 38 covariates (static features) such as age, gender and comorbidity. E.g. [[-0.6, 1], [0.5, 0]]
  • 'covs_names': A list of strings with length X, giving the name of each covariate. E.g. ['Age', 'Gender']
  • 'test_set_idxes': A list of integers, the indices of trajectories split into the test set. Note there is no patient overlap between the train and test sets. E.g. [0]
  • 'test_indicators': A 1d boolean numpy array of length N indicating which trajectories belong to the test set. E.g. [True, False]
  • 'train_set_idxes': A list of integers, the indices of trajectories in the training set. E.g. [1]
  • 'labels': A list of integers. If the patient dies at the end, the label is 1; otherwise it is 0. E.g. [1, 0]

In total we have N = 41,744 ICU stays.
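To make this concrete, here is a minimal sketch of loading and inspecting the mocked data using the keys listed above. Plain pickle loading is an assumption; depending on how the file was written you may need pandas.read_pickle instead.

import pickle

# Minimal sketch: load the mocked data and print a few of the documented fields.
with open('data/mocked_data.pkl', 'rb') as f:
    data = pickle.load(f)

N = len(data['icustay_ids'])
print('number of trajectories:', N)

# Measurements of the first trajectory: (time, feature name, normalized value).
for t_idx, f_idx, y in zip(data['ind_kts'][0], data['ind_kfs'][0], data['Ys'][0]):
    print(data['Ts'][0][t_idx], data['feature_names'][f_idx], y)

print('static covariates:', dict(zip(data['covs_names'], data['covs'][0])))
print('label:', data['labels'][0], 'in test set:', data['test_indicators'][0])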

I. Build a classifier

In our paper, we first build a pre-trained classifier that predicts whether the patient dies within the next 24 hours.

The code for training and testing a classifier is under the folder classifier_training_and_evaluation/.

1. train_mgp_rnn.py

The main file for training a classifier. This code supports 3 kinds of classifiers: MGP-RNN (Futoma et al. 2018), RNN + survival model, and a normal RNN. In our paper we only use the RNN (since it is much faster, and an RNN is good enough for our task), but we do find that MGP-RNN can boost the classifier's performance quite a bit in various settings.

1.1 Data generation hyperparameter

  • 'database_dir': the database directory
  • 'database': the file name but without the suffix .pkl
  • 'num_hours_pred': the forecast horizon; we predict whether the event happens within this many hours. Default: 24 (hrs)

E.g. if I have a pickle file at '../data/my-mortality/39_feats_38_covs.pkl', then I would set database_dir to '../data/my-mortality/' and database to '39_feats_38_covs'.

We then generate the datasets online, and you can specify the following parameters to get different time-series datasets. We take the time interval (-include_before_death, -before_end] (where 0 is the end of the trajectory). Starting at time -before_end, we select time points backward in time, spaced data_interval hours apart, until the time becomes smaller than -include_before_death.

E.g. to evaluate the last 24 hours of a patient at 3-hour intervals, giving 9 points in total, we can use the following default settings:

  • 'before_end': 0.
  • 'include_before_death': 24.01
  • 'data_interval': 3

Additionally, our data loader generates the input times Xs for the RNN. E.g. if we want to feed 24 points into the RNN, spaced 1 hour apart, we specify the following (see the sketch after this list):

  • 'X_interval': the time interval between consecutive RNN input points. Default: 1 hour.
  • 'num_X_pred': the number of points fed into the RNN.
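To make the two time grids concrete, here is a minimal numpy sketch using the default values above. The variable names and the exact endpoint conventions are ours, not the repo's data loader.

import numpy as np

# Hypothetical illustration of the prediction-point and RNN-input grids.
before_end = 0.0
include_before_death = 24.01
data_interval = 3.0
X_interval = 1.0
num_X_pred = 24

rel_end_time = 34.16  # end of one trajectory, in hours since ICU entry

# Prediction points: walk backward from -before_end in steps of data_interval,
# staying inside (-include_before_death, -before_end]; 9 offsets here.
offsets = np.arange(before_end, include_before_death, data_interval)
pred_times = rel_end_time - offsets

# RNN inputs for the first prediction point: num_X_pred points spaced
# X_interval hours apart, ending at that prediction time.
Xs = pred_times[0] - X_interval * np.arange(num_X_pred)[::-1]
print(pred_times)  # 9 evaluation times
print(Xs)          # 24 RNN input times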

Finally, other parameters for data generation:

  • 'neg_subsampled': Randomly subsample the negative cases so that each training epoch has the same number of negative and positive cases. This handles the label imbalance problem (in our case roughly 8% are positive) and speeds up training (see the sketch after this list). Default: True
  • 'val_ratio': The fraction of the training data split off as the validation set. Default: 0.15
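As a minimal illustration of these two options (our own sketch with toy labels, not the repo's loader; in the repo the negative subsampling is redone every epoch):

import numpy as np

rng = np.random.RandomState(0)
labels = np.asarray([0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0])  # toy 0/1 labels; the real ones come from 'labels' in the pickle

# neg_subsampled: keep all positives and an equally sized random subset of negatives.
pos_idx = np.where(labels == 1)[0]
neg_idx = rng.choice(np.where(labels == 0)[0], size=len(pos_idx), replace=False)
idx = rng.permutation(np.concatenate([pos_idx, neg_idx]))

# val_ratio: split a fraction of the training trajectories off for validation.
n_val = int(0.15 * len(idx))
val_idx, train_idx = idx[:n_val], idx[n_val:]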

1.2 RNN hyperparameter

For the RNN, the following parameters matter:

  • 'num_features': the number of time-series features (39 in our case)
  • 'n_covs': the number of static covariates (38 in our case)
  • 'l2_penalty': L2 penalty on the RNN weights.
  • 'lr': learning rate.
  • 'num_hidden': number of hidden nodes in the RNN.
  • 'num_layers': how deep the RNN is.
  • 'rnn_input_keep_prob': the dropout (keep) rate for input into RNN.
  • 'rnn_output_keep_prob': the dropout (keep) rate for RNN output
  • 'rnn_state_keep_prob': the dropout (keep) rate for the RNN state-to-state transition (see the sketch after this list)
  • 'rnn_imputation': choose from 'mean_imputation' or 'forward_imputation'. Default is 'mean_imputation'.
  • 'metric': choose from 'auroc' or 'aupr'. Determines which metric is used to do validation early stopping. Default: 'auroc'
  • 'n_classes': default 2. It's a 2 class classification.
  • 'add_missing': Append the missingness indicator for each time-variant features to feed into RNN. Default: True.
  • 'rnn_cls': choose from ManyToOneRNN, ManyToOneMGP_RNN, ManyToOneRNN_Survival. Default: ManyToOneRNN
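As an illustration of how the three keep-rate parameters above typically map onto TensorFlow 1.x RNN cells (a generic sketch, not the repo's ManyToOneRNN implementation):

import tensorflow as tf  # written against the TF 1.x API (we use 1.9)

def make_rnn_cell(num_hidden, num_layers,
                  rnn_input_keep_prob, rnn_output_keep_prob, rnn_state_keep_prob):
    """Stacked LSTM cell with dropout on inputs, outputs and state transitions,
    matching the three keep-rate flags above."""
    cells = []
    for _ in range(num_layers):
        cell = tf.nn.rnn_cell.LSTMCell(num_hidden)
        cell = tf.nn.rnn_cell.DropoutWrapper(
            cell,
            input_keep_prob=rnn_input_keep_prob,
            output_keep_prob=rnn_output_keep_prob,
            state_keep_prob=rnn_state_keep_prob)
        cells.append(cell)
    return tf.nn.rnn_cell.MultiRNNCell(cells)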

1.3 Other general training hyperparameter

  • 'identifier': This will be added in front of the model's folder name. Default: 'debug'
  • 'seed': Set the seed for whole program. Default: 0
  • 'output_dir': Where the model is stored. Default: '../models/'
  • 'overwrite': If 1 and a folder with the same model name exists, it will remove the folder and retrain. Otherwise it just exits the run and prints a message.
  • 'training_iters': The maximum number of epochs to train.
  • 'eval_test': Evaluate and print the test set performance in each training epoch. Default: False
  • 'batch_size': training batch size. Default: 64
  • 'lookahead': early stopping patience. If there is no improvement in the validation metric for 3 rounds, training stops. Default: 3

We also support random search for hyperparameters:

  • 'num_random_run': Do a random search over the hyperparameters. Please see the code block at the end of this file to specify what range you want to search for each RNN hyperparameter. If num_random_run is greater than 0, such a random search is performed. It outputs a csv file summarizing all the performances and hyperparameters under the folder performance/ (a minimal sketch of such a loop follows).
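A minimal sketch of what such a random search loop looks like; the search space below is hypothetical, and the real ranges are defined in the code block at the end of train_mgp_rnn.py:

import random

# Hypothetical search space; the actual ranges live in train_mgp_rnn.py.
search_space = {
    'lr': [1e-2, 1e-3, 1e-4],
    'num_hidden': [64, 128, 256],
    'num_layers': [1, 2],
    'l2_penalty': [1e-7, 1e-5, 1e-3],
}

num_random_run = 30
for run in range(num_random_run):
    hparams = {name: random.choice(values) for name, values in search_space.items()}
    # ... train and validate a model with hparams, then append the validation
    # metric and the sampled hyperparameters to the csv under performance/.
    print(run, hparams)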

Example on mocked data

You can just run:

python train_mgp_rnn.py \
    --database_dir ../data/ \
    --database mocked_data \
    --num_features 5 --n_covs 2 \
    --num_hours_pred 1 \
    --num_hours_warmup 0 \
    --min_measurements_in_warmup 0 \
    --neg_subsampled 0

We keep the other parameters the same as in our actual experiment. You can take a look at the resulting summary file in classifier_training_and_evaluation/performance/debug_ManyToOneRNN_MIMIC_window.csv. We also keep our random search results as ManyToOneRNN_MIMIC_window.csv.

2. test_baselines.py

We use this to generate the performance of two baselines: LR and RF. E.g. to run an RF on the same dataset:

python test_baselines.py \
    --classifier RF --add_missing 1 \
    --database_dir ../data/ \
    --database mocked_data \
    --num_features 5 --n_covs 2 \
    --num_hours_pred 1 \
    --num_hours_warmup 0 \
    --min_measurements_in_warmup 0 \
    --neg_subsampled 0

II. Cache the RL experience tuple

All the files are under the folder database/.

After we have our classifier, we can generate the experience for our reinforcement learning agent. To reduce the computational burden, we first cache the experience into .tfrecord files and later use them to train our RL agent.
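For readers unfamiliar with the format, here is a generic TF 1.x sketch of serializing experience tuples into a .tfrecord file. The field names and tuple layout are illustrative, not the repo's actual schema.

import numpy as np
import tensorflow as tf  # TF 1.x

def _float_feature(values):
    values = np.asarray(values, dtype=np.float32).ravel()
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

# Hypothetical iterable of (state, action, reward, next_state) tuples.
experiences = [(np.zeros(167), 3, 0.0, np.zeros(167))]

writer = tf.python_io.TFRecordWriter('train.tfrecords')
for s, a, r, s_next in experiences:
    example = tf.train.Example(features=tf.train.Features(feature={
        'state': _float_feature(s),
        'action': _float_feature([a]),
        'reward': _float_feature([r]),
        'next_state': _float_feature(s_next),
    }))
    writer.write(example.SerializeToString())
writer.close()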

1. Generate training experience (MIMIC_discretized_exp.py)

Since we do not know the order of measurements that happen at the same time, we randomly generate different orderings of the experience data (please see the paper for details). In our experiment we generate at most 20 different random orders for measurements sharing the same time and cache them (a sketch of the ordering idea appears after the command below). These files are called train/val/test.tfrecords.

We set the MIMIC_exp_cls as MIMIC_discretized_joint_exp_random_order_rnn.

for cache_file in train val test
do
python -u MIMIC_discretized_exp.py \
    --mode ${cache_file} \
    --identifier 0117-30mins-24hrs-20order-rnn-neg_sampled \
    --MIMIC_exp_cls MIMIC_discretized_joint_exp_random_order_rnn \
    --num_random_order 20 --include_before_death 24 \
    --min_hours_of_patient 12 --num_hours_pred 24.01 \
    --batch_size 1600 --database_name mingjie_39features_38covs \
     --cache_type neg_sampled --mgp_rnn_cls ManyToOneRNN \
     --mgp_rnn_dir ../models/0117-24hours_39feats_38cov_negsampled_rnn-mimic-nh128-nl2-c1e-07-keeprate0.9_0.7_0.5-npred24-miss1-n_mc_1-MIMIC_window-mingjie_39features_38covs-ManyToOneRNN/ \
     --cache_proportion 1 \
     --MIMIC_cache_exp_cls MIMIC_cache_discretized_joint_exp_random_order \
     &> ../logs/0118-rnn-cache-neg-sampled-${cache_file}.log &
done
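A minimal sketch of the random ordering idea for a single trajectory: measurements that share the same entry in Ts are shuffled among themselves, while different times keep their increasing order. This is illustrative only; the actual logic lives in MIMIC_discretized_exp.py.

import numpy as np

def random_orders(ind_kt, ind_kf, y, num_random_order=20, seed=0):
    """Return up to num_random_order permutations of one trajectory's
    measurements that only shuffle within identical time indices."""
    rng = np.random.RandomState(seed)
    ind_kt = np.asarray(ind_kt)
    orders = []
    for _ in range(num_random_order):
        perm = np.arange(len(ind_kt))
        for t in np.unique(ind_kt):
            same_time = np.where(ind_kt == t)[0]
            perm[same_time] = rng.permutation(same_time)  # shuffle only within one time step
        orders.append(([ind_kt[i] for i in perm],
                       [ind_kf[i] for i in perm],
                       [y[i] for i in perm]))
    return orders

# e.g. a toy trajectory with two measurements at the same time index
print(random_orders([0, 4, 4], [35, 2, 7], [-0.5, 0.7, 0.1], num_random_order=3))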

2. Generate environment experience (MIMIC_discretized_exp.py)

To train an off-policy policy evaluation (OPPE) model, we cache the experience into train/val/test_per_time_env.tfrecords. Instead of generating experience between consecutive measurements, these experiences capture the state transitions across time, each of which might span multiple measurements.

We set the MIMIC_exp_cls as MIMIC_per_time_env_exp_rnn.

for cache_file in val test train
do
python -u MIMIC_discretized_exp.py \
    --mode ${cache_file} \
    --identifier 0117-30mins-24hrs-20order-rnn-neg_sampled \
    --MIMIC_exp_cls MIMIC_per_time_env_exp_rnn \
    --num_random_order 20 \
    --include_before_death 24 \
    --min_hours_of_patient 12 \
    --num_hours_pred 24.01 \
    --batch_size 1600 \
    --database_name mingjie_39features_38covs \
    --cache_type neg_sampled \
    --mgp_rnn_cls ManyToOneRNN \
    --mgp_rnn_dir ../models/0117-24hours_39feats_38cov_negsampled_rnn-mimic-nh128-nl2-c1e-07-keeprate0.9_0.7_0.5-npred24-miss1-n_mc_1-MIMIC_window-mingjie_39features_38covs-ManyToOneRNN/ \
    --cache_proportion 1 \
    --MIMIC_cache_exp_cls MIMIC_cache_discretized_exp_env_v3 \
    &> ../logs/0117-rnn-cache-all-${cache_file}_per_time_env.log &
done

III. Train RL

After we cache the RL experiences, we can train our RL agent. We use random search across various architectures and different action cost coefficients to obtain different policies.

For example:

for seed in 300 200 100; do
    for ac in 0 1e-4 5e-4 1e-3 5e-3 1e-2; do
    python -u run_sequential_dqn.py --my_identifier 2_0121 \
    --dqn_cls SequencialDuelingDQN \
    --rand 1 \
    --num_random_run 30 \
    --seed ${seed} \
    --replace 0 \
    --debug 0 \
    --rl_state_dim 167 \
    --action_dim 40 \
    --gamma 0.95 \
    --normalized_state 1 \
    --cache_cls MIMIC_cache_discretized_joint_exp_random_order_with_obs \
    --cache_dir ../RL_exp_cache/0312-30mins-24hrs-20order-rnn-neg_sampled-with-obs/ \
    --action_cost_coef ${ac} \
    --pos_label_fold_coef 1 \
    --only_pos_reward 0 \
    --depend_on_labels 0 \
    --train_batch_size 64 \
    &> ../logs/0314_24hrs_dqn_ac${ac}_rand_s${seed} &
    done
done

You can see our resulting csv file in estimation/2_0121_val_regression_summary.csv

IV. Train Off-policy policy evaluation regression model

We directly predict the probability change from the state and action pair. We use the per_time_env experience cache (II.2) to train our model. This model is a simple feed-forward neural network (a minimal sketch appears at the end of this section). We also use random search to find the best OPPE model, and the performance is stored as a csv file under the folder policy_training_and_evaluation/estimation/.

for seed in 10 20 30; do
    for model_type in "StateToProbGainPerTimeEstimator"; do
        echo ${model_type}
        python -u run_reward_estimator.py --seed ${seed} \
        --identifier 0121_with_larger_training_ \
        --model_type ${model_type} --normalized_state 1 \
        --rand 1 --num_runs 20 --max_training_iters 100 \
        --cache_dir ../RL_exp_cache/0121-30mins-24hrs-20order-rnn-neg_sampled/ \
        &> ../logs/0121_random_search_h_to_p_${model_type}_s${seed} &
    done
done

Please see the csv file estimation/0121_with_larger_training__StateToProbGainPerTimeEstimator.csv for the random search results.
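For orientation, here is a minimal TF 1.x sketch of such a feed-forward regressor from a (state, action) pair to the predicted probability gain. The dimensions follow the flags used elsewhere in this README (167-d state, 40 actions); the layer sizes and names are ours, not the StateToProbGainPerTimeEstimator implementation.

import tensorflow as tf  # TF 1.x

state = tf.placeholder(tf.float32, [None, 167])       # RL state
action = tf.placeholder(tf.float32, [None, 40])       # one-hot action
target_gain = tf.placeholder(tf.float32, [None, 1])   # observed probability change

# One hidden layer on the concatenated state-action input, linear output.
hidden = tf.layers.dense(tf.concat([state, action], axis=1), 64, activation=tf.nn.relu)
pred_gain = tf.layers.dense(hidden, 1)

loss = tf.reduce_mean(tf.squared_difference(pred_gain, target_gain))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)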

V. Evaluate the policy

To evaluate the policy, set the mode to 3.

reward_dir="../models/0121_with_larger_training_-StateToProbGainPerTimeEstimator-0121-hl1-hu64-lr0.001-reg0.0001-kp0.7-n1/"
policy_dir="../models/dqn_mimic-0120_24hrs_rand_ac_and_arch_-g1-ac1.0e+00-gamma0.95-fold1.0-only_pos0-sd167-ad40-nn-10000-4-1-256-lr-0.01-reg-0.01-0.5-s-256-5000-i-50-500-3-1/"
python ./run_value_estimator_regression_based.py \
    --policy_dir ${policy_dir} --identifier 2_0121_ \
    --mode 3 \
    --cache_dir ../RL_exp_cache/0121-30mins-24hrs-20order-rnn-neg_sampled \
    --reward_estimator_dir ${reward_dir}

Please see the resulting performance in the csv file policy_training_and_evaluation/2_0121_val_regression_summary.csv.

If you want to estimate the random policy's performance, set the mode to 4. "time_pass_freq_scale_factor" sets how frequently the random policy measures. Here is an example varying it from 0 to 1:

reward_dir="../models/0121_with_larger_training_-StateToProbGainPerTimeEstimator-0121-hl1-hu64-lr0.001-reg0.0001-kp0.7-n1"
policy_dir="../models/dqn_mimic-0120_24hrs_rand_ac_and_arch_-g1-ac1.0e-02-gamma0.95-fold1.0-only_pos0-sd167-ad40-nn-10000-2-1-64-lr-0.001-reg-0.0001-0.7-s-256-5000-i-50-500-3-1/"
for time_pass_freq_scale_factor in 0 0.05 0.1 0.2 0.3 0.5 0.8 1; do
    python ./run_value_estimator_regression_based.py \
    --identifier 0121_per_time_rand_new_rew_est2 --mode 4 \
    --cache_dir ../RL_exp_cache/0117-30mins-24hrs-20order-rnn-neg_sampled \
    --reward_estimator_dir ${reward_dir} \
    --policy_dir ${policy_dir} \
    --time_pass_freq_scale_factor ${time_pass_freq_scale_factor} &
done

Please see the resulting performance in the csv file policy_training_and_evaluation/2_0122_random_policy__random_policy_evaluation.csv.

VI. Evaluation

For the off-policy evaluation figures (Fig. 4, 5), please see the notebook under notebooks/0117_depends_on_labels_DQN_per_time_hyperameter_selection.ipynb.

For the qualitative interpretation of the policy, please see the notebook notebooks/0117 visualization of policy.ipynb.

Simulation part

Please email my coauthor Mingjie Mai (mingjie.mai [at] mail.utoronto.ca), as he has not had time to write a README for this part. All the code is under the folder simulated_database/.

Questions

Please email me (Chun-Hao Kingsley Chang) at kingsley.chang [at] mail.utoronto.ca or just open a GitHub issue. I will try my best to help you reproduce the results or share the pretrained models.
