cross-reference: find patients who have 1+ ECG in pre-event window AND 1+ ECG in post-event window
erikr opened this issue · comments
What
Enhance cross_reference to find patients who have 1+ ECG in a pre-event window and 1+ ECG in a post-event window, i.e. patients with "paired" data.
Why
We are often interested only in patients who have 1+ ECG before some event as well as 1+ ECG after it.
Examples:
- initiation of immune checkpoint inhibitor therapy for cancer patients, which can potentially damage the heart (ecg-ici, a project that we have not yet prioritized but would like to do in the next few months)
- aortic valve surgery, which can potentially trigger or worsen arrhythmias (sts-afib project board)
How
New arguments --reference_start_time_tensor_paired and --reference_end_time_tensor_paired would enable a user to call cross_reference to find ECGs from patients who have 1+ ECG prior to a surgery as well as 1+ ECG after the surgery:
./scripts/tf.sh -c -t \
${HOME}/ml/ml4cvd/recipes.py \
--mode cross_reference \
--tensors_name ecg \
--tensors /data/partners_ecg/mgh/explore/tensors_all_union.csv \
--time_tensor partners_ecg_datetime \
--reference_tensors /data/sts-afib/mgh-afib-after-avr-metadata.csv \
--reference_name sts-afib-after-avr \
--reference_join_tensors partners_ecg_patientid_clean \
--reference_join_tensors mrn \
--reference_start_time_tensor surgery_date -180 \
--reference_end_time_tensor surgery_date \
--reference_start_time_tensor_paired surgery_date \
--reference_end_time_tensor_paired surgery_date 180 \
--output_folder $HOME \
--id sts-afib-ecg-crossref-180-days-preop
Acceptance Criteria
The above command runs cross_reference to find patients who have 1+ ECG in the pre-event window and 1+ ECG in the post-event window, and quantifies ECG coverage.
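To make the acceptance criterion concrete, here is a minimal sketch of the "paired" check in plain Python. It is not the cross_reference implementation; the function name, the dict-based inputs, and the choice to exclude ECGs taken on the event date itself are all assumptions made for illustration.

```python
from datetime import date, timedelta


def patients_with_paired_ecgs(ecg_dates, event_dates, days_before=180, days_after=180):
    """Return IDs of patients with 1+ ECG before and 1+ ECG after their event.

    ecg_dates:   dict mapping patient ID -> list of ECG dates
    event_dates: dict mapping patient ID -> event date (e.g. surgery)
    Assumption: an ECG on the event date counts toward neither window.
    """
    paired = set()
    for patient_id, event in event_dates.items():
        pre_start = event - timedelta(days=days_before)
        post_end = event + timedelta(days=days_after)
        ecgs = ecg_dates.get(patient_id, [])
        has_pre = any(pre_start <= d < event for d in ecgs)
        has_post = any(event < d <= post_end for d in ecgs)
        if has_pre and has_post:
            paired.add(patient_id)
    return paired


# Hypothetical toy data: only patient "A" has ECGs on both sides of the event.
ecg_dates = {
    "A": [date(2020, 1, 10), date(2020, 3, 1)],  # before and after
    "B": [date(2020, 1, 10)],                    # before only
    "C": [date(2020, 3, 1), date(2020, 4, 1)],   # after only
}
event_dates = {pid: date(2020, 2, 1) for pid in ecg_dates}

paired = patients_with_paired_ecgs(ecg_dates, event_dates)
print(paired)  # {'A'}
```

ECG coverage could then be reported as len(paired) divided by the number of patients in the reference cohort.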
This is really a desire to find cross-referenced data in multiple time windows. Instead of allowing only 2, allow any number of time windows by specifying reference_start/end_time_tensor multiple times.
An additional augmentation will be to allow users to specify how many data points are needed in each time window and which events in the time series to keep (newest/oldest/random).
The arguments will probably look like this:
--mode cross_reference
--output_folder $HOME
--id sts-afib-ecg-crossref-180-days-preop
# Source Tensors
--tensors_name ecg
--tensors /data/partners_ecg/mgh/explore/tensors_all_union.csv
--join_tensor partners_ecg_patientid_clean
--time_tensor partners_ecg_datetime
# Reference Tensors
--reference_tensors /data/sts-afib/mgh-afib-after-avr-metadata.csv
--reference_name sts-afib-after-avr
--reference_join_tensors mrn
# Time Window 1
--reference_start_time_tensor surgery_date -180
--reference_end_time_tensor surgery_date
--number_in_window 1
--which_in_window newest
--window_name pre-op
# Time Window 2
--reference_start_time_tensor surgery_date
--reference_end_time_tensor surgery_date 180
--number_in_window 1
--which_in_window oldest
--window_name post-op
Output will likely change; details to follow during implementation.
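One way the repeated per-window arguments above could be collected is with argparse's append action, sketched below. This is only an illustration under assumptions: the Window dataclass, the helper names, and the rule that a missing offset means 0 days are hypothetical and are not taken from the ml4cvd code.

```python
import argparse
from dataclasses import dataclass


@dataclass
class Window:
    name: str
    start: tuple  # (reference tensor name, offset in days)
    end: tuple
    number: int
    which: str


def _time_tensor(tokens):
    """Turn ['surgery_date', '-180'] into ('surgery_date', -180); offset defaults to 0."""
    if len(tokens) > 2:
        raise ValueError(f"expected NAME [OFFSET_DAYS], got {tokens}")
    offset = int(tokens[1]) if len(tokens) == 2 else 0
    return tokens[0], offset


def parse_windows(argv):
    parser = argparse.ArgumentParser()
    # action="append" stores one entry per occurrence of a flag, in order.
    parser.add_argument("--reference_start_time_tensor", action="append", nargs="+", default=[])
    parser.add_argument("--reference_end_time_tensor", action="append", nargs="+", default=[])
    parser.add_argument("--number_in_window", action="append", type=int, default=[])
    parser.add_argument("--which_in_window", action="append",
                        choices=["newest", "oldest", "random"], default=[])
    parser.add_argument("--window_name", action="append", default=[])
    args = parser.parse_args(argv)

    groups = [args.reference_start_time_tensor, args.reference_end_time_tensor,
              args.number_in_window, args.which_in_window, args.window_name]
    if len({len(g) for g in groups}) != 1:
        parser.error("each time window needs all five arguments")
    # The i-th occurrence of every flag describes the i-th window.
    return [Window(name, _time_tensor(s), _time_tensor(e), n, w)
            for s, e, n, w, name in zip(*groups)]


argv = [
    "--reference_start_time_tensor", "surgery_date", "-180",
    "--reference_end_time_tensor", "surgery_date",
    "--number_in_window", "1", "--which_in_window", "newest", "--window_name", "pre-op",
    "--reference_start_time_tensor", "surgery_date",
    "--reference_end_time_tensor", "surgery_date", "180",
    "--number_in_window", "1", "--which_in_window", "oldest", "--window_name", "post-op",
]
windows = parse_windows(argv)
for window in windows:
    print(window)
```

Matching the i-th occurrence of each flag to the i-th window keeps the command line flat, at the cost of requiring every window to repeat all five arguments.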
Can you clarify what these args do?
--number_in_window 1
--which_in_window newest
If they serve a key purpose, don't waste time explaining in a comment; better to just explain it in a docstring and point me to that line in the code :)
Can you clarify what these args do?
--number_in_window 1 --which_in_window newest
Let's say that for patient 123 you had these data:
ecg 5/12
ecg 5/13
ecg 5/14
surgery 5/15
ecg 5/16
ecg 5/17
ecg 5/18
and you wanted to get the 1 newest pre-op ECG and the 2 oldest post-op ECGs, so:
ecg 5/14
surgery 5/15
ecg 5/16
ecg 5/17
You can use these args:
# pre-op window
--window_name pre-op
--number_in_window 1
--which_in_window newest
# post-op window
--window_name post-op
--number_in_window 2
--which_in_window oldest
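The patient 123 example can be reproduced with a short sketch of the selection logic. The function name select_in_window and the year 2020 are assumptions for illustration (the example above gives only month and day), and what should happen when a window holds fewer ECGs than requested is left open here: the sketch simply returns what is available.

```python
import random
from datetime import date


def select_in_window(dates, number, which, rng=None):
    """Keep `number` dates from one window: the newest, the oldest, or a random subset."""
    ordered = sorted(dates)
    if number <= 0:
        return []
    if which == "newest":
        return ordered[-number:]
    if which == "oldest":
        return ordered[:number]
    if which == "random":
        rng = rng or random.Random()
        return sorted(rng.sample(ordered, min(number, len(ordered))))
    raise ValueError(f"unknown which_in_window: {which}")


# Patient 123 from the example above; the year is an assumption.
ecgs = [date(2020, 5, d) for d in (12, 13, 14, 16, 17, 18)]
surgery = date(2020, 5, 15)

pre_op = select_in_window([d for d in ecgs if d < surgery], number=1, which="newest")
post_op = select_in_window([d for d in ecgs if d > surgery], number=2, which="oldest")

print([d.isoformat() for d in pre_op])   # ['2020-05-14']
print([d.isoformat() for d in post_op])  # ['2020-05-16', '2020-05-17']
```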