cumc-dbmi / cehr-bert

CEHR-BERT: Incorporating temporal information from structured EHR data to improve prediction tasks

Unable to finish tutorial: can't run prediction evaluation

schuemie opened this issue

I've been able to follow the tutorial in the README just fine, except for the last line.
I've downloaded the Synthea data, converted it to the training data format, and used this line to create a pre-trained model:

PYTHONPATH=./: python3 trainers/train_bert_only.py -i sample_data/ -o ~/Documents/omop_test/cehr-bert -iv -m 512 -e 1 -b 32 -d 5 --use_time_embedding 

As a result, I now have a file called 'bert_model_01_3.67.h5'

However, this last line is throwing an error:

PYTHONPATH=./: python3 evaluations/evaluation.py -a sequence_model -sd sample_data/hf_readmission -ef ~/Documents/omop_test/evaluation_train_val_split/hf_readmission/ -m 512 -b 32 -p 10 -vb ~/Documents/omop_test/cehr-bert -me vanilla_bert_lstm --sequence_model_name CEHR_BERT_512 --num_of_folds 4;

The error is

OSError: SavedModel file does not exist at: d:/omopSynthea/cehr-bert\bert_model.h5/{saved_model.pbtxt|saved_model.pb}

(I changed the path because I'm running on Windows.) However, d:/omopSynthea/cehr-bert\bert_model.h5 exists (I renamed the aforementioned 'bert_model_01_3.67.h5').

Am I doing something wrong? How do I run a 2nd epoch?

@schuemie Thanks for pointing this out. It might have something to do with the path being misinterpreted on Windows; I have only tested it on Linux. Let me try to reproduce it on a Windows machine.
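
In the meantime, one way to narrow this down is to check what actually sits at the path in the error message. As far as I understand, Keras tends to fall back to SavedModel loading (and raise this OSError) when the path is not recognized as an HDF5 file, which can happen if the file is missing, is a directory, is not valid HDF5, or if h5py is not installed. The sketch below is only a diagnostic under those assumptions; `describe_model_path` is a hypothetical helper, not part of cehr-bert, and it uses only the standard library.

```python
import os

# Standard 8-byte HDF5 file signature. This sketch only checks offset 0,
# which is where it sits unless the file was written with a user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def describe_model_path(path):
    """Hypothetical helper: classify what is found at a would-be .h5 path."""
    # Expand "~" and normalize mixed "/" and "\" separators for this OS.
    path = os.path.normpath(os.path.expanduser(path))
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "directory"
    with open(path, "rb") as f:
        header = f.read(len(HDF5_SIGNATURE))
    return "hdf5" if header == HDF5_SIGNATURE else "not-hdf5"


if __name__ == "__main__":
    import sys

    # Usage: python check_model_path.py <path> [<path> ...]
    for p in sys.argv[1:]:
        print(p, "->", describe_model_path(p))
```

If this reports "hdf5" for the renamed bert_model.h5, the file itself is probably fine and it may be worth confirming that h5py can be imported in the same environment; any other result points at the path or the file.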

The tutorial only pre-trains CEHR-BERT for 1 epoch because of the -e 1 argument; you can change it to -e 2 when calling trainers/train_bert_only.py. Apologies that this is not clearly stated in the README.