Neural network for predicting template relevance a la Segler and Waller's Neural Symbolic paper.
- RDKit (most versions should be fine)
- tensorflow (r0.12.0)
- h5py
- numpy
Learn to predict template relevance.
- Grab reaction precedents from templates stored in MongoDB
    python scripts/get_reaxys_data.py
- Calculate fingerprints and store in .h5 file
    python scripts/make_data_file.py data/reaxys_limit1000000000_reaxys_v2_transforms_retro_v9_10_5.txt 2048
- Train model
    python retrotemp/nntrain_fingerprint.py -t data/reaxys_limit1000000000_reaxys_v2_transforms_retro_v9_10_5.txt -o 163723 -m models/6d3M_Reaxys_10_5 --fp_len 2048
- Find best validation performance
    regex="model\.(.*)\.meta"
    for f in `ls -tr models/6d3M_Reaxys_10_5/*.meta`
    do
        if [[ $f =~ $regex ]]
        then
            ckpt="${BASH_REMATCH[1]}"
            echo $ckpt
            python retrotemp/nntrain_fingerprint.py -o 163723 -m models/6d3M_Reaxys_10_5 --fp_len 2048 -c "$ckpt" -t data/reaxys_limit1000000000_reaxys_v2_transforms_retro_v9_10_5.txt --test valid
        fi
    done
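The checkpoint-matching step in the loop above can also be sketched in Python, which may be handy for picking the best checkpoint once the validation runs have finished. This is only a minimal sketch: the helper names `checkpoint_ids` and `best_checkpoint`, the example file names, and the `scores` dictionary are all hypothetical and not part of this repository. The validation scores are assumed to be collected by hand from the output of each `--test valid` run.

```python
import re

# Same pattern as the shell loop: capture whatever sits between
# "model." and ".meta" in each file name.
CKPT_PATTERN = re.compile(r"model\.(.*)\.meta")


def checkpoint_ids(paths):
    """Return the captured checkpoint identifier for every matching path."""
    ids = []
    for path in paths:
        match = CKPT_PATTERN.search(path)  # unanchored, like bash's =~
        if match:
            ids.append(match.group(1))
    return ids


def best_checkpoint(scores):
    """Return the checkpoint id with the highest validation score.

    `scores` is a hypothetical dict {checkpoint_id: validation_score},
    filled in manually from the validation runs.
    """
    return max(scores, key=scores.get)
```

For example, `checkpoint_ids(["models/6d3M_Reaxys_10_5/model.ckpt-10.meta"])` yields the string that would be passed to `-c`, assuming the checkpoint files follow that naming (the name shown here is made up for illustration).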
- Retrain on the whole dataset for the same number of epochs as the best-performing checkpoint. We want a high-performing deployed model and no longer need to hold out any data.
    python retrotemp/nntrain_fingerprint.py -t data/reaxys_limit1000000000_reaxys_v2_transforms_retro_v9_10_5.txt -o 163723 -m models/6d3M_Reaxys_10_5 --fp_len 2048 --fixed_epochs_train_all 15
- Run the standalone TensorFlow version to dump to NumPy arrays