This repository is the official implementation of the paper Optimal Estimation of Generic Dynamics by Path-Dependent Neural Jump ODEs, which is the second part of the paper Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering.
The code is based on the code of the first paper, but was developed further to make it more user-friendly. All experiments from the first paper can be run with this repo as well (see Instructions for NJODE).
This code was executed using Python 3.7.
To install the requirements, download this repo and cd into it.
Then create a new environment and install all dependencies and this repo. With conda:
conda create --name njode python=3.7
conda activate njode
pip install -r requirements.txt
go to the source directory:
cd NJODE
generate Poisson Point Process dataset:
python data_utils.py --dataset_name=PoissonPointProcess --dataset_params=poisson_pp_dict
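Not the repo's data generator — a minimal sketch, assuming a homogeneous Poisson point process with constant intensity, of how such arrival times can be sampled (all names and parameters here are illustrative):

```python
import numpy as np

def sample_poisson_pp(rate, T, rng):
    """Sample arrival times of a homogeneous Poisson point process on [0, T].

    Inter-arrival times of a Poisson process with intensity `rate` are
    i.i.d. Exponential(rate), so we cumulatively sum exponential draws
    until the horizon T is passed.
    """
    times = []
    t = rng.exponential(1.0 / rate)
    while t <= T:
        times.append(t)
        t += rng.exponential(1.0 / rate)
    return np.array(times)

rng = np.random.default_rng(0)
arrivals = sample_poisson_pp(rate=2.0, T=100.0, rng=rng)
# the expected number of points is rate * T = 200
```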
generate fractional Brownian Motion (FBM) datasets:
python data_utils.py --dataset_name=FBM --dataset_params=FBM_1_dict
python data_utils.py --dataset_name=FBM --dataset_params=FBM_2_dict
python data_utils.py --dataset_name=FBM --dataset_params=FBM_3_dict
python data_utils.py --dataset_name=FBM --dataset_params=FBM_4_dict
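Not the repo's implementation — a minimal sketch of exact FBM sampling via the Cholesky factor of its covariance (Hurst parameter and path length are illustrative):

```python
import numpy as np

def sample_fbm(hurst, n_steps, T, rng):
    """Sample one fractional Brownian motion path on [0, T] via Cholesky.

    FBM is the centered Gaussian process with covariance
    cov(B_s, B_t) = 0.5 * (s^{2H} + t^{2H} - |t - s|^{2H}).
    Exact but O(n^3), so only suitable for short paths.
    """
    t = np.linspace(T / n_steps, T, n_steps)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s ** (2 * hurst) + u ** (2 * hurst)
                 - np.abs(s - u) ** (2 * hurst))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))  # jitter for stability
    path = L @ rng.standard_normal(n_steps)
    return np.concatenate(([0.0], path))  # prepend B_0 = 0

rng = np.random.default_rng(1)
path = sample_fbm(hurst=0.75, n_steps=100, T=1.0, rng=rng)
```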
generate 2-dim correlated BM dataset:
python data_utils.py --dataset_name=BM2DCorr --dataset_params=BM_2D_dict
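Again as a hedged sketch rather than the repo's generator: a 2-dim Brownian motion with correlated coordinates can be simulated by correlating the increments with the Cholesky factor of the correlation matrix (rho and the discretization below are illustrative):

```python
import numpy as np

rho = 0.5
n_paths, n_steps, T = 5000, 100, 1.0
dt = T / n_steps
chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

rng = np.random.default_rng(2)
dW = rng.standard_normal((n_paths, n_steps, 2)) * np.sqrt(dt)
dW_corr = dW @ chol.T                # correlate the increments
paths = np.cumsum(dW_corr, axis=1)   # shape (n_paths, n_steps, 2)

# empirical correlation of the two coordinates at the terminal time
emp_corr = np.corrcoef(paths[:, -1, 0], paths[:, -1, 1])[0, 1]
```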
generate BM and its Square dataset:
python data_utils.py --dataset_name=BMandVar --dataset_params=BM_and_Var_dict
python data_utils.py --dataset_name=BM --dataset_params=BM_dict
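For intuition on this dataset (not taken from the repo): the optimal L2-prediction of the square of a BM is known in closed form, E[W_t^2 | F_s] = W_s^2 + (t - s). A quick Monte Carlo check of the special case s = 0, t = 1:

```python
import numpy as np

# E[W_1^2] = 1 for a standard Brownian motion; verify by sampling W_1 ~ N(0, 1)
rng = np.random.default_rng(3)
W1 = rng.standard_normal(100_000)
mc_mean = (W1 ** 2).mean()  # should be close to 1
```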
generate the Black-Scholes (BS) dataset with dependent observation intensity:
python data_utils.py --dataset_name=BlackScholes --dataset_params=BS_dep_intensity_dict
generate double-pendulum (deterministic chaotic system) dataset:
python data_utils.py --dataset_name=DoublePendulum --dataset_params=DP_dict1
python data_utils.py --dataset_name=DoublePendulum --dataset_params=DP_dict2
python data_utils.py --dataset_name=DoublePendulum --dataset_params=DP_dict3
python data_utils.py --dataset_name=DoublePendulum --dataset_params=DP_dict3_test --seed=3
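For reference, a self-contained sketch of a double-pendulum simulation (not the repo's generator; masses, lengths and initial condition are illustrative), using the standard equations of motion integrated with classic RK4. Energy conservation is a useful sanity check for the integrator:

```python
import numpy as np

g, m1, m2, L1, L2 = 9.81, 1.0, 1.0, 1.0, 1.0

def deriv(state):
    """Time derivative of (theta1, omega1, theta2, omega2)."""
    th1, w1, th2, w2 = state
    d = th2 - th1
    den1 = (m1 + m2) * L1 - m2 * L1 * np.cos(d) ** 2
    a1 = (m2 * L1 * w1 ** 2 * np.sin(d) * np.cos(d)
          + m2 * g * np.sin(th2) * np.cos(d)
          + m2 * L2 * w2 ** 2 * np.sin(d)
          - (m1 + m2) * g * np.sin(th1)) / den1
    den2 = (L2 / L1) * den1
    a2 = (-m2 * L2 * w2 ** 2 * np.sin(d) * np.cos(d)
          + (m1 + m2) * (g * np.sin(th1) * np.cos(d)
                         - L1 * w1 ** 2 * np.sin(d)
                         - g * np.sin(th2))) / den2
    return np.array([w1, a1, w2, a2])

def energy(state):
    """Total mechanical energy (kinetic + potential), pivot at the origin."""
    th1, w1, th2, w2 = state
    v1sq = (L1 * w1) ** 2
    v2sq = v1sq + (L2 * w2) ** 2 + 2 * L1 * L2 * w1 * w2 * np.cos(th1 - th2)
    kin = 0.5 * m1 * v1sq + 0.5 * m2 * v2sq
    pot = -(m1 + m2) * g * L1 * np.cos(th1) - m2 * g * L2 * np.cos(th2)
    return kin + pot

state = np.array([np.pi / 2, 0.0, np.pi / 2, 0.0])  # both arms horizontal, at rest
E0, dt = energy(state), 1e-3
for _ in range(5000):  # integrate 5 seconds with RK4
    k1 = deriv(state)
    k2 = deriv(state + 0.5 * dt * k1)
    k3 = deriv(state + 0.5 * dt * k2)
    k4 = deriv(state + dt * k3)
    state = state + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
```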
generate BM Filtering Problem dataset:
python data_utils.py --dataset_name=BMFiltering --dataset_params=BM_Filter_dict
python data_utils.py --dataset_name=BMFiltering --dataset_params=BM_Filter_dict_1
python data_utils.py --dataset_name=BMFiltering --dataset_params=BM_Filter_dict_testdata
generate BM TimeLag dataset:
python data_utils.py --dataset_name=BMwithTimeLag --dataset_params=BM_TimeLag_dict_1 --seed=3
python data_utils.py --dataset_name=BMwithTimeLag --dataset_params=BM_TimeLag_dict_2 --seed=3
python data_utils.py --dataset_name=BMwithTimeLag --dataset_params=BM_TimeLag_dict_testdata
generate the Limit Order Book (LOB) datasets with raw data from Covario (ATTENTION: the required raw data was provided by Covario but is not publicly available; hence, this data generation will raise an error unless you provide a working link to the raw data):
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict1
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict1_2
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict2
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict3
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict3_2
generate the Limit Order Book (LOB) datasets with raw data from Kaggle:
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict_K_1
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict_K_1_2
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict_K_2
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict_K_3
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict_K_3_2
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict_K_4
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict_K_4_2
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict_K_6
python data_utils.py --dataset_name=LOB --dataset_params=LOB_dict_K_6_2
REMARK: the physionet and climate datasets are downloaded automatically the first time a model is trained on them. Therefore, do not train multiple models in parallel the first time, since each would start the download and overwrite the others' output; once the download is complete, parallel training works normally.
go to the source directory:
cd NJODE
(parallel) training of models:
python run.py --params=... --NB_JOBS=... --NB_CPUS=... --first_id=...
List of all flags:
- params: name of the params list (defined in config.py) to use for parallel run
- NB_JOBS: number of parallel jobs to run with joblib
- first_id: first id of the given params list to start training from
- get_overview: name of the dict (defined in config.py) defining input for extras.get_training_overview
- USE_GPU: whether to use GPU for training
- ANOMALY_DETECTION: whether to run in torch debug mode
- SEND: whether to send results via telegram
- NB_CPUS: number of CPUs used by each training
- model_ids: List of model ids to run
- DEBUG: whether to run parallel in debug mode
- saved_models_path: path where the models are saved
- overwrite_params: name of dict (defined in config.py) to use for overwriting params
- plot_paths: name of the dict (in config.py) defining input for extras.plot_paths_from_checkpoint
- climate_crossval: name of the dict (in config.py) defining input for extras.get_cross_validation
- plot_conv_study: name of the dict (in config.py) defining input for extras.plot_convergence_study
train model on Poisson Point Process:
python run.py --params=param_list_poissonpp1 --NB_JOBS=1 --NB_CPUS=1 --first_id=1
python run.py --plot_paths=plot_paths_ppp_dict
train model on FBM:
python run.py --params=param_list_FBM1 --NB_JOBS=1 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_FBM1
python run.py --plot_paths=plot_paths_FBM_dict
train model on BM2DCorr:
python run.py --params=param_list_BM2D_1 --NB_JOBS=1 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_BM2D_1
python run.py --plot_paths=plot_paths_BM2D_dict
train model on BMandVar:
python run.py --params=param_list_BMandVar_1 --NB_JOBS=2 --NB_CPUS=1 --first_id=1
python run.py --plot_paths=plot_paths_BMVar_dict
train model on dependent observation intensity datasets:
python run.py --params=param_list_DepIntensity_1 --NB_JOBS=24 --NB_CPUS=1 --first_id=1
python run.py --plot_paths=plot_paths_DepIntensity_dict
train PD-NJODE on DoublePendulum:
python run.py --params=param_list_DP --NB_JOBS=40 --NB_CPUS=2 --first_id=1 --get_overview=overview_dict_DP
python run.py --plot_paths=plot_paths_DP_dict
python run.py --params=param_list_DP --NB_JOBS=40 --saved_models_path="../data/saved_models_DoublePendulum/" --overwrite_params="{'test_data_dict': 'DP_dict3_test'}"
python run.py --plot_paths=plot_paths_DP_dict
train model on BM Filtering Problem dataset:
python run.py --params=param_list_BMFilter_1 --NB_JOBS=8 --NB_CPUS=2 --first_id=1 --get_overview=overview_dict_BMFilter_1
python run.py --plot_paths=plot_paths_BMFilter_dict
train model on BM with TimeLag dataset:
python run.py --params=param_list_BMTimeLag --NB_JOBS=8 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_BMTimeLag
python run.py --plot_paths=plot_paths_BMTimeLag_dict
train model on PhysioNet datasets:
python run.py --params=param_list_physio --NB_JOBS=8 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_physio --crossval=crossval_dict_physio
train model on Climate datasets:
python run.py --params=param_list_climate --NB_JOBS=8 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_climate --crossval=crossval_dict_climate
train model on Limit Order Book datasets:
python run.py --params=param_list_LOB --NB_JOBS=22 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_LOB
retrain classifier on LOB dataset:
python run.py --params=param_list_retrainLOB --NB_JOBS=42 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_retrainLOB
train model on Limit Order Book Kaggle datasets:
python run.py --params=param_list_LOB_K --NB_JOBS=22 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_LOB_K
python run.py --params=param_list_LOB_n --NB_JOBS=22 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_LOB_n
python run.py --plot_paths=plot_paths_LOB_K_dict
retrain classifier on LOB Kaggle dataset:
python run.py --params=param_list_retrainLOB_K --NB_JOBS=42 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_retrainLOB_K
python run.py --params=param_list_retrainLOB_n --NB_JOBS=42 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_retrainLOB_n
training of the DeepLOB model from the paper DeepLOB: Deep Convolutional Neural Networks for Limit Order Books, as a comparison:
# training of the DeepLOB model
python ../DeepLOB/train_DeepLOB.py
fit linear regression (LinReg) models as baselines on the LOB data:
python LOB_linreg.py
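Not LOB_linreg.py itself — a minimal sketch of such a baseline, assuming the simplest setup of predicting the next value of a series from a window of past values via ordinary least squares (the toy random-walk data and window size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
series = np.cumsum(rng.standard_normal(1000))  # toy random-walk "price"

window = 10
# build the design matrix of lagged windows and the next-step targets
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

# least-squares fit with an intercept column
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
mse = np.mean((pred - y) ** 2)
```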
This code can be used in accordance with the LICENSE.
If you find this code useful or include parts of it in your own work, please cite our papers:
Optimal Estimation of Generic Dynamics by Path-Dependent Neural Jump ODEs
@article{PDNJODE,
url = {https://arxiv.org/abs/2206.14284},
author = {Krach, Florian and Nübel, Marc and Teichmann, Josef},
title = {Optimal Estimation of Generic Dynamics by Path-Dependent Neural Jump ODEs},
publisher = {arXiv},
year = {2022},
}
Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering
@inproceedings{
herrera2021neural,
title={Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering},
author={Calypso Herrera and Florian Krach and Josef Teichmann},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=JFKR3WqwyXR}
}
This code is based on the code-repo of the first paper Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering:
https://github.com/HerreraKrachTeichmann/NJODE
Parts of this code are based on and/or copied from the code of:
https://github.com/edebrouwer/gru_ode_bayes, of the paper
GRU-ODE-Bayes: Continuous modeling of sporadically-observed time series
and the code of: https://github.com/YuliaRubanova/latent_ode, of the paper
Latent ODEs for Irregularly-Sampled Time Series
and the code of: https://github.com/zcakhaa/DeepLOB-Deep-Convolutional-Neural-Networks-for-Limit-Order-Books/, of the paper
DeepLOB: Deep convolutional neural networks for limit order books.
The High Frequency Crypto Limit Order Book Data was made public on Kaggle. The other bitcoin LOB dataset was generously provided by Covario, but is not publicly available.
The original repository of the paper Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering is available at https://github.com/HerreraKrachTeichmann/NJODE. In the current repository, the code was developed further to make it more user-friendly. Below are the instructions to run the experiments of the first paper.
go to the source directory:
cd NJODE
generate the standard BlackScholes, Heston and OrnsteinUhlenbeck datasets:
python data_utils.py --dataset_name=BlackScholes --dataset_params=hyperparam_default
python data_utils.py --dataset_name=Heston --dataset_params=hyperparam_default
python data_utils.py --dataset_name=OrnsteinUhlenbeck --dataset_params=hyperparam_default
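Not the repo's generator and not its default parameters — a minimal sketch of Black-Scholes path simulation with the exact log-Euler scheme, together with the Monte Carlo sanity check E[S_T] = S_0 * exp(mu * T):

```python
import numpy as np

# geometric Brownian motion dS = mu * S dt + sigma * S dW (illustrative params)
mu, sigma, S0, T = 0.05, 0.2, 1.0, 1.0
n_steps, n_paths = 100, 20000
dt = T / n_steps

rng = np.random.default_rng(5)
dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
# exact scheme: S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma dW)
log_incr = (mu - 0.5 * sigma ** 2) * dt + sigma * dW
S = S0 * np.exp(np.cumsum(log_incr, axis=1))

mc_mean = S[:, -1].mean()  # should be close to S0 * exp(mu * T)
```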
generate dataset of Heston model without Feller condition:
python data_utils.py --dataset_name=HestonWOFeller --dataset_params=HestonWOFeller_dict1
python data_utils.py --dataset_name=HestonWOFeller --dataset_params=HestonWOFeller_dict2
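As background for this dataset (parameter values below are illustrative, not the repo's): the Feller condition for the Heston variance process dv = kappa*(theta - v) dt + xi*sqrt(v) dW is 2*kappa*theta >= xi^2; when it holds, the variance stays strictly positive, and this dataset deliberately violates it.

```python
def feller_holds(kappa, theta, xi):
    """Check the Feller condition 2*kappa*theta >= xi**2 for a CIR/Heston
    variance process; if it holds, the variance never hits zero."""
    return 2 * kappa * theta >= xi ** 2

print(feller_holds(2.0, 0.04, 0.3))  # 0.16 >= 0.09: condition satisfied
print(feller_holds(2.0, 0.04, 0.5))  # 0.16 <  0.25: condition violated
```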
generate combined dataset OU+BS:
python data_utils.py --dataset_name=combined_OrnsteinUhlenbeck_BlackScholes --dataset_params=combined_OU_BS_dataset_dicts
generate the sine-drift BS dataset:
python data_utils.py --dataset_name=sine_BlackScholes --dataset_params=sine_BS_dataset_dict1
python data_utils.py --dataset_name=sine_BlackScholes --dataset_params=sine_BS_dataset_dict2
go to the source directory:
cd NJODE
train models on the 3 standard datasets (Black-Scholes, Heston, Ornstein-Uhlenbeck):
python run.py --params=params_list1 --NB_JOBS=3 --NB_CPUS=1 --first_id=1
run the convergence study on the 3 standard datasets:
# Black-Scholes
python run.py --params=params_list_convstud_BS --NB_JOBS=32 --NB_CPUS=1 --first_id=1 --plot_conv_study=plot_conv_stud_BS_dict1
python run.py --plot_conv_study=plot_conv_stud_BS_dict2
# Heston
python run.py --params=params_list_convstud_Heston --NB_JOBS=32 --NB_CPUS=1 --first_id=1 --plot_conv_study=plot_conv_stud_heston_dict1
python run.py --plot_conv_study=plot_conv_stud_heston_dict2
# Ornstein-Uhlenbeck
python run.py --params=params_list_convstud_OU --NB_JOBS=32 --NB_CPUS=1 --first_id=1 --plot_conv_study=plot_conv_stud_OU_dict1
python run.py --plot_conv_study=plot_conv_stud_OU_dict2
train model on HestonWOFeller dataset:
python run.py --params=params_list_HestonWOFeller --NB_JOBS=2 --NB_CPUS=1 --first_id=1
train model on the combined OU+BS dataset (regime switch):
python run.py --params=params_list_combined_OU_BS --NB_JOBS=1 --NB_CPUS=1 --first_id=1
train model on the sine-drift BS dataset (explicit time dependence):
python run.py --params=params_list_sine_BS --NB_JOBS=2 --NB_CPUS=1 --first_id=1
train the GRU-ODE-Bayes model on the 3 standard datasets:
python run.py --params=params_list_GRUODEBayes --NB_JOBS=32 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_GRUODEBayes
train model on climate dataset of GRU-ODE-Bayes:
python run.py --params=params_list_NJODE1_climate --NB_JOBS=32 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_NJODE1_climate --crossval=crossval_dict_NJODE1_climate
train model on Physionet Dataset of Latent ODEs for Irregularly-Sampled Time Series:
python run.py --params=params_list_NJODE1_physionet --NB_JOBS=32 --NB_CPUS=1 --first_id=1 --get_overview=overview_dict_NJODE1_physionet --crossval=crossval_dict_NJODE1_physionet