SimSub

This is the implementation of "Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning" (PVLDB 2020).

Requirements

Linux Ubuntu OS (16.04 is tested)
Python (3.6 is tested, and Anaconda3 is recommended)
TensorFlow & Keras (1.8.0 and 2.2.0 are tested)
PyTorch (1.0+ is tested; it is used for the t2vec measure)

Please check the source code for the required packages; you can install them with conda install from a shell.

Dataset

Three real-world trajectory datasets are used in our paper. You can download the Porto and Harbin datasets here. The sports dataset can be obtained on request from STATS Artificial Intelligence. Put the renamed files into ./data; the intermediate results generated by the algorithms are also saved in this folder.

Preprocessing

Please refer to my colleague's repository t2vec for the detailed trajectory preprocessing. Note that t2vec downsamples the trajectories for its own purposes; SimSub does not need this step, and keeping it makes your trajectories shorter, so we recommend disabling it by setting rate = 0.0. Once you have the .t files (trajectory tokens, the input for t2vec) and the .h5 files (trajectory coordinates, the input for DTW and Frechet), run preprocess.py to generate the trajectory pairs used for training and testing.

python3 preprocess.py
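The .h5 coordinates feed the DTW and Frechet measures. For readers new to DTW, the following is a minimal sketch of the textbook dynamic-programming recurrence, assuming each trajectory is a sequence of (x, y) points and that the point-to-point cost is Euclidean distance. It is illustrative only and is not the code in distance.py.

```python
import numpy as np


def dtw(a, b):
    """Textbook DTW distance between two trajectories.

    a, b: sequences of (x, y) points, shapes (n, 2) and (m, 2).
    Point-to-point cost is Euclidean distance (an assumption of this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # D[i, j] = DTW distance between a[:i] and b[:j]; row/column 0 are padding.
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


# A 3-point trajectory against a 2-point one: the middle point (1, 0)
# must be aligned to something, and its cheapest match costs 1.
print(dtw([(0, 0), (1, 0), (2, 0)], [(0, 0), (2, 0)]))  # 1.0
```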

For t2vec, please refer to its repository to prepare the model. Note that this implementation uses a unidirectional t2vec setting ('-bidirectional False' in t2vec.py); to try bidirectional t2vec, you will need to revise the code in distance.py. Similarly, to try other similarity measures, you should understand these measures and revise the corresponding code and interfaces embedded in our framework.
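When adding a measure, it helps to view it as a plain function from two point sequences to a distance. The sketch below implements the discrete Fréchet distance (Eiter and Mannila's dynamic program) in that shape, assuming (x, y) points with Euclidean cost. The name discrete_frechet is hypothetical, and wiring such a function into the framework still requires adapting the interfaces mentioned above.

```python
import numpy as np


def discrete_frechet(a, b):
    """Discrete Frechet distance (Eiter and Mannila's dynamic program).

    a, b: sequences of (x, y) points. Unlike DTW, which sums point costs
    along the best alignment, Frechet keeps the largest single point cost.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # C[i, j] = discrete Frechet distance between a[:i] and b[:j].
    C = np.full((n + 1, m + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Best predecessor coupling, then the bottleneck (max) cost.
            C[i, j] = max(d, min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1]))
    return float(C[n, m])


print(discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 0), (2, 0)]))  # 1.0
```

Swapping the sum for a max is the only structural change from DTW, which is why measures of this family usually fit behind the same function signature.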

Running Procedures

Running ExactS

Run ExactS.py to obtain the exact results for this problem. We take the outputs of ExactS as the ground truth for training the learning-based algorithms. The outputs, i.e., the most similar subtrajectories and their similarities, are stored in ./data automatically.

python3 ExactS.py
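Conceptually, exact search scores every subtrajectory of the data trajectory and keeps the best one. A minimal sketch of that exhaustive search, assuming DTW as the measure (smaller distance means more similar) and (x, y) point arrays; exact_s is a hypothetical name, not the function in ExactS.py:

```python
import numpy as np


def exact_s(data, query):
    """Exhaustive most-similar-subtrajectory search under DTW (a sketch).

    Returns (distance, start, end) with end inclusive. For each start i,
    one DTW table is grown a column at a time over data[i], data[i+1], ...;
    the last cell of the column for data[j] equals DTW(query, data[i..j]),
    so every subtrajectory is scored without rebuilding its table.
    """
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    m = len(query)
    best = (np.inf, -1, -1)
    for i in range(len(data)):
        # Column for the empty data prefix: only prev[0] (both empty) is 0.
        prev = np.full(m + 1, np.inf)
        prev[0] = 0.0
        for j in range(i, len(data)):
            cur = np.full(m + 1, np.inf)  # cur[0] stays inf: empty query vs. non-empty data
            for k in range(1, m + 1):
                cost = np.linalg.norm(query[k - 1] - data[j])
                cur[k] = cost + min(prev[k], cur[k - 1], prev[k - 1])
            if cur[m] < best[0]:
                best = (float(cur[m]), i, j)
            prev = cur
    return best


data = [(5, 5), (0, 0), (1, 0), (2, 0), (9, 9)]
query = [(0, 0), (1, 0), (2, 0)]
print(exact_s(data, query))  # (0.0, 1, 3)
```

Reusing one DTW column per extra point keeps the cost at O(n^2 m) for a data trajectory of n points and a query of m points, instead of recomputing a full table for each of the O(n^2) subtrajectories.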

Running SizeS

Run SizeS.py to obtain approximate results for this problem. You can tune the parameter par, which trades off effectiveness against efficiency.

python3 SizeS.py
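One way to read par is as a bound on how far a candidate's size may deviate from the query's size. The sketch below assumes exactly that interpretation, with DTW as the measure; size_s and its signature are hypothetical, so check SizeS.py for the actual rule.

```python
import numpy as np


def dtw(a, b):
    """Textbook DTW with Euclidean point cost (helper for this sketch)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def size_s(data, query, par, dist=dtw):
    """Score only subtrajectories whose size lies in [m - par, m + par],
    where m is the query size (assumed meaning of par).
    Returns (distance, start, end) with end inclusive."""
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    n, m = len(data), len(query)
    best = (np.inf, -1, -1)
    for size in range(max(1, m - par), min(n, m + par) + 1):
        for start in range(n - size + 1):
            d = dist(query, data[start:start + size])
            if d < best[0]:
                best = (d, start, start + size - 1)
    return best


data = [(5, 5), (0, 0), (1, 0), (2, 0), (9, 9)]
query = [(0, 0), (1, 0), (2, 0)]
print(size_s(data, query, par=0))  # (0.0, 1, 3)
```

Under this reading, a larger par examines more sizes: slower, but more likely to include the best subtrajectory. Once m - par <= 1 and m + par >= n, every subtrajectory is checked, just as in exhaustive search.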

Splitting-based Algorithms (PSS, POS, and POS-D)

Run Splitting_based.py to see the results of PSS, POS, and POS-D. Specify the algorithm to run with the parameter opt. For POS-D, you also need to set the number of delay steps, delay_K, in the code.

opt = 'PSS'  # or 'POS', 'POS-D'

python3 Splitting_based.py
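The splitting-based algorithms replace exhaustive enumeration with a single left-to-right scan that decides where to split. Below is a simplified, prefix-only sketch in the spirit of POS. It assumes DTW and the split rule stated in the docstring, and it omits PSS's suffix check and POS-D's delay_K look-ahead. greedy_prefix_split is a hypothetical name; Splitting_based.py defines the actual rules.

```python
import numpy as np


def dtw(a, b):
    """Textbook DTW with Euclidean point cost (helper for this sketch)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def greedy_prefix_split(data, query, dist=dtw):
    """Prefix-only greedy splitting in the spirit of POS (simplified sketch).

    Assumed rule: scan the points once, keeping the start h of the current
    segment; whenever data[h..i] beats the best distance so far, record it
    and start a new segment at i + 1. Returns (distance, start, end).
    """
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    best = (np.inf, -1, -1)
    h = 0
    for i in range(len(data)):
        d = dist(query, data[h:i + 1])  # h <= i always, so the segment is non-empty
        if d < best[0]:
            best = (d, h, i)
            h = i + 1  # split after point i
    return best


data = [(5, 5), (0, 0), (1, 0), (2, 0), (9, 9)]
query = [(0, 0), (1, 0), (2, 0)]
print(greedy_prefix_split(data, query))  # (2.0, 2, 2)
```

On this toy input the exact answer is the segment data[1..3] at distance 0, while the greedy scan stops at distance 2.0 after only n distance evaluations. That gap is the effectiveness/efficiency trade-off the splitting-based methods make, and it is what the smarter split decisions of PSS, POS-D, and the learned RLS policies aim to narrow.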

For PSS with the t2vec measure, you need to provide a t2vec model trained on reversed trajectories, such as best_model_portoR.pt, to compute the suffix similarities. To train it, refer to the t2vec repository and feed it the reversed trajectory sequences.
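Preparing the reversed training input amounts to reversing each token sequence. A minimal sketch, assuming one trajectory per line as whitespace-separated tokens; that layout and the function names are hypothetical, so check the t2vec repository for its actual file format.

```python
def reverse_sequences(lines):
    """Reverse the token order of each trajectory (one per line)."""
    return [" ".join(reversed(line.split())) for line in lines]


def reverse_file(src_path, dst_path):
    """Write the reversed sequences of src_path to dst_path, line by line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in reverse_sequences(src):
            dst.write(line + "\n")


# The suffix T[i:] read backwards is a prefix of reversed(T), so a model that
# consumes reversed sequences sees suffixes the way a forward model sees prefixes.
traj = ["12", "7", "33", "4"]
i = 1
assert list(reversed(traj[i:])) == list(reversed(traj))[:len(traj) - i]
print(reverse_sequences(["12 7 33 4\n"]))  # ['4 33 7 12']
```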

Training RLS and RLS-Skip

First, run ExactS to dump the best subtrajectory similarities (the SUBSIM file) into ./data; they are used for validation. Then run RLS_train.py and RLS_Skip_train.py to train the RLS and RLS-Skip models, respectively. The generated models are stored in ./save automatically; pick the one that performs best on the validation data as your model.

python3 RLS_train.py

python3 RLS_Skip_train.py

We provide an interface, RL.load(checkpoint), for loading an intermediate model and continuing training from that checkpoint, so an unexpected exception does not force you to train again from scratch. In addition, rl_nn.py implements the forward propagation of the trained neural network with NumPy, which is faster.
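Evaluating a small dense network in NumPy needs only matrix products and activations. A minimal sketch, assuming ReLU hidden layers and a linear output layer producing one Q-value per action; the actual units and activation are set in rl_nn.py, and mlp_forward is a hypothetical name. The weight order matches what Keras' model.get_weights() returns for a stack of Dense layers: [kernel, bias] per layer, with kernels shaped (inputs, units).

```python
import numpy as np


def mlp_forward(state, weights):
    """Forward pass of a dense network given [W1, b1, W2, b2, ...].

    Hidden layers use ReLU and the output layer is linear (assumed
    architecture); the output holds one Q-value per action.
    """
    x = np.asarray(state, dtype=float)
    layers = list(zip(weights[0::2], weights[1::2]))
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # hidden layer with ReLU
    W, b = layers[-1]
    return x @ W + b  # linear output layer


W1 = np.eye(2)
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0], [1.0]])
b2 = np.array([0.5])
print(mlp_forward([1.0, 2.0], [W1, b1, W2, b2]))  # [2.5]
```

With a trained Keras model, a call such as `int(np.argmax(mlp_forward(state, model.get_weights())))` would pick the greedy action without going through the framework's prediction overhead.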

Hyperparameters

rl_nn.py contains several hyperparameters that you may tune for better performance during training:

units, activation, learning_rate, discount_factor, reward_decay, and epsilon_min
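Two of these names, discount_factor and epsilon_min, belong to the standard deep Q-learning recipe. The sketch below shows where such parameters typically enter, assuming epsilon-greedy exploration with geometric decay and the one-step Q-learning target. It is not the training loop in rl_nn.py: decay_rate is a hypothetical name and value, and reward_decay is not modeled because its exact role is defined in the code.

```python
import numpy as np


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else take argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def decay_epsilon(epsilon, epsilon_min, decay_rate=0.99):
    """Shrink epsilon geometrically, never below epsilon_min.
    decay_rate is a hypothetical name and value."""
    return max(epsilon_min, epsilon * decay_rate)


def q_target(reward, next_q_values, discount_factor, done):
    """One-step Q-learning target: r + discount_factor * max_a' Q(s', a')."""
    if done:
        return float(reward)
    return float(reward + discount_factor * np.max(next_q_values))


rng = np.random.default_rng(0)
print(choose_action([0.1, 0.9], epsilon=0.0, rng=rng))  # 1 (pure exploitation)
print(decay_epsilon(0.01, epsilon_min=0.05))  # 0.05
```

A higher epsilon_min keeps the agent exploring late in training, while a discount_factor closer to 1 weights future rewards more heavily in the target.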

Testing

The testing code for these algorithms is in estimate.py.

python3 estimate.py
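Approximate methods are judged against the ExactS ground truth. One simple measure, assumed here for illustration (estimate.py defines the metrics the code actually reports), is the mean ratio between the distance found by an approximate method and the exact distance; approximate_ratio is a hypothetical name.

```python
import numpy as np


def approximate_ratio(approx_dists, exact_dists):
    """Mean of approx / exact distance over queries (1.0 means always optimal).

    Queries whose exact distance is 0 are skipped to avoid dividing by zero;
    returns nan if no query remains.
    """
    approx = np.asarray(approx_dists, dtype=float)
    exact = np.asarray(exact_dists, dtype=float)
    keep = exact > 0
    if not keep.any():
        return float("nan")
    return float(np.mean(approx[keep] / exact[keep]))


print(approximate_ratio([2.0, 3.0], [2.0, 1.5]))  # 1.5
```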

Citing SimSub

Please cite our paper if you find this code useful.

@article{wang2020efficient,
  title={Efficient and effective similar subtrajectory search with deep reinforcement learning},
  author={Wang, Zheng and Long, Cheng and Cong, Gao and Liu, Yiding},
  journal={Proceedings of the VLDB Endowment},
  volume={13},
  number={12},
  pages={2312--2325},
  year={2020},
  publisher={VLDB Endowment}
}
