wangjianlongnba/transdim

transdim

Transportation data imputation (transdim).

Strategic aim
Tasks and challenges

[Missing data imputation] [Rolling traffic prediction]
What we do just now!
What we care about!
Overview
Selected References:

[Spatio-temporal forecasting] [Principal component analysis] [Gaussian process] [Matrix factorization] [Bayesian matrix and tensor factorization] [Low-rank tensor completion] [Generative Adversarial Nets] [Variational Autoencoder] [Tensor regression] [Poisson matrix factorization] [Graph signal processing] [Graph neural network] [Missing data imputation]
Our Publications
License

Strategic aim

Creating accurate and efficient solutions for the spatio-temporal traffic data imputation and prediction tasks.

Tasks and challenges

Missing data imputation
- Random missing: each sensor loses observations completely at random. (simple task)
- Fiber missing: each sensor loses observations for several whole days. (difficult task)
Rolling traffic prediction
- Forecasting without missing values. (simple task)
- Forecasting with incomplete observations. (difficult task)

What we do just now!

add a framework indicating overall studies;

Framework: The tensor completion task, comprising data organization and tensor completion, in which traffic measurements are only partially observed.

define the problems clearly;
- Example: Traffic forecasting using matrix factorization models.

Real experiment setting: Observations with 0%, 20%, and 40% fiber missing rates during the first 56 days are treated as stationary inputs. Rolling inputs are then used to forecast traffic speed over the last 5 days (Monday to Friday) in a rolling manner.

describe the core challenges intuitively;
list main contributions of these studies.
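
The rolling setting described in the experiment caption above (56 days of stationary input, then 5 days forecast in a rolling manner) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the one-step horizon is a guess (the README does not state it), the helper `rolling_windows` is hypothetical, and the "repeat the last value" forecaster is a placeholder rather than any model from this repository.

```python
import numpy as np

n_slots = 144                      # time slots per day, as in the (214, 61, 144) tensor
train_days, test_days = 56, 5      # stationary input days, rolling forecast days
horizon = 1                        # ASSUMPTION: one-step-ahead forecasts


def rolling_windows(n_train, n_test, horizon):
    """Yield (history_end, target_slice) index pairs for rolling forecasts."""
    for start in range(n_train, n_train + n_test, horizon):
        stop = min(start + horizon, n_train + n_test)
        yield start, slice(start, stop)


# Synthetic stand-in for the speed matrix (214 road segments x 61 days of slots).
rng = np.random.default_rng(0)
speeds = rng.uniform(10.0, 60.0, size=(214, (train_days + test_days) * n_slots))

n_train = train_days * n_slots
n_test = test_days * n_slots
forecasts = np.empty((214, n_test))

for history_end, target in rolling_windows(n_train, n_test, horizon):
    # Everything before history_end is visible: the stationary input plus
    # the rolling inputs revealed so far.
    history = speeds[:, :history_end]
    # Placeholder forecaster: repeat the last visible value over the horizon.
    forecasts[:, target.start - n_train:target.stop - n_train] = history[:, -1:]
```

A real model would replace the placeholder line and would also have to cope with missing entries in `history`.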

What we care about!

Best algebraic structure for data imputation.
The context of urban transportation (e.g., biases).
Data noise avoidance.
Competitive imputation and prediction performance.
Capability to handle various missing data scenarios.

Overview

With the development and application of intelligent transportation systems, large quantities of urban traffic data are collected on a continuous basis from various sources, such as loop detectors, cameras, and floating vehicles. These data sets capture the underlying states and dynamics of transportation networks and of the system as a whole, and they benefit many traffic operation and management applications, including routing, signal control, and travel time prediction. However, missing data are inevitable when collecting traffic data from intelligent transportation systems.

Urban traffic speed data set of Guangzhou, China

Publicly available at our Zenodo repository!

(a) Time series of actual and estimated speed within two weeks from August 1 to 14.

(b) Time series of actual and estimated speed within two weeks from September 12 to 25.

The imputation performance of BGCP (CP rank r=15 and missing rate α=30%) under the fiber missing scenario with third-order tensor representation, where the estimated result of road segment #1 is selected as an example. In both panels, red rectangles represent fiber missing (i.e., speed observations are lost for a whole day).

Machine learning models

Missing data imputation

The urban traffic speed data set (Guangzhou data set, Gdata) records traffic speed from 214 road segments over two months (61 days, from August 1 to September 30, 2016) in Guangzhou, China. We organize the raw data into a time series matrix of size (214, 8784). Tensor-based models take a third-order tensor of size (214, 61, 144) as input, while matrix-based models are tested with the time series matrix (214, 8784).
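
The two layouts hold the same values: 61 days × 144 time slots per day = 8784 columns. A minimal sketch of moving between them with NumPy follows. It uses random numbers as a stand-in for the real data, and the variable names are illustrative rather than taken from the notebooks.

```python
import numpy as np

n_segments, n_days, n_slots = 214, 61, 144

# Synthetic stand-in for the (214, 8784) time series matrix.
rng = np.random.default_rng(0)
dense_mat = rng.uniform(10.0, 60.0, size=(n_segments, n_days * n_slots))

# Fold the time axis into (day, time slot). NumPy's default C order keeps
# each day's 144 consecutive slots together along the last axis.
dense_tensor = dense_mat.reshape(n_segments, n_days, n_slots)

# Unfold back to the matrix layout used by matrix-based models.
mat_again = dense_tensor.reshape(n_segments, n_days * n_slots)
```

With this ordering, entry `(i, d, t)` of the tensor equals entry `(i, d * 144 + t)` of the matrix.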

We consider two common missing data scenarios: random missing (RM) and non-random missing (NM). For RM, we randomly remove a certain fraction of the observed entries in the matrix and use them as ground truth to evaluate RMSE. For NM, we run a correlated fiber missing experiment: we randomly choose a certain fraction (e.g., 40%) of the (location, day) combinations and remove the whole time series in each chosen combination.
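
The two scenarios can be sketched as binary masks (1 = observed, 0 = removed) over the third-order tensor. This is a hedged illustration on synthetic data: the function names are hypothetical, the RM mask removes each entry independently (so the realized rate is only approximately the target), and the mean-fill "imputer" is a placeholder used only to show how RMSE is computed on the held-out entries.

```python
import numpy as np


def random_missing_mask(shape, rate, rng):
    """RM: drop each entry independently with probability `rate`."""
    return (rng.random(shape) >= rate).astype(int)


def fiber_missing_mask(shape, rate, rng):
    """NM: drop whole (location, day) fibers, i.e. all time slots of a day."""
    n_loc, n_day, n_slot = shape
    n_fibers = n_loc * n_day
    dropped = rng.choice(n_fibers, size=int(round(rate * n_fibers)), replace=False)
    fiber = np.ones(n_fibers, dtype=int)
    fiber[dropped] = 0
    # Copy each (location, day) flag across all time slots of that day.
    return np.repeat(fiber.reshape(n_loc, n_day)[:, :, None], n_slot, axis=2)


def rmse(truth, estimate, held_out):
    """Root mean squared error restricted to the held-out entries."""
    diff = (truth - estimate)[held_out]
    return float(np.sqrt(np.mean(diff ** 2)))


rng = np.random.default_rng(0)
shape = (214, 61, 144)
dense = rng.uniform(10.0, 60.0, size=shape)   # synthetic stand-in for speeds

rm_mask = random_missing_mask(shape, 0.2, rng)
nm_mask = fiber_missing_mask(shape, 0.4, rng)

# Placeholder imputer: fill removed entries with the mean of observed ones.
held_out = nm_mask == 0
baseline = np.where(held_out, dense[nm_mask == 1].mean(), dense)
score = rmse(dense, baseline, held_out)
```

Any model in the table can be scored the same way by swapping its output in for `baseline`.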

Model	Paper	Data set	Missing	RMSE	Our implementation
PMF	Salakhutdinov et al., 2007	Gdata	20%, RM	4.0909	Jupyter Notebook
GAIN	Yoon et al., 2018	Gdata	20%, RM	4.6718	Jupyter Notebook
PMF	Salakhutdinov et al., 2007	Gdata	40%, RM	4.2280	Jupyter Notebook
GAIN	Yoon et al., 2018	Gdata	40%, RM	5.1776	Jupyter Notebook
PMF	Salakhutdinov et al., 2007	Gdata	20%, NM	4.3575	Jupyter Notebook
GAIN	Yoon et al., 2018	Gdata	20%, NM	6.5500	Jupyter Notebook
PMF	Salakhutdinov et al., 2007	Gdata	40%, NM	4.4866	Jupyter Notebook
GAIN	Yoon et al., 2018	Gdata	40%, NM	6.9947	Jupyter Notebook

PMF: Probabilistic matrix factorization.
- The code1 and code2 have been adapted for our implementation.
GAIN: Generative Adversarial Imputation Nets.
- The code has been adapted for our implementation.
LocInt: local interpolation.
- This model considers local information from observations at the neighboring time slots of the missing values.
TRMF: Temporal regularized matrix factorization. [Matlab code is also available!]
- Reducing the burden of hyperparameter tuning is a worthwhile direction.
BGCP: Bayesian Gaussian CP decomposition. [Imputation example - Jupyter Notebook] [Matlab code is also available!]
BPMF: Bayesian probabilistic matrix factorization.
HaLRTC: High accuracy low rank tensor completion.
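
To make the matrix factorization entries in the list above more concrete, here is a minimal sketch of low-rank factorization fitted on observed entries only. With fixed Gaussian priors, the MAP estimate of a PMF-style model reduces to a regularized least-squares problem like this one. Here it is solved with alternating least squares (ALS), which is a technique swapped in for brevity and not the notebook implementation. The function name `als_impute`, the hyperparameter values, and the synthetic low-rank data are all assumptions made for illustration.

```python
import numpy as np


def als_impute(sparse_mat, mask, rank=5, lam=1.0, n_iter=50, seed=0):
    """Fit sparse_mat ~= W @ X.T on entries where mask == 1 using ridge ALS."""
    rng = np.random.default_rng(seed)
    m, n = sparse_mat.shape
    W = 0.1 * rng.standard_normal((m, rank))   # row (location) factors
    X = 0.1 * rng.standard_normal((n, rank))   # column (time) factors
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        # Update each row factor from the observed entries of that row.
        for i in range(m):
            obs = mask[i] == 1
            Xo = X[obs]
            W[i] = np.linalg.solve(Xo.T @ Xo + reg, Xo.T @ sparse_mat[i, obs])
        # Update each column factor from the observed entries of that column.
        for j in range(n):
            obs = mask[:, j] == 1
            Wo = W[obs]
            X[j] = np.linalg.solve(Wo.T @ Wo + reg, Wo.T @ sparse_mat[obs, j])
    return W @ X.T


# Synthetic rank-3 matrix with 20% of entries removed at random.
rng = np.random.default_rng(1)
truth = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 80))
mask = (rng.random(truth.shape) >= 0.2).astype(int)
sparse_mat = truth * mask

estimate = als_impute(sparse_mat, mask, rank=3, lam=0.01)
held_out = mask == 0
rmse = float(np.sqrt(np.mean((truth - estimate)[held_out] ** 2)))
```

On real traffic data the rank and regularization weight would need tuning, and results should not be expected to match the RMSE values in the table.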

Selected references

Spatio-temporal forecasting
- Zheyi Pan, Yuxuan Liang, Junbo Zhang, Xiuwen Yi, Yong Yu, Yu Zheng, 2018. HyperST-Net: hypernetworks for spatio-temporal forecasting. arXiv.
- Truc Viet Le, Richard Oentaryo, Siyuan Liu, Hoong Chuin Lau, 2017. Local Gaussian processes for efficient fine-grained traffic speed prediction. arXiv.
- Yaguang Li, Cyrus Shahabi, 2018. A brief overview of machine learning methods for short-term traffic forecasting and future directions. ACM SIGSPATIAL, 10(1): 3-9.
- Bing Yu, Haoteng Yin, Zhanxing Zhu, 2017. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv. (appeared in IJCAI 2018)
- Feras A. Saad, Vikash K. Mansinghka, 2018. Temporally-reweighted Chinese Restaurant Process mixtures for clustering, imputing, and forecasting multivariate time series. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. PMLR: Volume 84.
- Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, Yan Liu, 2018. Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(6085).
- Zhengping Che, Sanjay Purushotham, Guangyu Li, Bo Jiang, Yan Liu, 2018. Hierarchical deep generative models for multi-rate multivariate time series. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), PMLR 80:784-793, 2018.
- Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla, 2018. A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. arXiv.
- Wang, X., Chen, C., Min, Y., He, J., Yang, B., Zhang, Y., 2018. Efficient metropolitan traffic prediction based on graph recurrent neural network. arXiv.
- Peiguang Jing, Yuting Su, Xiao Jin, Chengqian Zhang, 2018. High-order temporal correlation model learning for time-series prediction. IEEE Transactions on Cybernetics, early access.
- Oren Anava, Elad Hazan, Assaf Zeevi, 2015. Online time series prediction with missing data. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 37: 2191-2199.
- Shanshan Feng, Gao Cong, Bo An, Yeow Meng Chee, 2017. POI2Vec: Geographical latent representation for predicting future visitors. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017).
- Yasuko Matsubara, Yasushi Sakurai, Christos Faloutsos, Tomoharu Iwata, Masatoshi Yoshikawa, 2012. Fast mining and forecasting of complex time-stamped events. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD 2012).
- Yasuko Matsubara, Yasushi Sakurai, Willem G. van Panhuis, Christos Faloutsos, 2014. FUNNEL: automatic mining of spatially coevolving epidemics. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD 2014).
- Koh Takeuchi, Hisashi Kashima, Naonori Ueda, 2017. Autoregressive tensor factorization for spatio-temporal predictions. 2017 IEEE International Conference on Data Mining (ICDM 2017).
Principal component analysis
- Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa, Morito Monden, Ken-ichi Matsubara, Shin Ishii, 2003. A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19: 2088-2096. [Matlab code]
- Li Qu, Li Li, Yi Zhang, Jianming Hu, 2009. PPCA-based missing data imputation for traffic flow volume: a systematical approach. IEEE Transactions on Intelligent Transportation Systems, 10(3): 512-522.
- Li Li, Yuebiao Li, Zhiheng Li, 2013. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transportation Research Part C: Emerging Technologies, 34: 108-120.
Gaussian process
- Michalis K. Titsias, Magnus Rattray, Neil D. Lawrence, 2009. Markov chain Monte Carlo algorithms for Gaussian processes, Chapter.
- Filipe Rodrigues, Kristian Henrickson, Francisco C. Pereira, 2018. Multi-output Gaussian processes for crowdsourced traffic data imputation. IEEE Transactions on Intelligent Transportation Systems, early access. [Matlab code]
- Nicolo Fusi, Rishit Sheth, Huseyn Melih Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv. [Python code]
- Tinghui Zhou, Hanhuai Shan, Arindam Banerjee, Guillermo Sapiro, 2012. Kernelized probabilistic matrix factorization: exploiting graphs and side information. [slide]
- John Bradshaw, Alexander G. de G. Matthews, Zoubin Ghahramani, 2017. Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks. arXiv.
Matrix factorization
- Nikhil Rao, Hsiangfu Yu, Pradeep Ravikumar, Inderjit S Dhillon, 2015. Collaborative filtering with graph information: Consistency and scalable methods. Neural Information Processing Systems (NIPS 2015). [Matlab code]
- Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain. [Matlab code]
- Yongshun Gong, Zhibin Li, Jian Zhang, Wei Liu, Yu Zheng, Christina Kirsch, 2018. Network-wide crowd flow prediction of Sydney trains via customized online non-negative matrix factorization. In The 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), Torino, Italy.
Bayesian matrix and tensor factorization
- Ruslan Salakhutdinov, Andriy Mnih, 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. Proceedings of the 25th International Conference on Machine Learning (ICML 2008), Helsinki, Finland. [Matlab code (official)] [Python code] [Julia and C++ code] [Julia code]
- Ilya Sutskever, Ruslan Salakhutdinov, Joshua B. Tenenbaum, 2009. Modelling relational data using Bayesian clustered tensor factorization. NIPS 2009.
- Nicolo Fusi, Rishit Sheth, Huseyn Melih Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv.
- Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, Jaime G. Carbonell, 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM, pp. 211-222.
- Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1751-1763.
- Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian sparse Tucker models for dimension reduction and tensor completion. arXiv.
- Piyush Rai, Yingjian Wang, Shengbo Guo, Gary Chen, David B. Dunson, Lawrence Carin, 2014. Scalable Bayesian low-rank decomposition of incomplete multiway tensors. Proceedings of the 31st International Conference on Machine Learning (ICML 2014), Beijing, China.
Low-rank tensor completion
- Ji Liu, Przemyslaw Musialski, Peter Wonka, Jieping Ye, 2013. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1): 208-220.
- Bin Ran, Huachun Tan, Yuankai Wu, Peter J. Jin, 2016. Tensor based missing traffic data completion with spatial–temporal correlation. Physica A: Statistical Mechanics and its Applications, 446: 54-63.
Generative Adversarial Nets
- Brandon Amos, 2016. Image completion with deep learning in TensorFlow. blog post. [github]
- Jinsung Yoon, James Jordon, Mihaela van der Schaar, 2018. GAIN: missing data imputation using Generative Adversarial Nets. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden. [supplementary materials] [Python code]
- Ian Goodfellow, 2016. NIPS 2016 tutorial: Generative Adversarial Networks.
- Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, Georg Langs, 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. arXiv.
- Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, Xiaojie Yuan, 2018. Multivariate time series imputation with generative adversarial networks. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. [Python code]
Variational Autoencoder
- Zhiwei Deng, Rajitha Navarathna, Peter Carr, Stephan Mandt, Yisong Yue, 2017. Factorized variational autoencoders for modeling audience reactions to movies. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.
- Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, Pierre Vandergheynst, 2014. Matrix completion on graphs. arXiv. (appeared in NIPS 2014)
- Rianne van den Berg, Thomas N. Kipf, Max Welling, 2018. Graph convolutional matrix completion. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2018), London, UK.
- Graph autoencoder - GitHub.
- Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, Jie Chen, Zhaogang Wang, Honglin Qiao, 2018. Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. WWW 2018.
- John T. McCoy, Steve Kroon, Lidia Auret, 2018. Variational Autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine, 51(21): 141-146. [Python code] [VAE demo]
- Pierre-Alexandre Mattei, Jes Frellsen, 2018. missIWAE: Deep generative modelling and imputation of incomplete data. Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montréal, Canada. [related slide]
Tensor regression
- Guillaume Rabusseau, Hachem Kadri, 2016. Low-rank regression with tensor responses. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
- Rose Yu, Yan Liu, 2016. Learning from multiway data: simple and efficient tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
- Masaaki Imaizumi, Kohei Hayashi, 2016. Doubly decomposing nonparametric tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
- Rose Yu, Guangyu Li, Yan Liu, 2018. Tensor regression meets Gaussian processes. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. [Matlab code]
- Lifang He, Kun Chen, Wanwan Xu, Jiayu Zhou, Fei Wang, 2018. Boosted sparse and low-rank tensor regression. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
Poisson matrix factorization
- Liangjie Hong, 2015. Poisson matrix factorization. blog post.
- Ali Taylan Cemgil, 2009. Bayesian inference for nonnegative matrix factorisation models. Computational intelligence and neuroscience.
- Prem Gopalan, Jake M. Hofman, David M. Blei, 2015. Scalable recommendation with hierarchical poisson factorization. In UAI, 326-335. [C++ code]
- Laurent Charlin, Rajesh Ranganath, James McInerney, 2015. Dynamic Poisson factorization. Proceedings of the 9th ACM Conference on Recommender Systems (RecSys 2015), Vienna, Austria. [C++ code]
- Seyed Abbas Hosseini, Keivan Alizadeh, Ali Khodadadi, Ali Arabzadeh, Mehrdad Farajtabar, Hongyuan Zha, Hamid R. Rabiee, 2017. Recurrent Poisson factorization for temporal recommendation. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017), Halifax, Nova Scotia Canada. [Matlab code]
Graph signal processing
- Arman Hasanzadeh, Xi Liu, Nick Duffield, Krishna R. Narayanan, Byron Chigoy, 2017. A graph signal processing approach for real-time traffic prediction in transportation networks. arXiv.
- Antonio Ortega, Pascal Frossard, Jelena Kovačević, José M. F. Moura, Pierre Vandergheynst, 2018. Graph signal processing: overview, challenges, and applications. Proceedings of the IEEE, 106(5): 808-828. [slide]
Graph neural network
- How to do Deep Learning on Graphs with Graph Convolutional Networks (Part 1: A High-Level Introduction to Graph Convolutional Networks). blog post.
- Structured deep models: Deep learning on graphs and beyond. slide.
- gcn: Implementation of Graph Convolutional Networks in TensorFlow. GitHub project.
- gated-graph-neural-network-samples: Sample Code for Gated Graph Neural Networks. GitHub project.
- Xu Geng, Yaguang Li, Leye Wang, Lingyu Zhang, Qiang Yang, Jieping Ye, Yan Liu, 2019. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. AAAI 2019.
Missing data imputation
- Daniel J. Stekhoven, Peter Bühlmann, 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1): 112–118. [missingpy - PyPI] or [missingpy - GitHub]
- fancyimpute: A variety of matrix completion and imputation algorithms implemented in Python. [homepage]
- Dimitris Bertsimas, Colin Pawlowski, Ying Daisy Zhuo, 2018. From predictive methods to missing data imputation: An optimization approach. Journal of Machine Learning Research, 18(196): 1-39.
- Wei Cao, Dong Wang, Jian Li, Hao Zhou, Yitan Li, Lei Li, 2018. BRITS: Bidirectional Recurrent Imputation for Time Series. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. [Python code]

Our publications

Xinyu Chen, Zhaocheng He, Jiawei Wang, 2018. Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77.
Xinyu Chen, Zhaocheng He, Lijun Sun, 2019. A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [Matlab code]

Please consider citing our papers if they help your research.

Our blog posts (in Chinese)

贝叶斯泊松分解变分推断笔记, by Yixian Chen (陈一贤).
变分贝叶斯推断笔记, by Yixian Chen (陈一贤).
贝叶斯高斯张量分解, by Xinyu Chen (陈新宇).
贝叶斯矩阵分解, by Xinyu Chen (陈新宇).

License

This work is released under the MIT license.
