
Robust Factorization of Real-world Tensor Streams with Patterns, Missing Values, and Outliers (ICDE'21)

Repository: https://github.com/wooner49/sofia


This repository contains the source code for the paper "Robust Factorization of Real-world Tensor Streams with Patterns, Missing Values, and Outliers" by Dongjin Lee and Kijung Shin, presented at ICDE 2021.

In this work, we propose SOFIA, an online algorithm for factorizing real-world tensors that evolve over time and contain missing entries and outliers. By smoothly and tightly combining tensor factorization, outlier detection, and temporal-pattern detection, SOFIA offers the following advantages over state-of-the-art competitors:

  • Robust and accurate: SOFIA yields up to 76% lower imputation error and up to 71% lower forecasting error than its best competitors.
  • Fast: SOFIA performs imputation up to 935× faster than the second-most accurate method.
  • Scalable: SOFIA incrementally processes new entries in a time-evolving tensor, and it scales linearly with the number of new entries per time step.
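To illustrate why processing one time step can cost time linear in its new entries, here is a minimal sketch. It is not SOFIA's update rule, which also models temporal patterns. It assumes a rank-R CP model whose non-temporal factors U and V are held fixed. Each new slice is fit by masked least squares over its observed entries only, and the part of any residual beyond a robust threshold (k times a MAD-based scale) is moved into a sparse outlier term. The helper name `robust_slice_update`, the MAD scale, and `k = 2.0` are our assumptions.

```python
import numpy as np

def robust_slice_update(U, V, Y, mask, k=2.0, n_iter=5):
    """Hypothetical helper: estimate temporal weights w for one new slice.

    Assumed model: Y[i, j] ~= sum_r U[i, r] * V[j, r] * w[r] + O[i, j],
    where O holds sparse outliers. Only entries with mask[i, j] == True are used.
    """
    rows, cols = np.nonzero(mask)              # observed entries of the new slice
    A = U[rows] * V[cols]                      # (nnz, R) design matrix, O(nnz * R)
    y = Y[rows, cols]
    outlier = np.zeros_like(y)
    for _ in range(n_iter):
        # Least-squares fit on the outlier-corrected observations
        w, *_ = np.linalg.lstsq(A, y - outlier, rcond=None)
        resid = y - A @ w
        # Robust residual scale from the median absolute deviation (MAD)
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid))) + 1e-12
        # Soft-threshold: the part of a residual beyond k * scale is an outlier
        outlier = np.sign(resid) * np.maximum(np.abs(resid) - k * scale, 0.0)
    return w, outlier

# Usage on synthetic data (all values hypothetical)
rng = np.random.default_rng(0)
I, J, R = 20, 15, 3
U, V = rng.normal(size=(I, R)), rng.normal(size=(J, R))
w_true = np.array([1.0, -2.0, 0.5])
Y = (U * w_true) @ V.T                         # noiseless rank-3 slice
mask = rng.random((I, J)) < 0.7                # roughly 30% of entries missing
mask[0, 0] = True
Y[0, 0] += 50.0                                # inject one gross outlier
w, outlier = robust_slice_update(U, V, Y, mask)
print(np.round(w, 3), round(float(outlier[0]), 1))
```

For a fixed rank R, building `A` and solving the small least-squares problem cost O(nnz · R²) per step, where nnz is the number of observed entries in the new slice. The cost is therefore linear in the new entries and independent of how many past slices have been seen.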

Datasets

Name              Description                    Size             Granularity in Time  Processed Dataset  Original Source
Intel Lab Sensor  locations x sensors x time     54 x 4 x 1152    every 10 minutes     Dataset            Link
Network Traffic   sources x destinations x time  23 x 23 x 2000   hourly               Dataset            Link
Chicago Taxi      sources x destinations x time  77 x 77 x 2016   hourly               Dataset            Link
NYC Taxi          sources x destinations x time  265 x 265 x 904  daily                Dataset            Link
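Three of the datasets above are sources x destinations x time tensors. The sketch below shows one way trip records could be binned into such a count tensor. It is not the authors' preprocessing. The record layout (integer zone IDs plus a Unix timestamp), the helper `build_od_tensor`, and the hourly step are assumptions.

```python
import numpy as np

def build_od_tensor(trips, n_zones, t0, step_seconds, n_steps):
    """Hypothetical helper: count (source, destination, timestamp) records
    into an n_zones x n_zones x n_steps tensor."""
    X = np.zeros((n_zones, n_zones, n_steps))
    arr = np.asarray(trips, dtype=np.int64)
    s, d = arr[:, 0], arr[:, 1]
    t = (arr[:, 2] - t0) // step_seconds       # time-bin index of each record
    keep = (t >= 0) & (t < n_steps)            # drop records outside the window
    # np.add.at accumulates correctly when the same (s, d, t) cell repeats
    np.add.at(X, (s[keep], d[keep], t[keep]), 1.0)
    return X

t0 = 1_600_000_000                             # hypothetical window start (Unix seconds)
trips = [(0, 1, t0 + 10), (0, 1, t0 + 3599), (2, 0, t0 + 3600), (1, 1, t0 + 7300)]
X = build_od_tensor(trips, n_zones=3, t0=t0, step_seconds=3600, n_steps=2)
print(X.shape, X[0, 1, 0], X.sum())            # last record falls outside 2 hours
```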

Requirements

  1. Tensor Toolbox v3.1, for tensor computation.
    • Download the library and add it to the MATLAB path.
  2. MATLAB Optimization Toolbox, for its nonlinear programming solver (the fmincon function).
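The README lists fmincon only as a nonlinear programming solver and does not say what it optimizes. In seasonal forecasting, a typical bound-constrained problem is choosing smoothing parameters in [0, 1] to minimize one-step-ahead error. The sketch below assumes an additive Holt-Winters model for a temporal signal and replaces fmincon with a coarse grid search so that it runs without MATLAB. The function names are hypothetical, and this is not the repository's code.

```python
def holt_winters(y, m, alpha, beta, gamma):
    """Additive Holt-Winters filter with period m (needs len(y) >= 2m).

    Returns (one-step SSE, final level, final trend, seasonal offsets)."""
    first, second = sum(y[:m]) / m, sum(y[m:2 * m]) / m
    trend = (second - first) / m
    level = first + trend * (m - 1) / 2        # level aligned to time m - 1
    # Detrended seasonal offsets estimated from the first season
    season = [y[i] - (first + trend * (i - (m - 1) / 2)) for i in range(m)]
    sse = 0.0
    for t in range(m, len(y)):
        s = season[t % m]
        sse += (y[t] - (level + trend + s)) ** 2   # one-step-ahead error
        prev = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s
    return sse, level, trend, season

def fit_holt_winters(y, m, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Grid search over (alpha, beta, gamma) in [0, 1]^3 minimizing one-step SSE."""
    best = None
    for a in grid:
        for b in grid:
            for g in grid:
                sse = holt_winters(y, m, a, b, g)[0]
                if best is None or sse < best[0]:
                    best = (sse, (a, b, g))
    return best[1]

def forecast(y, m, h, params):
    """h-step-ahead forecasts following the last observation."""
    _, level, trend, season = holt_winters(y, m, *params)
    n = len(y)
    return [level + k * trend + season[(n - 1 + k) % m] for k in range(1, h + 1)]

pattern = [3.0, -1.0, -2.0, 0.0]               # hypothetical period-4 pattern
y = [10 + 0.5 * t + pattern[t % 4] for t in range(24)]
params = fit_holt_winters(y, m=4)
fc = forecast(y, 4, 4, params)
expected = [10 + 0.5 * t + pattern[t % 4] for t in range(24, 28)]
print(params, [round(v, 6) for v in fc])
```

A grid search is crude but transparent. A gradient-based bounded solver, such as fmincon in MATLAB or `scipy.optimize.minimize` with `bounds` in Python, would tune the same parameters continuously.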

Running Examples

We provide two example scripts, one for online tensor completion and one for tensor forecasting:

  1. Online tensor completion
  2. Tensor forecasting
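Both tasks are evaluated by comparing predictions against held-out entries. This README does not define the error metric. A normalized residual error (the Frobenius norm of the error on the evaluated entries divided by the norm of the true values there) is one common choice and is assumed in the sketch below. The sketch also shows how to read "X% lower error" figures such as those quoted at the top of this README. Both helper names are hypothetical.

```python
import numpy as np

def normalized_residual_error(X_true, X_hat, eval_mask):
    """Frobenius-norm error on the evaluated entries, relative to their norm."""
    diff = (X_true - X_hat)[eval_mask]
    return np.linalg.norm(diff) / np.linalg.norm(X_true[eval_mask])

def relative_reduction(err_new, err_baseline):
    """Fraction by which err_new is lower than err_baseline (0.76 = 76% lower)."""
    return 1.0 - err_new / err_baseline

# Toy check (hypothetical numbers): hold out two entries, one imputed wrongly
X_true = np.arange(1.0, 9.0).reshape(2, 2, 2)
X_hat = X_true.copy()
X_hat[0, 0, 0] += 3.0
held_out = np.zeros(X_true.shape, dtype=bool)
held_out[0, 0, 0] = held_out[1, 1, 1] = True
err = normalized_residual_error(X_true, X_hat, held_out)
print(round(float(err), 4))                    # 3 / sqrt(1 + 64), about 0.3721
print(round(relative_reduction(0.24, 1.0), 2))
```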

Supplementary Document

Please see the supplementary document.

Reference

This code is free and open source for academic/research (non-commercial) purposes only. If you use this code as part of any published research, please cite the following paper.

@inproceedings{lee2021robust,
  title={Robust factorization of real-world tensor streams with patterns, missing values, and outliers},
  author={Lee, Dongjin and Shin, Kijung},
  booktitle={2021 IEEE 37th International Conference on Data Engineering (ICDE)},
  pages={840--851},
  year={2021},
  organization={IEEE}
}

About


License: GNU General Public License v3.0


Languages: MATLAB 99.5%, M 0.5%