chus-chus / sketchModelling

Code supporting a series of experiments on the use of efficient sliding window sketches to aid in the modelling of time series.

License: MIT

Sketches for Time-Dependent Machine Learning

This repository contains source code supporting a series of experiments on how streaming techniques, in particular sketches, can aid in the modelling of time series. The results can be found in the paper.

Installation

pip install -i https://test.pypi.org/simple/ skcm

The package includes the following sketches, deep learning models, and utilities:

  • Exponential Histogram, capable of keeping track of the following statistics:

    • Binary Counter [1]
    • Sum
      • Positive integers [1]
      • Extension over positive real numbers (own)
    • Mean (positive reals and reals; the former is more space-efficient)
    • Variance (real) [2]
  • EHRNN: a modified Elman network (RNN) that efficiently keeps track of hidden state statistics across multiple time resolutions via Exponential Histograms. Implemented in PyTorch.

  • Other utils

    • DataFrame sketch windower: returns a pandas.DataFrame with the results of applying a summarizing sketch (Exponential Histogram) over one or more windows. Useful for obtaining descriptive statistics and summarizing data trends across time resolutions.
    • Format converters
      • csv to arff and vice versa
      • pandas.DataFrame to arff
        (arff is a data format used by ML frameworks such as Weka and MOA)
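
To make the Binary Counter item above more concrete, here is a minimal sketch of an Exponential Histogram that approximately counts the 1s among the last `window` bits of a stream, in the spirit of [1]. It is a simplified illustration, not the skcm API: the class name `ExpHistogramCounter`, its methods, and the parameter `k` (maximum number of buckets allowed per size before merging) are assumptions made for this example, and the merge threshold is simplified with respect to the paper.

```python
from collections import deque


class ExpHistogramCounter:
    """Approximate count of 1s among the last `window` bits (hypothetical name)."""

    def __init__(self, window, k=2):
        self.window = window
        self.k = k              # assumed knob: max buckets of equal size before merging
        self.time = 0
        # Each bucket is (timestamp of its most recent 1, size); newest bucket first.
        self.buckets = deque()

    def add(self, bit):
        self.time += 1
        # Drop buckets whose most recent 1 has fallen out of the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit:
            self.buckets.appendleft((self.time, 1))
            self._merge()

    def _merge(self):
        buckets = list(self.buckets)
        i = 0
        while i < len(buckets):
            size = buckets[i][1]
            j = i
            while j < len(buckets) and buckets[j][1] == size:
                j += 1
            if j - i > self.k:
                # Merge the two oldest buckets of this size; keep the newer timestamp.
                buckets[j - 2:j] = [(buckets[j - 2][0], 2 * size)]
                i = j - 2       # the merged bucket may trigger a further merge
            else:
                i = j
        self.buckets = deque(buckets)

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only part of the oldest bucket may still be inside the window,
        # so count half of it, as in the estimate used in [1].
        return total - self.buckets[-1][1] // 2


counter = ExpHistogramCounter(window=16, k=2)
for bit in [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 5:
    counter.add(bit)
print(counter.estimate())
```

Because bucket sizes are powers of two, the number of buckets grows only logarithmically with the window length, and the estimate is off by at most half the size of the oldest bucket. Under these assumptions, a larger `k` keeps more small buckets and tightens the error at the cost of memory.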

Citations

If you use this code in your research or application, please cite the current preprint.

@misc{antonanzas2021sketches,
      title={Sketches for Time-Dependent Machine Learning}, 
      author={Jesus Antonanzas and Marta Arias and Albert Bifet},
      year={2021},
      eprint={2108.11923},
      archivePrefix={arXiv},
      URL={https://arxiv.org/abs/2108.11923},
      primaryClass={cs.LG}
} 

References

Ideas from these references have been used in the software:

[1] M. Datar et al. (2002). Maintaining Stream Statistics over Sliding Windows. SIAM Journal on Computing, 31(6), 1794-1813.

[2] B. Babcock et al. (2003). Maintaining Variance and k-Medians over Data Stream Windows. Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 22, 234-243.
