shreyashankar / streams

STREAMS: A Benchmark of Naturalistic Streaming Data for Online Continual Learning

Beijing Multi-Site Air Quality

shreyashankar opened this issue

Time-series dataset of real-world pollution data, collected by 12 sensors around Beijing (each sensor is a domain).
https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data
X: the measurement history plus the timestamps to predict
[ (datetime, pollution level) tuples for measurements already taken ] + [ (datetime, special token) for each timestamp to predict ]
Each time step has at most 12 instances, one per domain, so n_t (instances per time step) will be small and T (number of time steps) will be large.
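
To make the input format concrete, here is a minimal sketch of how one X instance could be assembled for a single station. The function name `build_instance`, the `QUERY_TOKEN` placeholder, and the readings are all hypothetical; the issue does not specify how the special token is encoded or how long the history should be.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the "special token" that marks a timestamp to predict.
QUERY_TOKEN = "<predict>"


def build_instance(readings, n_history, n_query):
    """Split time-sorted (datetime, pollution level) tuples into (x, y).

    x is the observed history followed by (datetime, QUERY_TOKEN) pairs;
    y holds the pollution levels at the queried timestamps.
    """
    history = list(readings[:n_history])
    future = readings[n_history:n_history + n_query]
    x = history + [(ts, QUERY_TOKEN) for ts, _ in future]
    y = [level for _, level in future]
    return x, y


# Synthetic hourly readings for one station; each level is (SO2, NO2, CO, O3).
start = datetime(2013, 3, 1, 0)
readings = [(start + timedelta(hours=h), (4.0 + h, 7.0, 300.0, 77.0)) for h in range(5)]

x, y = build_instance(readings, n_history=3, n_query=2)
```

Here `x` contains three observed tuples followed by two query tuples, and `y` contains the two held-out pollution vectors.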
Y: multi-dimensional pollution vector (SO2, NO2, CO, O3)
Domains:
station: name of the air-quality monitoring site
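
As a rough illustration of treating each station as a domain, the sketch below groups rows by time step and counts the instances per step (n_t). It assumes the CSV columns are named `year`, `month`, `day`, `hour`, `SO2`, `NO2`, `CO`, `O3`, and `station`, and that missing values appear as `NA`; check both against the downloaded files. Skipping rows with a missing target is a simplifying choice, not something the issue prescribes, and the sample values are synthetic.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

TARGETS = ("SO2", "NO2", "CO", "O3")


def group_by_time_step(csv_file):
    """Return {datetime: {station: (SO2, NO2, CO, O3)}} from an open CSV file."""
    by_time = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        values = [row[name] for name in TARGETS]
        # Simplifying assumption: drop rows where any target is missing.
        if any(v in ("", "NA") for v in values):
            continue
        ts = datetime(int(row["year"]), int(row["month"]), int(row["day"]), int(row["hour"]))
        by_time[ts][row["station"]] = tuple(float(v) for v in values)
    return dict(by_time)


# Synthetic sample with two stations and one missing NO2 value.
sample = "\n".join([
    "year,month,day,hour,SO2,NO2,CO,O3,station",
    "2013,3,1,0,4,7,300,77,Aotizhongxin",
    "2013,3,1,0,3,NA,300,80,Changping",
    "2013,3,1,1,5,8,400,70,Aotizhongxin",
    "2013,3,1,1,3,6,300,75,Changping",
])

by_time = group_by_time_step(io.StringIO(sample))
# n_t: number of instances (stations with complete targets) at each time step.
n_t = {ts: len(stations) for ts, stations in by_time.items()}
```

With all 12 stations loaded, every value in `n_t` would be at most 12.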
(Subpopulation) Shifts:
All kinds of shifts – covariate, label, and concept – are possible across the different domains
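Covariate shift is possible if the pollution dynamics are the same across stations and only the history differs (this can also cause label shift)
Concept shift is also possible, i.e. the pollution dynamics themselves may differ between stations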
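
One crude way to look for label shift across domains is to compare per-station averages of the Y vector. The sketch below is only a first-pass diagnostic under that assumption: a difference in means suggests label shift but cannot separate it from covariate or concept shift. The helper name `label_means_by_domain`, the station labels, and the values are hypothetical.

```python
from statistics import mean


def label_means_by_domain(examples):
    """examples: iterable of (station, y), where y is an (SO2, NO2, CO, O3) tuple.

    Returns {station: tuple of per-dimension means}.
    """
    grouped = {}
    for station, y in examples:
        grouped.setdefault(station, []).append(y)
    return {
        station: tuple(mean(dim) for dim in zip(*ys))
        for station, ys in grouped.items()
    }


# Synthetic labels for two made-up stations.
examples = [
    ("A", (4.0, 10.0, 300.0, 70.0)),
    ("A", (6.0, 20.0, 500.0, 50.0)),
    ("B", (20.0, 60.0, 1200.0, 10.0)),
]

means = label_means_by_domain(examples)
```

Large gaps between the entries of `means` for different stations would be a prompt to look more closely, for example by comparing full distributions rather than means.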