alexeyegorov / ConceptDrift

Algorithms for detecting changes from a data stream.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ADWIN: ADaptive sliding WINdow algorithm

ADWIN is a change detector and estimator that solves in a well-specified way the problem of tracking the average of a stream of bits or real-valued numbers.

ADWIN keeps a variable-length window of recently seen items, with the property that the window has the maximal length statistically consistent with the hypothesis “there has been no change in the average value inside the window”. More precisely, an older fragment of the window is dropped if and only if there is enough evidence that its average value differs from that of the rest of the window.

class concept_drift.adwin.Adwin
	(delta=0.002, max_buckets=5, min_clock=32, min_length_window=10, min_length_sub_window=5)
Parameters:
delta: Confidence value
max_buckets: Max number of buckets which have same number of original date in one row
min_clock Min number of new data for starting to reduce window and detect change
min_length_window Min window's length for starting to reduce window and detect change
min_length_sub_window Min sub window's length for starting to reduce window and detect change

Methods

set_input(value)

Set input value to the drift detector - ADWIN.

Parameters:
Value: Input value
Return:
Boolean Whether has drift

Example

from concept_drift.adwin import Adwin

adwin = Adwin()
for i in range(1000):
    if adwin.set_input(i):
	print("Here is a drift")

Page-Hinckley Test

The Page-Hinkley test which is a sequential adaptation of the detection of an abrupt change of the average of a Gaussian signal. It allows efficient detection of changes in the normal behaviour of a process which is established by a model.

 class concept_drift.page_hinkley.PageHinkley
 	(delta_=0.005, lambda_=50, alpha_=1 - 0.0001)
Parameters:
delta_: Magnitude threshold; the minimal absolute value of the amplitude of the jump to be detected
lambda_: Detection threshold
alpha_ Adaption variable for lambda adjustment

Test

Used the elecNormNew dataset;
Used GaussianNB as based classification;
Used accuracy_score as input date for change detector;
Used the prequential evaluation;

GaussianNB :
Mean acc within the window 1000: 0.7289912189511405

Page-Hinkley :
Drift detection: 8
Mean acc within the window 1000: 0.7342849306584686

ADWIN :
Drift detection: 139
Mean acc within the window 1000: 0.7496421003738032

About

Algorithms for detecting changes from a data stream.


Languages

Language:Python 100.0%