Marcnuth / AnomalyDetection

Twitter's Anomaly Detection in Pure Python

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Best way to handle sec and ms level of granularity

hokiegeek2 opened this issue · comments

I see in the anomaly_detect_ts the period is not set when the granularity is either 'sec' or 'ms':

elif timediff.seconds > 0:
    granularity = 'sec'
    # Aggregate data to minutely if secondly
    data = data.resample('60s', label='right').sum()
else:
    granularity = 'ms'

This causes the error reported in #11 Questions: should data at the sec and ms level (1) be resampled to 'min' level of granularity and set the period=1440 (2) be unsupported with ValueError being thrown to interrupt processing, (3) provide both as a configurable option

I personally vote for (3). @Marcnuth @triciascully what do you think?

I vote for (3), and thanks for debugging this!

On a separate but related note, I've been manually running the code and setting the period to period = 96 because the data I'm working with is at a 15min interval - can you build in an option for configuration of the period as well, since people might be working with data that's every 10min, 15min, 30min, 2hrs, etc.?

Adding in code to do the following:

  1. enable resampling of both sec and ms to minute
  2. render resampling of both sec and ms to be configurable so that either the resampling to a minute granularity occurs or a ValueError is thrown

Ah thank you, John! Sorry for the delay, I've been traveling - do you still need this in an issue?

No worries! Yes, please do that so it can be tracked. I am currently testing the fix that resamples data with ms and sec granularity. It's taking awhile because I need to generate it from metricbeat.

Okay, pushed pull request.

@triciascully this is implemented and tested in the master branch of my fork

merged, thanks @Marcnuth