Non-compliant Window Co-occurrence pattern mining in temporal data
- A Python library to find sequential association patterns in time-series data that co-occur with target anomalous windows.
- Anomalous windows are sequences in which a target variable defies expected behavior (e.g. emissions from a vehicle).
- The patterns are found in co-occurrence with a specific non-compliant feature, to help decipher the reason behind its irregular behavior.
- The package uses various pruning methodologies to speed up pattern mining, and hashing for fast support counting.
- The algorithm is based on research: [Discovering non-compliant window co-occurrence patterns](https://link.springer.com/article/10.1007/s10707-016-0289-3)
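
The idea of hashing for support counting can be illustrated with a minimal sketch. This is not the package's implementation: the function name `count_cooccurrences` is hypothetical, and the co-occurrence rule used here (an occurrence counts when the target flag at `start + lag` is 1) is a simplifying assumption made only for illustration.

```python
from collections import defaultdict


def count_cooccurrences(feature, nc_window, pattern_length, lag=0):
    """Count fixed-length feature subsequences that line up with anomalous flags.

    Assumption for this sketch: an occurrence starting at index `start`
    co-occurs with an anomalous window when nc_window[start + lag] == 1.
    """
    counts = defaultdict(int)  # hash table keyed by the pattern tuple
    last_start = len(feature) - pattern_length
    for start in range(last_start + 1):
        flag_index = start + lag
        if flag_index >= len(nc_window):
            break  # no target flag left to compare against
        if nc_window[flag_index] == 1:
            pattern = tuple(feature[start:start + pattern_length])
            counts[pattern] += 1  # constant-time update on average
    return dict(counts)


# Hypothetical discretized feature and binary target, for illustration only.
engrpm = [4, 4, 4, 3, 4, 4, 4, 9]
nc_window = [1, 0, 0, 0, 1, 0, 0, 0]

counts = count_cooccurrences(engrpm, nc_window, pattern_length=3)
counts_lagged = count_cooccurrences(engrpm, nc_window, pattern_length=3, lag=1)
```

Keying the dictionary by the pattern tuple means each candidate is counted in a single pass over the series, instead of rescanning the data once per pattern.
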
from nwc_pattern_miner import mine_sequence_patterns
- series_df: pd.DataFrame; Input DataFrame (only the features [discretized] and target [binarized] columns)
- nc_window_col: str; Name of the column with the binary target (anomalous windows)
- support_threshold: float; Support threshold for sequence co-occurrence patterns
- crossk_threshold: float; Ripley's Cross-k threshold for sequence co-occurrence patterns
- pattern_length: int; Length of feature sequences co-occurring with anomalous windows
- confidence_threshold: float, default=-1; Confidence threshold for sequence co-occurrence patterns
- lag: int, default=0; Lag considered between sequence patterns and anomalous windows
- invalid_seq_indexes: list, default=list(); List of indexes across which sequence patterns would be invalidated
- output_metric: {'crossk', 'support'}, default='crossk'; Metric used to sort the patterns mined
- output_type: {'topk', 'threshold'}, default='topk'; Type of output for the sequence patterns mined
- output_threshold: float, default=-1; Threshold cutoff used to get output sequence patterns, if output_type='threshold'
- topk: int, default=100; Top-k sequence patterns obtained based on output_metric, if output_type='topk'
- pruning_type: str, default='apriori'; One of ['apriori', 'br-dr']. Both have the same run-time; 'br-dr' does more enumerations, but each enumeration is much faster due to UB pruning.
engrpm | EGRkgph | MSPhum | EngTq | NCWindow |
---|---|---|---|---|
9 | 11 | 5 | 3 | 1 |
3 | 1 | 5 | 4 | 0 |
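
The table above has the expected input shape: discretized feature columns plus a binary target column. As a rough sketch of how raw readings might be brought into that shape, the snippet below uses equal-width binning and a fixed cutoff. The helper names `discretize` and `binarize`, the bin count, the cutoff, and the raw values are all assumptions for illustration; the package does not prescribe a particular discretization scheme here.

```python
def discretize(values, n_bins):
    """Map numeric readings to integer bin labels 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def binarize(values, limit):
    """Flag readings above `limit` as anomalous (1), all others as normal (0)."""
    return [1 if v > limit else 0 for v in values]


# Hypothetical raw readings, not taken from any real dataset.
raw_engrpm = [800.0, 1200.0, 2500.0, 3000.0]
raw_emissions = [0.1, 0.9, 0.2, 1.4]

columns = {
    "engrpm": discretize(raw_engrpm, n_bins=4),
    "NCWindow": binarize(raw_emissions, limit=0.5),
}
```

A dictionary of equal-length lists like `columns` can be passed to `pd.DataFrame(columns)` to obtain a DataFrame for `series_df`, with `"NCWindow"` as `nc_window_col`.
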
engrpm | EGRkgph | MSPhum | EngTq | Count | Support | Kvalue | Confidence | First Occurrence Index |
---|---|---|---|---|---|---|---|---|
4 4 4 | 5 5 5 | 2 2 2 | 146 | 0.00528 | 2.377 | 1.0 | 47167 | |
4 4 4 | 7 7 7 | 250 | 0.00643 | 2.357 | 1.0 | 41984 |
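
The `output_metric`, `output_type`, `topk`, and `output_threshold` parameters decide which mined patterns are reported and in what order. The sketch below shows one plausible reading of those options; `select_patterns` is a hypothetical helper, not part of the package. It assumes that the `Kvalue` column corresponds to the `'crossk'` metric and that the threshold mode keeps patterns whose metric is at least `output_threshold`. The two sample records reuse the Support and Kvalue figures from the rows above.

```python
def select_patterns(patterns, output_metric="crossk", output_type="topk",
                    topk=100, output_threshold=-1):
    """Sort pattern records by a metric, then keep the top-k or those above a cutoff."""
    if output_metric not in ("crossk", "support"):
        raise ValueError("output_metric must be 'crossk' or 'support'")
    ranked = sorted(patterns, key=lambda p: p[output_metric], reverse=True)
    if output_type == "topk":
        return ranked[:topk]
    if output_type == "threshold":
        # Assumption: the cutoff is inclusive.
        return [p for p in ranked if p[output_metric] >= output_threshold]
    raise ValueError("output_type must be 'topk' or 'threshold'")


# Support and Kvalue figures copied from the two example rows.
patterns = [
    {"count": 146, "support": 0.00528, "crossk": 2.377},
    {"count": 250, "support": 0.00643, "crossk": 2.357},
]

best_by_crossk = select_patterns(patterns, topk=1)
best_by_support = select_patterns(patterns, output_metric="support", topk=1)
above_cutoff = select_patterns(patterns, output_type="threshold",
                               output_threshold=2.36)
```

Note that the two metrics can rank the same patterns differently: here the first record leads on cross-k, while the second leads on support.
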