
Anomaly Detection Neural Network with Attention

This README has four main parts that briefly describe the workflow for training a recurrent neural network (RNN) with an attention layer to classify anomalous events in a sequence-based embedded software log: Data Loading and Preprocessing, Model Building, Model Training, and Results Analysis and Visualization. See the Python notebook for details.

Data Loading and Preprocessing

Load all 15 .csv data files and store each one as a pandas DataFrame.
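A minimal sketch of this loading step, assuming the .csv files sit together in one directory. The helper name `load_logs` and the directory layout are hypothetical; the notebook may organize this differently.

```python
from pathlib import Path

import pandas as pd


def load_logs(data_dir):
    """Read every .csv file in data_dir into a dict of DataFrames.

    Keys are the file names without the extension (for example "clean-01").
    The directory layout is an assumption made for this sketch.
    """
    frames = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        frames[path.stem] = pd.read_csv(path)
    return frames
```

Keeping the frames in a dict keyed by file name makes it easy to compare clean and anomalous files side by side later on.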

Overview of the data

Group each dataframe by its class and event columns to get the occurrence count of each event under each class.

class	event	clean-01	clean-02	clean-03	clean-04	clean-05	clean-06	clean-07	clean-08	clean-09	clean-10	fifo-ls-01	fifo-ls-02	fifo-ls-sporadic	full-while	half-while
COMM	MSG_ERROR	6.0	6.0	8.0	6.0	6.0	6.0	6.0	6.0	6.0	6.0	5715	5589.0	6006.0	509.0	422.0
	REC_MESSAGE	17968.0	17969.0	17963.0	18135.0	18134.0	18147.0	18216.0	18213.0	18260.0	18347.0	65232	66973.0	65666.0	44802.0	45072.0
	REC_PULSE	24710.0	24226.0	24173.0	24871.0	24849.0	24358.0	24390.0	24397.0	24442.0	24644.0	28312	28349.0	25631.0	39342.0	39529.0
	REPLY_MESSAGE	17947.0	17950.0	17938.0	18098.0	18103.0	18131.0	18190.0	18180.0	18248.0	18329.0	59477	61336.0	59627.0	44202.0	44565.0
	SIGNAL	NaN	1.0	2.0	NaN	NaN	1.0	NaN	1.0	1.0	2.0	37	36.0	39.0	NaN	1.0
	SND_MESSAGE	18089.0	18077.0	18073.0	18234.0	18235.0	18247.0	18300.0	18286.0	18373.0	18447.0	65378	67122.0	65808.0	45149.0	45426.0
	SND_PULSE	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	882	884.0	943.0	11226.0	11289.0
	SND_PULSE_DIS	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	877	880.0	958.0	NaN	NaN
	SND_PULSE_EXE	36701.0	48219.0	60157.0	72854.0	84809.0	96321.0	108339.0	120360.0	132390.0	144575.0	181312	193406.0	202078.0	172297.0	160365.0
CONTROL	BUFFER	2161.0	2158.0	2170.0	2224.0	2242.0	2243.0	2264.0	2278.0	2298.0	2329.0	4845	4938.0	4751.0	4326.0	4334.0

The table above shows that the clean and anomalous files differ considerably in the occurrence counts of several events. For example, COMM-SND_MESSAGE occurs roughly 18,000 times in each clean file but roughly 45,000 to 67,000 times in the anomalous files. Raw counts alone are not an effective way to detect anomalous activity, but they give a general picture of where in the data the anomalies may reside.
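A count table like the one above can be produced along these lines. This is a sketch that assumes each dataframe has columns named `class` and `event`; the helper name `event_counts` is hypothetical.

```python
import pandas as pd


def event_counts(frames):
    """Count (class, event) occurrences per file.

    frames: dict mapping a file name to its DataFrame.
    Returns a DataFrame with one row per (class, event) pair and one
    column per file; pairs that never occur in a file show up as NaN.
    """
    counts = {
        name: df.groupby(["class", "event"]).size()
        for name, df in frames.items()
    }
    return pd.DataFrame(counts).sort_index()
```

Events that are absent from a file appear as NaN rather than 0, which is also why the count columns are displayed as floats.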

Model Building

Load the encoder and decoder model

The architecture of this model is:

input of events sequence ------>> encoder(GRU unit) ------>> attention layer ----->> decoder (GRU unit) ------->> output layer

The input is a short segment of the log file, here 5 consecutive events, and the target output is the 5 consecutive events that immediately follow it. The general idea is to train the proposed model to predict the following events from the input events. Assuming the event sequence patterns of clean and anomalous logs differ, the prediction (test) accuracy should also differ when the same model and trained weights are applied to both.
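The input/target pairing described above can be sketched as a sliding window over an integer-encoded event sequence. The function name `make_windows` and the integer encoding are assumptions for illustration; the default stride of 2 is inferred from the training setting mentioned in the Results section.

```python
import numpy as np


def make_windows(event_ids, Tx=5, stride=2):
    """Slice a 1-D sequence of event ids into (input, target) pairs.

    Each input is Tx consecutive events and its target is the next Tx
    events. Consecutive windows start `stride` events apart.
    """
    event_ids = np.asarray(event_ids)
    # Last valid start leaves room for one input and one target window.
    starts = range(0, len(event_ids) - 2 * Tx + 1, stride)
    X = np.array([event_ids[s:s + Tx] for s in starts]).reshape(-1, Tx)
    Y = np.array([event_ids[s + Tx:s + 2 * Tx] for s in starts]).reshape(-1, Tx)
    return X, Y
```

With a stride smaller than Tx the windows overlap, which yields more training samples from the same log. A stride equal to Tx gives non-overlapping inputs.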

See model.py for the details of the encoder, attention, and decoder models.
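model.py is not reproduced here. To give a feel for what the attention layer computes, the following is a NumPy sketch of additive (Bahdanau-style) attention, the form used in the TensorFlow tutorial listed under Reference. Whether model.py uses exactly this form is an assumption, and all names and shapes below are illustrative.

```python
import numpy as np


def additive_attention(enc_states, dec_state, W1, W2, v):
    """One step of additive attention (illustrative, not the repo's code).

    enc_states: (Tx, H) encoder hidden states, one per input event
    dec_state:  (H,)    current decoder hidden state
    W1, W2:     (H, A)  projection matrices
    v:          (A,)    scoring vector
    Returns the context vector (H,) and the attention weights (Tx,).
    """
    # Score each encoder state against the decoder state.
    scores = np.tanh(enc_states @ W1 + dec_state @ W2) @ v
    # Numerically stable softmax over the Tx input positions.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Context vector: attention-weighted sum of encoder states.
    context = weights @ enc_states
    return context, weights
```

The decoder would consume the context vector together with its previous output at each step, so each predicted event can focus on different input events.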

Model Training

Check anomaly_detection_NN_train.ipynb for details.

Results

The next step is to predict results using the model above and the trained weights of each layer (saved in the sumitmodel_checkpoint folder).

The test inputs are processed with the same event sequence length as the training data, Tx = 5, but with a stride of 5 instead of 2.

Save all predicted results to .npy files for further analysis.
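Saving and reloading predictions could look like the sketch below. The helper name `save_predictions`, the output directory, and the file naming are hypothetical.

```python
from pathlib import Path

import numpy as np


def save_predictions(predictions, out_dir, name):
    """Write an array of predicted event ids to <out_dir>/<name>.npy."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.npy"
    np.save(path, np.asarray(predictions))
    return path
```

The saved array can be read back later with `np.load(path)`, so the analysis steps do not need to rerun the model.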

Set anomaly criteria

As mentioned above, assuming the event sequence patterns of clean and anomalous logs differ, the prediction (test) accuracy should also differ when the same model and trained weights are used.

I use a sequence of 1000 events as one input sample, predict its output with the trained model above, and compare the predicted output with the target values to get the misclassification rate.

After predicting outputs for all 10 clean files, calculate the mean and standard deviation of the misclassification rate. The criterion is then set to (mean + 3 * standard_deviation).

Any 1000-event sequence with a misclassification rate higher than the criterion is deemed an anomalous segment.

In this case, any segment with a misclassification rate higher than 0.365 is classified as anomalous.
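The thresholding rule can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, since the text does not say which estimator was used.

```python
import numpy as np


def segment_error_rates(predicted, target, segment_len=1000):
    """Misclassification rate for each segment of segment_len events.

    predicted, target: arrays of event ids with the same number of
    elements. A trailing partial segment is dropped.
    """
    predicted = np.asarray(predicted).ravel()
    target = np.asarray(target).ravel()
    n_segments = len(target) // segment_len
    usable = n_segments * segment_len
    wrong = predicted[:usable] != target[:usable]
    return wrong.reshape(n_segments, segment_len).mean(axis=1)


def anomaly_threshold(clean_rates, k=3.0):
    """Criterion from clean data: mean + k * standard deviation."""
    clean_rates = np.asarray(clean_rates)
    return clean_rates.mean() + k * clean_rates.std()


def flag_anomalies(rates, threshold):
    """True for every segment whose rate exceeds the threshold."""
    return np.asarray(rates) > threshold
```

In use, the rates from all clean files would be pooled to compute the threshold once, and the same threshold would then be applied to the segments of every file under test.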

Visualize anomalous events

Normal sequences

Abnormal sequences A

Abnormal sequences B

Reference

O. M. Ezeme, Q. H. Mahmoud and A. Azim, "DReAM: Deep Recursive Attentive Model for Anomaly Detection in Kernel Events," in IEEE Access, vol. 7, pp. 18860-18870, 2019.
https://www.tensorflow.org/tutorials/text/nmt_with_attention
