SNTSVV / LogCleaner

LogCleaner

LogCleaner is a tool for automatically filtering noisy log messages by analyzing their periodicity and dependency. This repository contains the replication package for the following paper:

Donghwan Shin, Domenico Bianculli, and Lionel Briand. "Effective Removal of Operational Log Messages: an Application to Model Inference." arXiv preprint arXiv:2004.07194 (2020).

Authors

Licensing Information

LogCleaner is © 2020-2022 University of Luxembourg and licensed under the Apache v2 license.

The sample log under /dataset_sample originates from LogHub and was preprocessed by LogPrep; both are freely available and distributable for research purposes.

Prerequisite

Set up the Python virtual environment (only once) from the repository root:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt  # NOTE: requirements.txt lists the required dependencies

In later sessions, you only need to activate the virtual environment:

source venv/bin/activate

Input Preparation (Simplified)

  • LogCleaner takes logs and templates as input, in the form of a structured log file generated by LogPrep.
  • Example header of a structured log file (CSV): logID,lineID,time,level,message,tid,template,values.
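
Before running LogCleaner, it can be useful to confirm that a structured log file carries the columns listed above. The following is a minimal sketch using only the Python standard library; the helper name `missing_columns` and the sample row are hypothetical and not part of LogCleaner or LogPrep.

```python
import csv
import io

# Column names taken from the example header in this README.
EXPECTED_COLUMNS = ["logID", "lineID", "time", "level", "message", "tid", "template", "values"]


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Hypothetical one-row sample; real files are produced by LogPrep.
sample = io.StringIO(
    "logID,lineID,time,level,message,tid,template,values\n"
    "1,1,2022-06-21 18:49:49,INFO,Default file system [hdfs://x],E31,Default file system [hdfs://<*>],x\n"
)
print(missing_columns(sample))  # an empty list means the header is complete
```

To check a real file, pass an open file handle instead of the in-memory sample, for example `open("dataset_sample/Hadoop/Hadoop_structured_logs.csv", newline="")`.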

Run LogCleaner

Use ./LogCleaner.py to run LogCleaner.

(venv) $ python LogCleaner.py -h
usage: LogCleaner.py [-h] --log LOG [--log_size LOG_SIZE] [--periodicity_only] [--p_threshold P_THRESHOLD]
                     [--dependency_only] [--save_log]

mandatory arguments:
  -l, --log             structured log file

optional arguments:
  -h, --help            show this help message and exit
  -ls LOG_SIZE, --log_size LOG_SIZE
                        Log size; if less than or equal to 1 then interpreted
                        as percentage (default: all)
  --periodicity_only    run periodicity analysis only
  --p_threshold         periodicity threshold (default: 5, meaning 5% error is tolerable)
  --dependency_only     run dependency analysis only
  --save_log            save the execution log
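
The help text for --log_size says that values less than or equal to 1 are interpreted as a percentage. One plausible reading is sketched below; the function `resolve_log_size` is a hypothetical illustration, and the choice to treat such values as a fraction of all logs (so 0.5 means half) is an assumption, not LogCleaner's confirmed behavior.

```python
def resolve_log_size(total_logs, log_size=None):
    """Return how many logs to use (hypothetical helper, not LogCleaner's code)."""
    if log_size is None:
        # Default in the help text: use all logs.
        return total_logs
    if log_size <= 1:
        # Assumption: a value such as 0.5 is a fraction of all logs.
        return int(total_logs * log_size)
    # Otherwise treat the value as an absolute count, capped at the total.
    return min(int(log_size), total_logs)


# 68 is the "Total number of logs" reported for the Hadoop sample.
print(resolve_log_size(68))       # all logs
print(resolve_log_size(68, 0.5))  # half of the logs under the assumption above
print(resolve_log_size(68, 10))   # an absolute count
```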

For example, running python LogCleaner.py -l dataset_sample/Hadoop/Hadoop_structured_logs.csv prints the following output:

2022-06-21 18:49:49,310 - src.utils.common - INFO - Total number of logs: 68
2022-06-21 18:49:49,310 - src.main.operational_handler - INFO - ----------------------------------------------------------------------------------------------------
2022-06-21 18:49:49,310 - src.main.operational_handler - INFO - periodicity_only=False, dependency_only=False, periodicity_threshold=5, boundary=0, power=1, min_supp=2
2022-06-21 18:49:49,310 - src.main.operational_handler - INFO - ----------------------------------------------------------------------------------------------------
2022-06-21 18:49:49,311 - src.main.operational_handler - INFO - Total number of executions: 68
2022-06-21 18:49:49,311 - src.main.operational_handler - INFO - Total number of templates: 41
2022-06-21 18:49:49,311 - src.main.operational_handler - INFO - Total number of log entries: 3575
2022-06-21 18:49:49,312 - src.main.operational_handler - INFO - tid=E3 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,312 - src.main.operational_handler - INFO - tid=E4 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,312 - src.main.operational_handler - INFO - tid=E5 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,312 - src.main.operational_handler - INFO - tid=E7 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,312 - src.main.operational_handler - INFO - tid=E8 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078, 64792021078]
2022-06-21 18:49:49,312 - src.main.operational_handler - INFO - tid=E9 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,313 - src.main.operational_handler - INFO - tid=E25 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021079]
2022-06-21 18:49:49,313 - src.main.operational_handler - INFO - tid=E27 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021079]
2022-06-21 18:49:49,313 - src.main.operational_handler - INFO - tid=E29 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021076]
2022-06-21 18:49:49,313 - src.main.operational_handler - INFO - tid=E31 is not globally continuous in ex_id=1
2022-06-21 18:49:49,313 - src.main.operational_handler - INFO - tid=E37 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,314 - src.main.operational_handler - INFO - tid=E42 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021076]
2022-06-21 18:49:49,314 - src.main.operational_handler - INFO - tid=E43 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,314 - src.main.operational_handler - INFO - tid=E47 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,314 - src.main.operational_handler - INFO - tid=E48 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,314 - src.main.operational_handler - INFO - tid=E49 is not globally continuous (occurred less than three times); ex_id=2, sequence=[64792021082]
2022-06-21 18:49:49,315 - src.main.operational_handler - INFO - tid=E52 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,315 - src.main.operational_handler - INFO - tid=E53 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,315 - src.main.operational_handler - INFO - tid=E55 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,315 - src.main.operational_handler - INFO - tid=E61 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021076]
2022-06-21 18:49:49,315 - src.main.operational_handler - INFO - tid=E63 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,316 - src.main.operational_handler - INFO - tid=E64 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,316 - src.main.operational_handler - INFO - tid=E66 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021079]
2022-06-21 18:49:49,316 - src.main.operational_handler - INFO - tid=E67 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021079]
2022-06-21 18:49:49,316 - src.main.operational_handler - INFO - tid=E68 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,316 - src.main.operational_handler - INFO - tid=E69 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,316 - src.main.operational_handler - INFO - tid=E70 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021079]
2022-06-21 18:49:49,317 - src.main.operational_handler - INFO - tid=E71 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,317 - src.main.operational_handler - INFO - tid=E73 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,317 - src.main.operational_handler - INFO - tid=E75 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021077]
2022-06-21 18:49:49,317 - src.main.operational_handler - INFO - tid=E76 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021077]
2022-06-21 18:49:49,317 - src.main.operational_handler - INFO - tid=E82 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021079]
2022-06-21 18:49:49,317 - src.main.operational_handler - INFO - tid=E88 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021079]
2022-06-21 18:49:49,318 - src.main.operational_handler - INFO - tid=E89 is not globally continuous in ex_id=1
2022-06-21 18:49:49,318 - src.main.operational_handler - INFO - tid=E92 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,318 - src.main.operational_handler - INFO - tid=E97 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,318 - src.main.operational_handler - INFO - tid=E109 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021079]
2022-06-21 18:49:49,318 - src.main.operational_handler - INFO - tid=E110 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078, 64792021079]
2022-06-21 18:49:49,319 - src.main.operational_handler - INFO - tid=E111 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021077]
2022-06-21 18:49:49,319 - src.main.operational_handler - INFO - tid=E113 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021078]
2022-06-21 18:49:49,319 - src.main.operational_handler - INFO - tid=E114 is not globally continuous (occurred less than three times); ex_id=1, sequence=[64792021079]
2022-06-21 18:49:49,319 - src.main.operational_handler - INFO - ----------------------------------------------------------------------------------------------------
2022-06-21 18:49:49,319 - src.main.operational_handler - INFO - * List of periodic templates identified
2022-06-21 18:49:49,319 - src.main.operational_handler - INFO - ----------------------------------------------------------------------------------------------------
2022-06-21 18:49:51,403 - src.main.operational_handler - INFO - ----------------------------------------------------------------------------------------------------
2022-06-21 18:49:51,404 - src.main.operational_handler - INFO - List of templates whose dependency score is less then 1
2022-06-21 18:49:51,404 - src.main.operational_handler - INFO - tid=E89, mScore=0.1111, template="Registering class org.apache.hadoop.mapreduce.<*> "
2022-06-21 18:49:51,404 - src.main.operational_handler - INFO - tid=E31, mScore=0.3063, template="Default file system [hdfs://<*>]"
2022-06-21 18:49:51,404 - src.main.operational_handler - INFO - tid=E110, mScore=0.5000, template="Using callQueue class java.util.concurrent.LinkedB"
2022-06-21 18:49:51,405 - src.main.operational_handler - INFO - tid=E8, mScore=0.5000, template="adding path spec: /<*>/*"
2022-06-21 18:49:51,405 - src.main.operational_handler - INFO - ----------------------------------------------------------------------------------------------------
2022-06-21 18:49:51,409 - src.main.operational_handler - INFO - bandwidth=0.063
2022-06-21 18:49:51,459 - src.main.operational_handler - INFO - cluster_labels = [3 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0]
2022-06-21 18:49:51,459 - src.main.operational_handler - INFO - * List of operational templates identified by the dependency analysis
2022-06-21 18:49:51,459 - src.main.operational_handler - INFO - tid=E89, template="Registering class org.apache.hadoop.mapreduce.<*> "
2022-06-21 18:49:51,459 - src.main.operational_handler - INFO - ----------------------------------------------------------------------------------------------------
2022-06-21 18:49:51,460 - src.main.operational_handler - INFO - Total number of log entries after removing periodic and operational templates 2963
2022-06-21 18:49:51,460 - src.main.operational_handler - INFO - Identified Operational tids: {'E89'}
2022-06-21 18:49:51,460 - src.main.operational_handler - INFO - Filtered messages rate: 0.1712
2022-06-21 18:49:51,460 - src.main.operational_handler - INFO - Remaining tids: ['E3', 'E4', 'E5', 'E7', 'E8', 'E9', 'E25', 'E27', 'E29', 'E31', 'E37', 'E42', 'E43', 'E47', 'E48', 'E49', 'E52', 'E53', 'E55', 'E61', 'E63', 'E64', 'E66', 'E67', 'E68', 'E69', 'E70', 'E71', 'E73', 'E75', 'E76', 'E82', 'E88', 'E92', 'E97', 'E109', 'E110', 'E111', 'E113', 'E114']
2022-06-21 18:49:51,460 - root - INFO - Exit LogCleaner without error(s)
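
The reported filtered messages rate is consistent with the entry counts in the output above, assuming the rate is defined as the number of removed log entries divided by the total number of log entries. A short check of that arithmetic:

```python
total_entries = 3575      # "Total number of log entries"
remaining_entries = 2963  # entries left after removing periodic and operational templates

removed_entries = total_entries - remaining_entries
filtered_rate = removed_entries / total_entries

print(removed_entries)          # 612
print(round(filtered_rate, 4))  # 0.1712
```

In this run, all removed entries belong to template E89, the only operational template identified.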
