logpai / Drain3

A robust streaming log template miner based on the Drain algorithm

Parsed log messages after processing log files using drain3

boringbyte opened this issue

How can I get the original log lines replaced with their Drain-parsed templates? Is that functionality still there? And why was the regex-based text preprocessing step removed?

Just call TemplateMiner.add_log_message() on each log line to get the template.
https://github.com/IBM/Drain3/blob/ac71a48dc08ee6eef08961aedd2e9db7ee09aa8a/drain3/template_miner.py#L114
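
As a rough sketch of that per-line loop: the helper below calls add_log_message on each line and pairs the line with its cluster's most recent template, using only the cluster_id and template_mined fields of the returned dict. The names mine_templates and FakeMiner are hypothetical; FakeMiner is a toy stand-in so the example runs without drain3 installed, and it does not reproduce the Drain algorithm.

```python
def mine_templates(miner, lines):
    """Return (line, template) pairs, using each cluster's latest template."""
    cluster_ids = []
    latest_template = {}
    for line in lines:
        result = miner.add_log_message(line)
        cluster_ids.append(result["cluster_id"])
        # A cluster's template can generalize as more lines arrive,
        # so remember only the newest template per cluster.
        latest_template[result["cluster_id"]] = result["template_mined"]
    return [(line, latest_template[cid]) for line, cid in zip(lines, cluster_ids)]


class FakeMiner:
    """Toy stand-in for TemplateMiner (assumption, for illustration only)."""

    def __init__(self):
        self.clusters = {}  # key -> (cluster_id, template tokens)

    def add_log_message(self, line):
        tokens = line.split()
        key = (len(tokens), tokens[0])
        if key not in self.clusters:
            self.clusters[key] = (len(self.clusters) + 1, tokens)
        else:
            cid, old = self.clusters[key]
            # Positions that differ between messages become wildcards.
            merged = [a if a == b else "<*>" for a, b in zip(old, tokens)]
            self.clusters[key] = (cid, merged)
        cid, template = self.clusters[key]
        return {"cluster_id": cid, "template_mined": " ".join(template)}


lines = [
    "Invalid user test9 from 10.0.0.1",
    "Invalid user admin from 10.0.0.2",
]
pairs = mine_templates(FakeMiner(), lines)
for line, template in pairs:
    print(line, "->", template)
```

With a real TemplateMiner instance passed as miner, the same loop should apply, since it relies only on the result fields shown in this thread.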

The masking capability is still there; you should define the regex masks in the config object or config file for TemplateMiner.
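
To illustrate what regex masking does to a line before template mining, here is a minimal standard-library sketch. It is not Drain3's implementation: the MASKS table, both patterns, and the mask_line helper are assumptions made for illustration. In Drain3 itself the masks come from the config, as described above.

```python
import re

# Hypothetical mask table: (compiled pattern, mask name), applied in order.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "IP"),
    (re.compile(r"\b\d+\b"), "NUM"),
]


def mask_line(line, prefix="<", suffix=">"):
    """Replace every match of each mask pattern with prefix + name + suffix."""
    for pattern, name in MASKS:
        line = pattern.sub(prefix + name + suffix, line)
    return line


masked = mask_line("Invalid user test9 from 52.80.34.196")
print(masked)  # Invalid user test9 from <IP>
```

The order of the table matters in this sketch: the IP pattern runs first so that the more general number pattern does not split an address into four separate masks.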

How can I get the masked content from each log?

result = template_miner.add_log_message('Invalid user test9 from 52.80.34.196')
result
{'change_type': 'none', 'cluster_id': 2, 'cluster_size': 14552, 'template_mined': 'Invalid user <*> from <IP>', 'cluster_count': 49}

add_log_message only returns the template. How can I get <*>: test9 and <IP>: 52.80.34.196? Is there a built-in function for that?

I modified get_parameter_list from logpai/logparser, and it works:

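import re

extra_delimiters = ["_"]  # same values as extra_delimiters in drain3.ini

def get_parameter_list(template, content):
    # Replace the extra delimiters with spaces (literal match, not regex).
    for deli in extra_delimiters:
        content = content.replace(deli, " ")
    # Normalize every mask (<*>, <IP>, ...) to <*>.
    template_regex = re.sub(r"<.{1,}?>", "<*>", template)
    if "<*>" not in template_regex:
        return []
    # Escape all non-alphanumeric characters, then let runs of spaces match any whitespace.
    template_regex = re.sub(r"([^A-Za-z0-9])", r"\\\1", template_regex)
    template_regex = re.sub(r"\\ +", r"\\s+", template_regex)
    # Turn each escaped <*> into a non-greedy capture group.
    template_regex = "^" + template_regex.replace(r"\<\*\>", "(.*?)") + "$"
    parameter_list = re.findall(template_regex, content)
    parameter_list = parameter_list[0] if parameter_list else ()
    # findall yields a tuple for several groups and a plain string for one.
    return list(parameter_list) if isinstance(parameter_list, tuple) else [parameter_list]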

@cwyalpha thanks for the code snippet. I will integrate this function soon.
One problem arises when a log message contains literal text such as <xxx> that is not a mask: the function would still treat it as one.
I am going to add a configurable mask prefix and suffix instead of always using <>. Overriding the mask wrapper with something more unique should help in such cases.
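
A minimal sketch of how a configurable mask wrapper could avoid that ambiguity during parameter extraction. The extract_parameters function, its prefix and suffix arguments, and the {{ }} wrapper are hypothetical names chosen for illustration; they are not taken from Drain3.

```python
import re


def extract_parameters(template, content, prefix="<", suffix=">"):
    """Return the values in content that sit where the template has masks."""
    mask = re.compile(re.escape(prefix) + r".+?" + re.escape(suffix))
    # Split the template into the literal pieces between masks.
    literals = mask.split(template)
    if len(literals) == 1:
        return []  # the template contains no mask
    # Each gap between literal pieces becomes a non-greedy capture group.
    pattern = "^" + "(.*?)".join(re.escape(part) for part in literals) + "$"
    match = re.match(pattern, content)
    return list(match.groups()) if match else []


# Default wrapper: both masks are captured.
print(extract_parameters("Invalid user <*> from <IP>",
                         "Invalid user test9 from 52.80.34.196"))
# ['test9', '52.80.34.196']

# A literal <b> in the log text is mistaken for a mask with the default wrapper.
print(extract_parameters("element <b> set by <*>", "element <b> set by alice"))
# ['<b>', 'alice']

# With a more unique wrapper, <b> stays literal text.
print(extract_parameters("element <b> set by {{*}}", "element <b> set by alice",
                         prefix="{{", suffix="}}"))
# ['alice']
```

Unlike the snippet above, this sketch matches spaces literally and ignores extra delimiters, which keeps it short; a fuller version would need to handle both.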