Parsed log messages after processing log files using drain3
boringbyte opened this issue · comments
How can I get the original log lines replaced with the Drain-parsed log lines? Is that functionality still there? And what is the reason for removing the regex-based text preprocessing step?
Just call TemplateMiner.add_log_message() on each log line to get the template.
https://github.com/IBM/Drain3/blob/ac71a48dc08ee6eef08961aedd2e9db7ee09aa8a/drain3/template_miner.py#L114
The masking capability is still there; define the regex masks in the config object or config file for TemplateMiner.
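To illustrate what a regex mask does to a line before the template is mined, here is a minimal stdlib-only sketch. It is not Drain3's implementation: the MASKS table, the mask names IP and NUM, the patterns, and the mask_line helper are all assumptions made up for this example.

```python
import re

# Hypothetical mask table: (compiled pattern, mask name). These patterns are
# illustrative only and are not copied from Drain3's default configuration.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "IP"),
    (re.compile(r"\b\d+\b"), "NUM"),
]


def mask_line(line, masks=MASKS):
    """Replace every match of each pattern with <NAME>, in list order."""
    for pattern, name in masks:
        line = pattern.sub("<" + name + ">", line)
    return line


masked = mask_line("Invalid user test9 from 52.80.34.196 port 22")
print(masked)
```

The order of the table matters in this sketch: the IP pattern runs first so that the generic number pattern does not split an address into four separate masks.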
How can I get the masked content from each log?
result = template_miner.add_log_message('Invalid user test9 from 52.80.34.196')
result
{'change_type': 'none', 'cluster_id': 2, 'cluster_size': 14552, 'template_mined': 'Invalid user <*> from ', 'cluster_count': 49}
add_log_message only returns the template. How can I get <*>: test9 and : 52.80.34.196? Is there any built-in function for that?
I modified get_parameter_list from logpai/logparser, and it works:
import re

extra_delimiters = ["_"]  # same value as in drain3.ini

def get_parameter_list(template, content):
    # Replace extra delimiters with spaces so the content matches the template's tokens.
    for deli in extra_delimiters:
        content = re.sub(deli, " ", content)
    # Normalize every mask (<*>, <IP>, ...) to the generic <*> wildcard.
    template_regex = re.sub(r"<.{1,}?>", "<*>", template)
    if "<*>" not in template_regex:
        return []
    # Escape all non-alphanumeric characters, then let runs of spaces match \s+.
    template_regex = re.sub(r"([^A-Za-z0-9])", r"\\\1", template_regex)
    template_regex = re.sub(r"\\ +", r"\\s+", template_regex)
    # Turn each escaped wildcard into a non-greedy capture group.
    template_regex = "^" + template_regex.replace(r"\<\*\>", "(.*?)") + "$"
    parameter_list = re.findall(template_regex, content)
    parameter_list = parameter_list[0] if parameter_list else ()
    # findall yields a tuple for several groups and a plain string for one group.
    return list(parameter_list) if isinstance(parameter_list, tuple) else [parameter_list]
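For comparison, the same idea can be sketched with re.escape and re.fullmatch. This is an alternative written for illustration, not the function posted above and not a Drain3 API: the name extract_parameters, the helper literal_to_regex, and the mask name <IP> in the usage line are assumptions. It also skips the extra_delimiters step.

```python
import re


def literal_to_regex(part):
    """Escape a literal template fragment, letting whitespace runs match \\s+."""
    pieces = re.split(r"(\s+)", part)
    return "".join(r"\s+" if p.isspace() else re.escape(p) for p in pieces)


def extract_parameters(template, content):
    """Return the substrings of content that line up with each mask in template."""
    # Split on mask tokens such as <*> or <IP>; what remains are the literal parts.
    literals = re.split(r"<[^<>]+>", template)
    if len(literals) == 1:
        return []  # the template contains no mask
    # One non-greedy capture group per mask, placed between the literal parts.
    pattern = "(.*?)".join(literal_to_regex(part) for part in literals)
    match = re.fullmatch(pattern, content)
    return list(match.groups()) if match else []


# Hypothetical template: the mask name <IP> is assumed for this example.
params = extract_parameters(
    "Invalid user <*> from <IP>",
    "Invalid user test9 from 52.80.34.196",
)
print(params)
```

Like the posted function, this sketch returns an empty list when the content does not match the template, and it cannot disambiguate two masks that sit directly next to each other.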
@cwyalpha thanks for the code snippet. I will integrate this function soon.
One problem arises when the log message itself contains <xxx> that is not a mask.
I am going to add a configurable mask prefix and suffix instead of always using <>. Overriding the mask wrapper with something more unique should help with such cases.
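A rough sketch of why a more distinctive wrapper helps. The function split_template, its mask_prefix and mask_suffix parameters, and the "<:" / ":>" wrapper are hypothetical choices for this illustration; they are not taken from Drain3's code.

```python
import re


def split_template(template, mask_prefix="<", mask_suffix=">"):
    """Split a template into (literal parts, mask names) for a given wrapper."""
    token = re.compile(re.escape(mask_prefix) + r"(.+?)" + re.escape(mask_suffix))
    # With one capture group, split() interleaves literals and captured names.
    parts = token.split(template)
    return parts[0::2], parts[1::2]


# Default wrapper: the literal <xml> in the message is misread as a mask.
_, names_default = split_template("Received <xml> payload from <IP>")

# A more distinctive wrapper: only the real mask is recognized.
_, names_custom = split_template(
    "Received <xml> payload from <:IP:>", mask_prefix="<:", mask_suffix=":>"
)
print(names_default, names_custom)
```

A longer wrapper lowers the chance of a collision with real log content but does not rule it out, so the wrapper still has to be chosen with the log format in mind.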