logpai / Drain3

A robust streaming log template miner based on the Drain algorithm

Some questions about drain_bigfile_demo

yanivweiss opened this issue

Hi David,
Thanks for the great work on updating Drain to Python 3.

I have some questions about drain_bigfile_demo:
1- Why partition on ': '? => line = line.partition(": ")[2]
Not all log records contain ':', so I end up with a very large number of blank templates.

2- I'm getting many templates that begin with a masked token.
Since Drain's layers split messages first by length and then by the words at the beginning,
these messages end up in the same cluster.
Do you have a suggestion for solving this?

3- Writing results to a file: for anomaly detection I need to create a new file whose records contain the timestamp and the resulting template from Drain. How can I write this to a file? (How do I map each original log record to its resulting template?)

Thanks!

Hi Yaniv

  1. Partitioning on ": " is specific to this use case. In many text-based logs, the beginning of each line is strictly structured and can be parsed easily with a regex or other simple string processing. Doing so usually improves Drain's performance on the unstructured part.
    In this case, it converts:
Dec 10 06:55:46 LabSZ sshd[24200]: reverse mapping checking getaddrinfo for ns.marryaldkfaczcz.com [173.234.31.186] failed - POSSIBLE BREAK-IN ATTEMPT!

to:

reverse mapping checking getaddrinfo for ns.marryaldkfaczcz.com [173.234.31.186] failed - POSSIBLE BREAK-IN ATTEMPT!
  2. Can you provide a few examples?
  3. I'm not sure I understood your question. If you process logs one by one, you have the result for each input log and can output the two together. Or can you provide an example here too?
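
Points 1 and 3 can be sketched together. This is only an illustration, not code from drain_bigfile_demo: the regex assumes the syslog-style header shown in the example above, and `split_header`, `write_templates`, and the `mine_template` callable are hypothetical names. A line that does not match the header is passed through whole rather than becoming an empty string, which is what `line.partition(": ")[2]` returns when the separator is missing.

```python
import csv
import io
import re

# Assumed header layout, taken from the sample line in this thread:
# "Dec 10 06:55:46 LabSZ sshd[24200]: <message>"
HEADER_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ \S+?: (?P<message>.*)$"
)


def split_header(line):
    """Return (timestamp, message); keep the whole line if no header matches."""
    line = line.rstrip("\n")
    match = HEADER_RE.match(line)
    if match is None:
        # No recognizable header: keep the content instead of returning "".
        return "", line
    return match.group("timestamp"), match.group("message")


def write_templates(lines, mine_template, out):
    """Write one CSV row (timestamp, template, message) per input line.

    mine_template is a hypothetical callable: message -> template string.
    """
    writer = csv.writer(out)
    writer.writerow(["timestamp", "template", "message"])
    for line in lines:
        timestamp, message = split_header(line)
        writer.writerow([timestamp, mine_template(message), message])


# Demo with a stand-in miner that returns the message unchanged.
demo_out = io.StringIO()
write_templates(
    ["Dec 10 06:55:46 LabSZ sshd[24200]: Failed password", "no header here"],
    lambda message: message,
    demo_out,
)
print(demo_out.getvalue())
```

With Drain3, `mine_template` could wrap the miner's per-line result, for example the `template_mined` field of the dictionary returned by `TemplateMiner.add_log_message` (an assumption about the installed version; check its API). Because Drain keeps generalizing a template as more messages join a cluster, a template written at streaming time may be more specific than the final one, so storing the cluster id next to it can help.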