Some questions about drain_bigfile_demo

Question

yanivweiss opened this issue 2 years ago · comments

Hi David,
Thanks for the great work on updating Drain to Python 3.

I have some questions about drain_bigfile_demo:
1- Why partition on ': '? => line = line.partition(": ")[2]
Not all log records contain ': ', so I get a very large number of blank templates.

2- I'm getting many templates that begin with a masked token.
Since Drain's layers divide messages by length (token count) and then by the words at the beginning,
these messages end up in the same cluster.
Do you have a suggestion for solving this issue?

3- Writing results to a file: for anomaly detection I need to create a new file whose records contain the timestamp and the resulting template from Drain. How can I write this to a file? (How do I map each original log message to its resulting template?)

Thanks!

David Ohana · Answer 1 · Sun Feb 20 2022 15:52:54 GMT+0800 (China Standard Time)

Hi Yaniv

Partitioning on ': ' is specific to this use case. In many text-based logs, the beginning of each line is strictly structured and can be parsed easily with a regex or other simple string processing. Stripping it usually improves Drain's performance on the unstructured part.
In this case, it converts:

Dec 10 06:55:46 LabSZ sshd[24200]: reverse mapping checking getaddrinfo for ns.marryaldkfaczcz.com [173.234.31.186] failed - POSSIBLE BREAK-IN ATTEMPT!

to:

reverse mapping checking getaddrinfo for ns.marryaldkfaczcz.com [173.234.31.186] failed - POSSIBLE BREAK-IN ATTEMPT!
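
A minimal sketch of that step. It also shows why lines without ': ' come out blank: `str.partition` returns an empty third element when the separator is missing. The helper name `strip_header` and the fallback of keeping the whole line are assumptions for illustration, not part of the demo:

```python
def strip_header(line: str, sep: str = ": ") -> str:
    """Return the text after the first `sep`, or the whole line if `sep` is absent.

    Hypothetical helper: the demo itself uses line.partition(": ")[2],
    which yields "" for lines that do not contain the separator.
    """
    line = line.rstrip()
    _head, found, rest = line.partition(sep)
    # `found` is "" when the separator does not occur in the line.
    return rest if found else line


sample = (
    "Dec 10 06:55:46 LabSZ sshd[24200]: reverse mapping checking getaddrinfo "
    "for ns.marryaldkfaczcz.com [173.234.31.186] failed - POSSIBLE BREAK-IN ATTEMPT!"
)
print(strip_header(sample))
print(strip_header("a line without the separator"))
```

Whether to keep, skip, or separately parse lines that lack the separator depends on the log format; keeping them unchanged is only one option.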

David Ohana · Answer 2 · Sun Feb 20 2022 15:54:01 GMT+0800 (China Standard Time)

Can you provide a few examples?

David Ohana · Answer 3 · Sun Feb 20 2022 15:56:14 GMT+0800 (China Standard Time)

Not sure I understood your question. If you process logs one by one, you have the result for each input log and can write the two out together. Could you provide an example here too?
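
A rough sketch of the one-by-one approach, assuming a syslog-style header like the sshd line above. The names `write_templates`, `HEADER`, and the `mine` callable are hypothetical. With Drain3, `mine` could be something like `lambda msg: template_miner.add_log_message(msg)["template_mined"]` (an assumption to check against your Drain3 version); the stand-in used here only masks digits and is not Drain:

```python
import csv
import io
import re
from typing import Callable, Iterable, TextIO

# Assumed header shape: "Dec 10 06:55:46 LabSZ sshd[24200]: <message>".
HEADER = re.compile(r"^(?P<ts>\w{3}\s+\d+\s\d{2}:\d{2}:\d{2})\s.*?:\s(?P<msg>.*)$")


def write_templates(lines: Iterable[str], mine: Callable[[str], str], out: TextIO) -> None:
    """Write one CSV row (timestamp, template) per input log line."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "template"])
    for line in lines:
        line = line.rstrip()
        m = HEADER.match(line)
        # Lines that do not match the assumed header keep an empty timestamp.
        ts, msg = (m["ts"], m["msg"]) if m else ("", line)
        writer.writerow([ts, mine(msg)])


def fake_mine(msg: str) -> str:
    """Stand-in for a template miner: replaces digit runs with <*>."""
    return re.sub(r"\d+", "<*>", msg)


buf = io.StringIO()
write_templates(
    ["Dec 10 06:55:46 LabSZ sshd[24200]: Failed password for root port 1234"],
    fake_mine,
    buf,
)
print(buf.getvalue())
```

Because each row is written right after its line is processed, the template recorded is the one known at that moment; templates that Drain generalizes later would need a second pass keyed by cluster ID.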