logpai / Drain3

A robust streaming log template miner based on the Drain algorithm

Matching float numbers

aguinaldoabbj opened this issue

Hi,

I'm trying to match and mask a float number ("+12", "-12", "-3.14", ".314e1", etc.) in a sentence. I've tried several regexes, like this one: "[^a-zA-Z:]([-+]?\d+[\.]?\d*)"

Although this regex works when I run it in Python with re.findall("[^a-zA-Z:]([-+]?\d+[\.]?\d*)", 'Hi, -1.25 is a float'),
if I add it to the mining instructions as

{"regex_pattern":"(?<![a-zA-Z:])[-+]?\d*\.?\d+", "mask_with": "FLO"},

the masking doesn't occur; I get a <:*:> in the mined template instead.

What am I doing wrong?
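
One thing worth checking when comparing re.findall() with a mask: if a pattern contains a capture group, re.findall() returns only the group, whereas a substitution replaces the entire match. The sketch below uses plain re.sub() as a stand-in for the masking step (an assumption; it does not call Drain3), and the <:FLO:> mask text is written out by hand for illustration.

```python
import re

line = "Hi, -1.25 is a float"

# Pattern tested with re.findall(): the leading character class consumes one character.
consuming = re.compile(r"[^a-zA-Z:]([-+]?\d+[\.]?\d*)")
# Pattern used in the mask: the lookbehind checks the previous character without consuming it.
lookbehind = re.compile(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+")

found = consuming.findall(line)                     # only the captured group is returned
masked_consuming = consuming.sub("<:FLO:>", line)   # the whole match, including the space, is replaced
masked_lookbehind = lookbehind.sub("<:FLO:>", line)

print(found)              # ['-1.25']
print(masked_consuming)   # Hi,<:FLO:> is a float
print(masked_lookbehind)  # Hi, <:FLO:> is a float
```

So a pattern that looks right in re.findall() can still behave differently once it is used for substitution.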

Drain will work well with floats even if you mask only integers because you will get a template like <NUM>.<NUM> for floats in this case.
If you still want a float template, please check that you escape \ properly as \\ in the .ini file.
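
To see why the backslashes need doubling, here is a small sketch. It assumes the masking value in the .ini file is parsed as JSON, where \d is not a valid escape sequence. The Python raw strings below stand in for the text as it would be typed in the file.

```python
import json
import re

# Text as it would appear in the .ini file (raw strings, so backslashes are literal).
single_backslash = r'[{"regex_pattern": "(?<![a-zA-Z:])[-+]?\d*\.?\d+", "mask_with": "FLO"}]'
double_backslash = r'[{"regex_pattern": "(?<![a-zA-Z:])[-+]?\\d*\\.?\\d+", "mask_with": "FLO"}]'

try:
    json.loads(single_backslash)
    single_ok = True
except json.JSONDecodeError:
    # "\d" is not a valid JSON escape, so parsing fails.
    single_ok = False

instructions = json.loads(double_backslash)
pattern = instructions[0]["regex_pattern"]  # "\\d" in JSON decodes to "\d"
matches = re.findall(pattern, "Hi, -1.25 is a float")

print(single_ok)  # False
print(pattern)    # (?<![a-zA-Z:])[-+]?\d*\.?\d+
print(matches)    # ['-1.25']
```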

@davidohana Thanks for your reply.

Do you mean by using the "default" regexes in drain3.ini?

With those default regexes I get a template with <NUM>.<NUM> only in a few cases.

For example, for this sentence:
Rate of Change Rate 0.572213 Limit 0.1
the drained template is Rate of Change Rate <:*:> Limit <:*:>

And for this sentence:
Deviation Alarm Actual 954.4 Target 3
the drained template is Deviation Alarm Actual <:NUM:>.<:NUM:> Target <:NUM:>

The regex for numbers in the default .ini file is: "((?<=[^A-Za-z0-9])|^)([\\-\\+]?\\d+)((?=[^A-Za-z0-9])|$)". It does work for both sentences when I use it in re.findall(), but not with Drain.
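
As a quick check outside Drain3, the pattern quoted above can be applied to both sentences with plain re.sub(). This is a sketch that only imitates the masking step (the mask_numbers helper is hypothetical, not a Drain3 function). On its own, this pattern masks the integer and fractional parts in both sentences, so the <:*:> result presumably comes from some other stage, although the sketch cannot show which one.

```python
import re

# Default integer pattern quoted above, with the .ini escaping (\\) reduced to single backslashes.
NUM = re.compile(r"((?<=[^A-Za-z0-9])|^)([\-\+]?\d+)((?=[^A-Za-z0-9])|$)")

def mask_numbers(line: str) -> str:
    """Replace every match of the integer pattern with the <:NUM:> mask."""
    return NUM.sub("<:NUM:>", line)

masked_rate = mask_numbers("Rate of Change Rate 0.572213 Limit 0.1")
masked_dev = mask_numbers("Deviation Alarm Actual 954.4 Target 3")

print(masked_rate)  # Rate of Change Rate <:NUM:>.<:NUM:> Limit <:NUM:>.<:NUM:>
print(masked_dev)   # Deviation Alarm Actual <:NUM:>.<:NUM:> Target <:NUM:>
```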

That's why I'm trying a new regex in the .ini file. I followed your tip and escaped \ in the .ini file:
{"regex_pattern":"(?<![a-zA-Z:])[-+]?\\d*\\.?\\d+", "mask_with": "FLO"},

Still no luck. Both regexes worked flawlessly with re.findall() but not in drain.

I'm trying to figure out what I am doing wrong.

The order of masks also matters.
Please put the float mask before the int masks. If occurrences of a parameter are mixed (some float, some int), that can also cause this issue, so you may need to capture both in the same mask regex.
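
The effect of mask order can be sketched with plain re.sub() calls applied in sequence. This assumes masks are applied one after another in list order. The apply_masks helper and the tuple layout are hypothetical, and the patterns are the two quoted in this thread.

```python
import re

# Hypothetical mask list entries: (compiled pattern, mask name).
FLO = (re.compile(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+"), "FLO")
NUM = (re.compile(r"((?<=[^A-Za-z0-9])|^)([\-\+]?\d+)((?=[^A-Za-z0-9])|$)"), "NUM")

def apply_masks(line, masks):
    """Apply each mask in list order; later masks see the output of earlier ones."""
    for pattern, name in masks:
        line = pattern.sub(f"<:{name}:>", line)
    return line

line = "Rate of Change Rate 0.572213 Limit 0.1"
float_first = apply_masks(line, [FLO, NUM])
int_first = apply_masks(line, [NUM, FLO])
mixed = apply_masks("Deviation Alarm Actual 954.4 Target 3", [FLO, NUM])

print(float_first)  # Rate of Change Rate <:FLO:> Limit <:FLO:>
print(int_first)    # Rate of Change Rate <:NUM:>.<:NUM:> Limit <:NUM:>.<:NUM:>
print(mixed)        # Deviation Alarm Actual <:FLO:> Target <:FLO:>
```

With the int mask first, the float is split before the float mask ever sees it. Note also that this float pattern matches the plain integer 3, so with the float mask first the same pattern covers both kinds of value.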

Ok, I've figured out a regex that matches the floats, so I replaced the default one. But I still don't get why the default one does not work as expected for all my sentences. After a visual inspection, I think the default regex for numbers doesn't match negative floats.
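
The regex the poster settled on is not shown in the thread. As an illustration only, here is one hypothetical pattern that covers signed integers, decimals, and exponent forms in a single mask, using the examples from the original question. It has not been tested inside Drain3.

```python
import re

# Hypothetical combined pattern (not from Drain3 or this thread):
# optional sign, digits with an optional fraction or a bare fraction, optional exponent.
NUMBER = re.compile(
    r"(?<![A-Za-z0-9.])[-+]?(?:\d+(?:\.\d+)?|\.\d+)(?:[eE][-+]?\d+)?(?![A-Za-z0-9])"
)

samples = "+12 -12 -3.14 .314e1 abc123"
matches = NUMBER.findall(samples)
masked = NUMBER.sub("<:NUM:>", "Deviation Alarm Actual 954.4 Target 3")

print(matches)  # ['+12', '-12', '-3.14', '.314e1']
print(masked)   # Deviation Alarm Actual <:NUM:> Target <:NUM:>
```

The lookbehind and lookahead keep the pattern from matching digits embedded in identifiers such as abc123.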

If the masking step produces something like
this is <:NUM:>
this is <:FLO:>

then the Drain step will merge those two into
this is <:*:>
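
A much simplified sketch of that merge: compare the two masked lines token by token and replace any position where they differ with the wildcard. The merge_templates function is hypothetical, not Drain3 code, and it ignores the similarity threshold and tree lookup that decide whether two lines belong to the same cluster in the first place.

```python
def merge_templates(template_a: str, template_b: str, wildcard: str = "<:*:>") -> str:
    """Token-wise merge: keep tokens that agree, replace the rest with a wildcard.

    Simplified illustration; assumes both templates have the same number of tokens.
    """
    tokens_a = template_a.split()
    tokens_b = template_b.split()
    if len(tokens_a) != len(tokens_b):
        raise ValueError("templates must have the same token count")
    return " ".join(a if a == b else wildcard for a, b in zip(tokens_a, tokens_b))

merged = merge_templates("this is <:NUM:>", "this is <:FLO:>")
print(merged)  # this is <:*:>
```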

Yes, I got it. But I was referring to using the default .ini file, with the default regexes, without adding or removing any regex. So I believe my sentences are matched by more than one regex in the default .ini and therefore get merged in the Drain step.