logpai / Drain3

A robust streaming log template miner based on the Drain algorithm

Delete cluster from drain dict id_to_cluster | Impact | procedure

stanchion-bishoyi opened this issue

I have observed that when I create a large number of clusters (10,000+), the Drain3 kernel takes noticeably more processing time. The solution I came up with is to manually delete old clusters that are no longer useful.
Assuming I have a list of cluster IDs that I want to remove from the Drain3 kernel, what is the safest procedure? Please give a detailed explanation: do I need to modify the parse tree, or is deleting from the template_miner.drain.id_to_cluster dict sufficient? If it is not sufficient, what else needs to be done?
If deleting is not a good idea, how else can I improve the running time?

In my experience, the number of Drain clusters should ideally stay below 3000.
First, I suggest checking whether redundant clusters are being created because of a masking problem. If that is the case, improve your masking regexes.
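
To illustrate the masking point, here is a minimal standalone sketch using only Python's `re` module. It does not call Drain3; the patterns, the placeholders (`<IP>`, `<HEX>`, `<NUM>`) and the `mask` helper are hypothetical names chosen for this example. It only shows how replacing variable values before clustering collapses many distinct lines into one shape, which is the effect a good masking regex should have.

```python
import re

# Hypothetical masking rules: (compiled pattern, placeholder) pairs.
# Order matters: more specific patterns run before the generic number rule.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def mask(line):
    """Replace variable values in a log line with fixed placeholders."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


logs = [
    "user 1001 logged in from 10.0.0.1",
    "user 1002 logged in from 10.0.0.2",
    "user 1003 logged in from 192.168.1.7",
]

# Distinct line shapes before and after masking.
raw_shapes = set(logs)
masked_shapes = {mask(line) for line in logs}

print(len(raw_shapes), "distinct lines before masking")
print(len(masked_shapes), "distinct line shape(s) after masking:", masked_shapes)
```

If a value that varies per message (an ID, a path, a timestamp) is not covered by any masking rule, it can end up splitting what should be one template into many clusters, so checking the mined templates for such leftovers is a reasonable first step.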
After that, I suggest using the max_clusters configuration option to automatically remove rarely used clusters.
https://github.com/IBM/Drain3#memory-efficiency
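
As I understand the README section linked above, once max_clusters is reached, old clusters are replaced by new ones following an LRU (least recently used) eviction policy. The sketch below is not Drain3's implementation; it is a small standalone illustration of that policy built on `collections.OrderedDict`, and the class name `LruClusterStore` and its methods are hypothetical.

```python
from collections import OrderedDict


class LruClusterStore:
    """Illustrative bounded store: keeps at most max_clusters entries."""

    def __init__(self, max_clusters):
        self.max_clusters = max_clusters
        self._clusters = OrderedDict()  # cluster_id -> template, oldest first

    def touch(self, cluster_id, template):
        """Insert or refresh a cluster; return the evicted id, or None."""
        evicted = None
        if cluster_id in self._clusters:
            # A match marks the cluster as recently used.
            self._clusters.move_to_end(cluster_id)
        elif len(self._clusters) >= self.max_clusters:
            # Store is full: drop the least recently used cluster.
            evicted, _ = self._clusters.popitem(last=False)
        self._clusters[cluster_id] = template
        return evicted

    def ids(self):
        """Cluster ids ordered from least to most recently used."""
        return list(self._clusters)


store = LruClusterStore(max_clusters=2)
store.touch(1, "user <NUM> logged in")
store.touch(2, "disk <NUM> full")
store.touch(1, "user <NUM> logged in")      # cluster 1 is used again
evicted = store.touch(3, "job <NUM> done")  # store is full, so one is dropped
print("evicted:", evicted, "remaining:", store.ids())
```

With a bound like this, clusters that stop matching incoming logs age out on their own, so there is no need to pick cluster IDs and delete them by hand.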