Delete cluster from drain dict id_to_cluster | Impact | procedure

Question

stanchion-bishoyi opened this issue 3 years ago · comments

I have observed that when a lot of clusters are created (10000+), the drain3 kernel takes more processing time. The solution I had in mind is to manually delete old clusters that are no longer of use.
Assuming I have a list of cluster IDs that I want to remove from the drain3 kernel, what is the safest procedure? Please give a detailed explanation: is deleting them from the template_miner.drain.id_to_cluster dict sufficient, or does the parse tree also have to be modified? If deleting from the dict is not enough, what else needs to be done?
If deleting is not a good idea, how else can I improve the running time?

David Ohana · Answer 1 · Sun Aug 01 2021 16:26:57 GMT+0800 (China Standard Time)

In my experience, Drain works best with fewer than 3000 clusters.
First, I suggest that you check whether redundant clusters are being created because of a masking problem. If that's the case, improve your masking regexes.
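
To illustrate the masking point, here is a minimal standard-library sketch, not Drain3's own masking code. The rules in MASKING_RULES, the placeholder names and the sample log lines are all made up for illustration. The idea is that variable tokens (IP addresses, numbers) are replaced with fixed placeholders before the lines reach the miner, so lines that differ only in those tokens end up identical.

```python
import re

# Hypothetical masking rules (pattern, placeholder). Order matters:
# the IP rule must run before the generic number rule.
MASKING_RULES = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def mask(line):
    """Replace variable tokens with placeholders, applying rules in order."""
    for pattern, placeholder in MASKING_RULES:
        line = pattern.sub(placeholder, line)
    return line


# Made-up sample lines that differ only in their variable parts.
lines = [
    "connection from 10.0.0.1 port 5123",
    "connection from 10.0.0.2 port 6001",
    "connection from 192.168.1.7 port 22",
]

# After masking, the three distinct lines collapse into one.
masked = {mask(line) for line in lines}
print(masked)
```

If a sample of your own logs still yields many near-identical masked lines, that suggests a variable field that no rule covers yet.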
After that, I suggest using the max_clusters configuration option to automatically remove rarely used clusters.
https://github.com/IBM/Drain3#memory-efficiency
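
As I understand the linked README section, once the max_clusters limit is reached, new clusters replace old ones following a least recently used (LRU) policy. The toy sketch below only illustrates that eviction idea with the standard library. It is not Drain3's implementation, and the class LruClusterStore and its methods are hypothetical names. In Drain3 itself the cap is just a configuration value, so none of this code is needed.

```python
from collections import OrderedDict


class LruClusterStore:
    """Toy id -> template store that evicts the least recently used entry."""

    def __init__(self, max_clusters):
        self.max_clusters = max_clusters
        self._clusters = OrderedDict()

    def touch(self, cluster_id, template):
        """Insert or refresh a cluster; return the evicted id, if any."""
        if cluster_id in self._clusters:
            # A matched cluster becomes the most recently used one.
            self._clusters.move_to_end(cluster_id)
        self._clusters[cluster_id] = template
        evicted = None
        if len(self._clusters) > self.max_clusters:
            # Drop the entry that has gone unused the longest.
            evicted, _ = self._clusters.popitem(last=False)
        return evicted

    def ids(self):
        """Cluster ids ordered from least to most recently used."""
        return list(self._clusters)


store = LruClusterStore(max_clusters=2)
store.touch(1, "user <*> logged in")
store.touch(2, "disk <*> full")
store.touch(1, "user <*> logged in")      # cluster 1 is used again
evicted = store.touch(3, "job <*> done")  # cap exceeded, cluster 2 is dropped
print(evicted, store.ids())
```

Because eviction is driven by recency of use, clusters that keep matching incoming lines survive, while stale ones are dropped without any manual bookkeeping.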