logpai / Drain3

A robust streaming log template miner based on the Drain algorithm


Saving log template/cluster and ID for each log

aredelmeier opened this issue

Hi!

I am familiar with the old package and am starting to get accustomed to Drain3.

I have a log file example.log and I have used Drain3 to parse each log line with:

import json
import logging
import time

from drain3 import TemplateMiner
from drain3.template_miner_config import TemplateMinerConfig

logging.basicConfig(filename="output_example.log", filemode='a', level=logging.DEBUG)
logger = logging.getLogger(__name__)

config = TemplateMinerConfig()
config.load("drain3.ini")
config.profiling_enabled = True
template_miner = TemplateMiner(config=config)

line_count = 0
with open("example.log") as f:
    lines = f.readlines()

batch_size = 10
batch_start_time = time.time()

for line in lines:
    line = line.rstrip()
    line = line.partition(": ")[2]  # keep only the message after the first ": "
    result = template_miner.add_log_message(line)
    line_count += 1
    if line_count % batch_size == 0:
        time_took = time.time() - batch_start_time
        rate = batch_size / time_took
        logger.info(f"Processing line: {line_count}, rate {rate:.1f} lines/sec, "
                    f"{len(template_miner.drain.clusters)} clusters so far.")
        batch_start_time = time.time()

    if result["change_type"] != "none":
        result_json = json.dumps(result)
        logger.info(f"Input ({line_count}): " + line)
        logger.info("Result: " + result_json)

sorted_clusters = sorted(template_miner.drain.clusters, key=lambda it: it.size, reverse=True)

for cluster in sorted_clusters:
    logger.info(cluster)

I am able to load the sorted clusters/templates back in afterwards with:

with open('output_example.log', 'r') as f:
  lines = f.readlines()

But it is a bit tedious to keep track of the different log clusters/templates this way, and I have not found a way to label each original log line with its cluster/template ID.

Do you have any suggestions for a better way to do this? For example, how could I save a CSV with the columns "original log row number", "new parsed log", and "parsed log ID"?

Thanks in advance for your help!

Annabelle

Hi, you can construct the CSV columns yourself on the fly while parsing the raw log file. Start with an empty list of dicts and append one dict per parsed line with the info you want (e.g. the raw log line index, the parsed template from the result, and the cluster ID). When you have finished processing all lines, use pandas' DataFrame.from_records() to convert the list of dicts into a data frame and then save the data frame to a CSV file.
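Something along these lines should work (an untested sketch; the output file name parsed_logs.csv and the column names are just placeholders, and cluster_id / template_mined are keys of the result dict returned by add_log_message):

import pandas as pd

from drain3 import TemplateMiner
from drain3.template_miner_config import TemplateMinerConfig

config = TemplateMinerConfig()
config.load("drain3.ini")
template_miner = TemplateMiner(config=config)

records = []
with open("example.log") as f:
    for line_no, line in enumerate(f, start=1):
        message = line.rstrip().partition(": ")[2]
        result = template_miner.add_log_message(message)
        # one record per raw line: its index, the mined template and the cluster ID
        records.append({
            "original log row number": line_no,
            "new parsed log": result["template_mined"],
            "parsed log ID": result["cluster_id"],
        })

# convert the list of dicts into a data frame and write it out as CSV
df = pd.DataFrame.from_records(records)
df.to_csv("parsed_logs.csv", index=False)

Note that template_mined is the template as it was when that particular line was added; templates can still change as later lines arrive, so if you want the final versions you can re-map each cluster_id against template_miner.drain.clusters after the loop.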

Hi! Thanks for your answer. I didn't realize that all the information I was looking for is in the result returned by

template_miner.add_log_message(line)
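In case it helps anyone else, the relevant fields can be pulled straight out of that result dict (key names as documented in the Drain3 README):

result = template_miner.add_log_message(line)
cluster_id = result["cluster_id"]       # ID of the cluster the line was matched to
template = result["template_mined"]     # template of that cluster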

Thanks for your help!

Annabelle