Saving log template/cluster and ID for each log
aredelmeier opened this issue · comments
Hi!
I am familiar with the old package and am starting to get accustomed to Drain3.
I have a log file example.log and I have used Drain3 to parse each log line with:
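import json
import logging
import time

from drain3 import TemplateMiner
from drain3.template_miner_config import TemplateMinerConfig

logging.basicConfig(filename="output_example.log", filemode='a', level=logging.DEBUG)
logger = logging.getLogger(__name__)

config = TemplateMinerConfig()
config.load("drain3.ini")
config.profiling_enabled = True
template_miner = TemplateMiner(config=config)

line_count = 0
with open("example.log") as f:
    lines = f.readlines()

batch_size = 10
batch_start_time = time.time()
for line in lines:
    line = line.rstrip()
    # Keep only the message part after the first ": " separator.
    line = line.partition(": ")[2]
    result = template_miner.add_log_message(line)
    line_count += 1
    if line_count % batch_size == 0:
        # Throughput of the batch that just finished; guard against a zero interval.
        rate = batch_size / max(time.time() - batch_start_time, 1e-9)
        logger.info(f"Processing line: {line_count}, rate {rate:.1f} lines/sec, "
                    f"{len(template_miner.drain.clusters)} clusters so far.")
        batch_start_time = time.time()
    # Log the input only when it created or changed a cluster.
    if result["change_type"] != "none":
        result_json = json.dumps(result)
        logger.info(f"Input ({line_count}): " + line)
        logger.info("Result: " + result_json)

# Report all clusters, largest first.
sorted_clusters = sorted(template_miner.drain.clusters, key=lambda it: it.size, reverse=True)
for cluster in sorted_clusters:
    logger.info(cluster)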
I am able to load the sorted clusters/templates by reading the output log back in:
with open('output_example.log', 'r') as f:
    lines = f.readlines()
But it is a bit tedious to keep track of the different log clusters/templates this way, and I have not found a way to label each original log line with its cluster/template ID.
Do you have any suggestions for a better way to do this? For example, how could I save a CSV with the columns "original log row number", "new parsed log", and "parsed log ID"?
Thanks in advance for your help!
Annabelle
Hi, you can build the CSV columns yourself on the fly while parsing each raw log file. Start with an empty list and, after each parsed line, append a dict with the info you want (e.g. the raw log line index, the parsed template from the result, and the cluster ID). When you have finished processing all lines, use pandas DataFrame.from_records() to convert the list of dicts into a data frame, and then save the data frame to a CSV file.
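A minimal sketch of that approach, for illustration only. The helper names label_lines, write_csv, and fake_add_log_message, as well as the column names, are made up for this example and are not part of Drain3. It assumes the dict returned by add_log_message has the keys "cluster_id" and "template_mined", and it writes the CSV with the standard-library csv module instead of pandas so that it runs without extra dependencies.

```python
import csv
import io


def label_lines(lines, add_log_message):
    """Return one record per raw log line with its row number, template, and cluster ID."""
    records = []
    for row_number, raw_line in enumerate(lines, start=1):
        # Same preprocessing as in the question: keep the text after the first ": ".
        message = raw_line.rstrip().partition(": ")[2]
        result = add_log_message(message)
        records.append({
            "row_number": row_number,
            "template": result["template_mined"],  # assumed result key
            "cluster_id": result["cluster_id"],    # assumed result key
        })
    return records


def write_csv(records, out_file):
    """Write the records to an open text file object as CSV with a header row."""
    writer = csv.DictWriter(out_file, fieldnames=["row_number", "template", "cluster_id"])
    writer.writeheader()
    writer.writerows(records)


# Hypothetical stand-in for template_miner.add_log_message, so the sketch
# runs without Drain3 installed. It puts every message in cluster 1.
def fake_add_log_message(message):
    return {"cluster_id": 1, "template_mined": message}


records = label_lines(["host: user logged in\n"], fake_add_log_message)
buffer = io.StringIO()
write_csv(records, buffer)
```

With Drain3 you would pass template_miner.add_log_message instead of the stand-in, and open a real file for write_csv (with newline="" as the csv module recommends). If you prefer pandas, as suggested above, pandas.DataFrame.from_records(records).to_csv("labeled.csv", index=False) should give an equivalent file. One caveat: as far as I know, a cluster's template can become more general as more lines are added, so the template recorded for an early line may differ from the final one; the cluster ID is the safer column to join on.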
Hi! Thanks for your answer. I didn't realize that all the information I was looking for was in the object returned by
template_miner.add_log_message(line)
Thanks for your help!
Annabelle