Hackatosh / log-parser


Log Parser

Welcome to the Log Parser repository!

The Log Parser aims to parse HTTP access log files and display statistics and alerts related to their contents.

Getting started

Prerequisites

Before being able to run this project, you need to have:

  • NodeJS
  • Yarn

These will allow you to install the dependencies needed for this project and run it.

Installing

To install the project dependencies, you need to use Yarn:

yarn install --frozen-lockfile

Running the project

You can directly run the project with ts-node using the following command:

yarn start:dev

Or, you can build the project using TypeScript and then run the compiled version with the following commands:

yarn build

yarn start:compiled

Options can be passed to the software as follows:

yarn start:dev --alertRpsThreshold <number> --logFilePath <absolute_path_to_log_file>
  • alertRpsThreshold indicates the number of requests per second needed to trigger an alert (defaults to 10)
  • logFilePath indicates the absolute path to the log file (if not provided, the sample_csv.txt file will be read)
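
For reference, here is a minimal sketch of how these options can be read with minimist (assumed code, not the repository's exact implementation; the default log file location in particular is an assumption):

import minimist from 'minimist';

// Parse the CLI flags passed after `yarn start:dev`
const args = minimist(process.argv.slice(2));
const alertRpsThreshold: number = Number(args.alertRpsThreshold ?? 10);
const logFilePath: string = args.logFilePath ?? 'sample_csv.txt'; // assumed default location

minimist turns --alertRpsThreshold 1 into { alertRpsThreshold: 1 }, so both options can be read directly from the parsed object.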

Demonstration

I recommend testing the software with the following parameters:

yarn start:dev --alertRpsThreshold 1

yarn start:dev --logFilePath resources/trigger_alert.txt

The former shows lots of statistics (and one alert). The latter shows the alert being fired, resolved and fired again.

Dev commands

To type-check the code, you need to build the project with this command:

yarn build

You can run the unit tests using Jest with this command:

yarn test

You can lint the project with ESLint using this command:

yarn lint

You can automatically fix linting errors using this command:

yarn lint:fix

The project

Features

The software is able to:

  • Read and parse a CSV file containing access logs, without any file size restriction.
  • Display an alert message when the average number of requests per second over the last 2 minutes exceeds the configured threshold. The alert is not fired again until it is resolved, which happens when the average drops back below the threshold (see the sketch after this list).
  • Display request statistics (number of hits per section, per request, per status, and in total) every 10 seconds.
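
To make the alerting rule concrete, here is a minimal sketch of the fire/resolve state machine (assumed names and simplified logic, not the repository's actual code):

type AlertState = 'NO_ALERT' | 'FIRED';

const WINDOW_MS = 2 * 60 * 1000; // 2-minute sliding window

class HighTrafficAlert {
  private state: AlertState = 'NO_ALERT';
  private timestamps: number[] = []; // request timestamps (ms), kept sorted

  // Called once per parsed log line; returns a message when the state changes.
  handleRequest(timestampMs: number, thresholdRps: number): string | null {
    this.timestamps.push(timestampMs);
    // Drop requests that fall outside the 2-minute window
    while (this.timestamps.length > 0 && this.timestamps[0] < timestampMs - WINDOW_MS) {
      this.timestamps.shift();
    }
    const averageRps = this.timestamps.length / (WINDOW_MS / 1000);
    if (this.state === 'NO_ALERT' && averageRps > thresholdRps) {
      this.state = 'FIRED'; // fire once; stay fired until resolved
      return `High traffic alert fired: ${averageRps.toFixed(2)} req/s`;
    }
    if (this.state === 'FIRED' && averageRps <= thresholdRps) {
      this.state = 'NO_ALERT'; // resolved; the alert can fire again later
      return 'High traffic alert resolved';
    }
    return null; // no state change
  }
}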

A basic working CI pipeline has been configured for the repository; it automatically builds, unit-tests and lints the codebase.

Built with

The whole project is written in TypeScript. Package management is done with Yarn.

Run-time

  • NodeJS - Server-side JavaScript runtime environment

Packages used

  • split2 - Break up a file stream and reassemble it so that each line is a chunk (a usage sketch follows this list)
  • minimist - Parse arguments
  • moment - Date library for parsing, validating, manipulating, and formatting dates
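
As an illustration, reading a log file line by line with split2 looks roughly like this (the file name is an assumption):

import * as fs from 'fs';
import split2 from 'split2';

fs.createReadStream('sample_csv.txt')
  .pipe(split2()) // each emitted chunk is one line of the file
  .on('data', (line: string) => {
    console.log(line); // one CSV row of the access log
  });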

Tooling

  • Jest - Javascript testing framework
  • ESLint - JavaScript linter

Architecture choices

The code is divided into 6 independent modules (a wiring sketch follows the list):

  • The line-by-line file reader, which uses the native fs module and the split2 module
  • The CSV Parser, which converts each parsed line into a JS object
  • The Statistics Logic, aggregating the CSV lines into one stats report object and determining when this object should be sent to the next pipeline steps
  • The Alerts Logic, which manages the high traffic alert state (no alert / fired / resolved)
  • The Alerts Display, handling alert-related messages (fired/resolved)
  • The Statistics Display, handling statistics reports
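
Here is a rough sketch of how these modules are wired together as a pipeline (the stage objects below are simple placeholders, since the repository's actual exports are not shown here):

import * as fs from 'fs';
import { PassThrough, Writable } from 'stream';
import split2 from 'split2';

// Placeholder stages standing in for the real modules listed above.
const csvParser = new PassThrough({ objectMode: true });
const statisticsLogic = new PassThrough({ objectMode: true });
const alertsLogic = new PassThrough({ objectMode: true });
const statisticsDisplay = new Writable({ objectMode: true, write(report, _enc, cb) { console.log(report); cb(); } });
const alertsDisplay = new Writable({ objectMode: true, write(alert, _enc, cb) { console.log(alert); cb(); } });

// One source, two independent branches: statistics and alerting never touch each other.
const parsedLines = fs.createReadStream('sample_csv.txt').pipe(split2()).pipe(csvParser);
parsedLines.pipe(statisticsLogic).pipe(statisticsDisplay); // statistics branch
parsedLines.pipe(alertsLogic).pipe(alertsDisplay);         // alerting branch

Because parsedLines is piped into both branches, every parsed log line reaches the statistics pipeline and the alerting pipeline independently.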

Decomposing the code this way has multiple advantages:

  • No coupling at all between statistics and alerting
  • Display and logic are clearly separated, so you can easily change what you do with the alerts and the reports (for example, you could send an email instead of displaying the alerts in the console)
  • You can easily change the way the logs are obtained (for example, instead of reading from a file, you could read from standard input)
  • Each part is very easy to unit test

In order to be scalable, the software is entirely based on streams. Streams allow data to be processed chunk by chunk (instead of loading everything into RAM, as you would have to with a Buffer), so the software can handle as many logs as needed.

The CSV Parser, Statistics Logic and Alerts Logic are implemented as Transform streams, while the Statistics Display and the Alerts Display are implemented as Writable streams.
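
For illustration, a Transform stage and a Writable stage look roughly like this (simplified sketches with an assumed column layout, not the repository's actual code):

import { Transform, Writable } from 'stream';

// Transform stage: one raw CSV line in, one JS object out.
const csvParser = new Transform({
  readableObjectMode: true,
  transform(line, _encoding, callback) {
    // Column layout is an assumption for illustration purposes.
    const [remoteHost, rfc931, authUser, date, request, status, bytes] =
      line.toString().split(',');
    callback(null, { remoteHost, rfc931, authUser, date, request, status, bytes });
  },
});

// Writable stage: consumes report objects and displays them.
const statisticsDisplay = new Writable({
  objectMode: true,
  write(report, _encoding, callback) {
    console.log(report);
    callback(); // signal that the chunk has been handled
  },
});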

The only limitation to scalability here is the internal state managed by the Statistics Logic and the Alerts Logic: if a huge number of requests arrives in a short interval of time, some internal structures could grow too large.

Possible improvements

Optimisations:

  • Optimise the methods provided by the TimestampArray object (an array of numbers that is always kept sorted) used by the Alerting Logic, for example by using binary search (a sketch follows this list)
  • Implement automated "E2E" tests which read sample CSV files and check the outputs of the software
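
As an example of that optimisation, a binary-search insertion into a sorted timestamp array could look like this (a sketch; TimestampArray's real interface may differ):

// Find the insertion index in O(log n) instead of scanning the whole array.
function insertSorted(timestamps: number[], value: number): void {
  let low = 0;
  let high = timestamps.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (timestamps[mid] <= value) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  timestamps.splice(low, 0, value); // keeps the array sorted
}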

Features:

  • Implement more interesting ways of displaying Statistics Reports and Alerts (Slack messages, emails...)
  • Implement a more sophisticated Alerting Logic (alerts by section, by request status, etc.) with different thresholds
  • Add more statistics
  • Add a configuration option to read logs from standard input

Author

This project was made by Edouard Benauw.
