guidefloripa / UserClassifier

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

UserClassifier

Classify apache log by userid.

The logs are available in several servers that will be processed in a cluster. The system architecture is shown below:

cluster
   |-- server01
   |-- server02
   |-- ...

In order to simulate, first generate some random log files. For this, type the following command:

python log_generator.py

After generating the log files, now it's time to classify them using the following command:

python main.py

The results are stored in the folder users_sorted.

While running, the log files are grouped by userid and stored in the folder users_grouped. After grouping, each user file is sorted and stored in the folder users_sorted.


Some parameters may be changed in the file constants.py.

The implementation is using open (for using local files or mounted folders from a remote server), but it can be replaced by some protocol like http or ftp for downloading files.

About


Languages

Language:Python 100.0%