hfinch1991 / loghub

A collection of system log datasets for massive log analysis

loghub

Loghub maintains a collection of system logs that are freely accessible for research purposes. Some of the logs are production data released from previous studies, while others were collected from real systems in our lab environment. Wherever possible, the logs are NOT sanitized, anonymized, or modified in any way. In total, the logs amount to XX GB. We host only a small sample (2k lines) of each dataset on GitHub; please contact us if you are interested in the raw logs.

Logs currently available:

| Software System | Dataset Name | Time Span | #Messages | Raw Size | Compressed Size (.tar.gz) |
|---|---|---|---|---|---|
| Big data systems | | | | | |
| HDFS | hdfs-1 | 38.7 hours | 11,175,629 | 1.54GB | 152.01MB |
| | hdfs-2 | xx | xx | xx | xx |
| Hadoop | hadoop | xx | xx | 49.78MB | 2.50MB |
| Spark | spark-1 | xx | xx | xx | xx |
| | spark-2 | xx | xx | xx | xx |
| Zookeeper | zookeeper | xx | xx | xx | xx |
| Operating systems | | | | | |
| Windows | windows | 226.7 days | 114,608,388 | 27.36GB | 1.63GB |
| Linux | linux | xx | xx | xx | xx |
| Mac | mac | xx | xx | xx | xx |
| Web applications | | | | | |
| Apache | apache | xx | xx | xx | xx |
| Mobile systems | | | | | |
| Android (available soon) | andriod | xx | xx | xx | xx |
| Supercomputers | | | | | |
| BGL | bgl | 214.7 days | 4,747,963 | 725.77MB | 61.46MB |
| HPC | hpc | xx | xx | xx | xx |
| Thunderbird | thunderbird | xx | xx | xx | xx |
| On-premises software | | | | | |
| Proxifier | proxifier | xx | xx | xx | xx |
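
For the datasets that already list both a raw and a compressed size, the compression ratio of the `.tar.gz` archive can be worked out directly from the figures above. The following is a minimal sketch, not part of Loghub itself: the helper names `size_to_mb` and `compression_ratio` are hypothetical, and it assumes that 1 GB = 1024 MB, which the table does not state.

```python
# Rows copied from the table above that list both sizes:
# (dataset name, raw size, compressed size).
ROWS = [
    ("hdfs-1", "1.54GB", "152.01MB"),
    ("hadoop", "49.78MB", "2.50MB"),
    ("windows", "27.36GB", "1.63GB"),
    ("bgl", "725.77MB", "61.46MB"),
]

# Assumption: sizes use binary units (1 GB = 1024 MB).
UNITS_IN_MB = {"MB": 1.0, "GB": 1024.0}


def size_to_mb(text):
    """Convert a size string such as '1.54GB' to a number of megabytes."""
    value, unit = text[:-2], text[-2:]
    return float(value) * UNITS_IN_MB[unit]


def compression_ratio(raw, compressed):
    """Return how many times smaller the compressed archive is than the raw log."""
    return size_to_mb(raw) / size_to_mb(compressed)


if __name__ == "__main__":
    for name, raw, compressed in ROWS:
        print(f"{name}: {compression_ratio(raw, compressed):.1f}x")
```

Rows still marked `xx` are left out because their sizes have not been published yet.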

Publications using these datasets

License

The log datasets are freely available ONLY for research purposes.
