suraj-deshmukh / loghub

A collection of system log datasets for massive log analysis

Loghub

Loghub maintains a collection of system logs that are freely accessible for research purposes. Some of the logs are production data released from previous studies, while others were collected from real systems in our lab environment. Wherever possible, the logs are NOT sanitized, anonymized, or modified in any way. Together the logs amount to over 87GB, so we host only a small sample (2k lines) of each dataset on GitHub.

How to get the data?

If you are interested in these datasets, please request the raw logs at Zenodo. Please note that requests missing the required information may be ignored without further notification.

Note: The loghub datasets are currently in beta release!

Logs currently available:

| Category | Software System | Time Span | #Messages | Size | Compressed (.tar.gz) | Source Link |
| --- | --- | --- | --- | --- | --- | --- |
| Distributed systems | HDFS | 38.7 hours | 11,175,629 | 1.54GB | 152.01MB | |
| | | N.A. | 71,118,073 | 16.84GB | 877.38MB | |
| | Hadoop | N.A. | 394,308 | 49.78MB | 2.50MB | |
| | Spark | N.A. | 33,236,604 | 2.88GB | 179.18MB | |
| | Zookeeper | 26.7 days | 74,380 | 10.18MB | 452KB | |
| | OpenStack | N.A. | 207,820 | 60.02MB | 5.27MB | Link |
| Operating systems | Windows | 226.7 days | 114,608,388 | 27.36GB | 1.63GB | |
| | Linux | 263.9 days | 25,567 | 2.30MB | 228KB | |
| | Mac | 7.0 days | 117,283 | 16.48MB | 1.46MB | |
| Server applications | Apache Web server | 263.9 days | 56,481 | 5.02MB | 260KB | |
| | OpenSSH | 28.4 days | 655,146 | 71.70MB | 4.49MB | |
| Mobile systems | Android | N.A. | 63,042,037 | 7.00GB | 825.57MB | |
| | HealthApp | 10.5 days | 253,395 | 22.98MB | 2.24MB | |
| Supercomputers | Blue Gene/L | 214.7 days | 4,747,963 | 725.77MB | 61.46MB | Link |
| | HPC | N.A. | 433,489 | 32.77MB | 3.21MB | |
| | Thunderbird | 244 days | 211,212,192 | 31.04GB | 1.97GB | |
| Standalone software | Proxifier | N.A. | 21,329 | 2.48MB | 172KB | |
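
Because the raw logs are distributed as .tar.gz archives and several of them expand to tens of gigabytes, it can be convenient to read an archive line by line instead of unpacking it first. The following is a minimal sketch using only Python's standard `tarfile` module. The function names and the archive name in the usage note are hypothetical, and the sketch assumes each log message occupies one line, which may not hold for every dataset.

```python
import tarfile
from typing import Iterator


def iter_log_lines(archive_path: str) -> Iterator[str]:
    """Yield decoded lines from every regular file in a .tar.gz archive."""
    # Mode "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            stream = archive.extractfile(member)
            if stream is None:
                continue
            with stream:
                for raw in stream:
                    # The logs are not sanitized, so tolerate undecodable bytes.
                    yield raw.decode("utf-8", errors="replace").rstrip("\r\n")


def count_messages(archive_path: str) -> int:
    """Count lines in the archive, assuming one log message per line."""
    return sum(1 for _ in iter_log_lines(archive_path))
```

For example, `count_messages("Proxifier.tar.gz")` (a hypothetical file name) would stream the archive once and return its line count, which can be compared against the #Messages column as a rough integrity check.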

Publications using these datasets

Organizations that request these datasets

We proudly announce that the loghub datasets have been requested by more than 31 organizations.

Feedback

For any questions or feedback, please post to our issue page.

License

The log datasets are freely available ONLY for research purposes.

