memgraph / data-streams

Publicly available real-time data sets on Kafka, Redpanda, RabbitMQ & Apache Pulsar

Home Page: https://github.com/g-despot/data-streams

📊 data-streams 📊

💬 About

This project serves as a starting point for analyzing real-time streaming data. We have prepared a few cool datasets which can be streamed via Kafka, Redpanda, RabbitMQ, and Apache Pulsar. Right now, you can clone/fork the repo and start the service locally, but we will be adding publicly available clusters to which you can just connect.

📂 Datasets

Currently available datasets:

⏩ How to start the streams?

From the root folder of the repository, run:

python3 start.py --platforms <PLATFORMS> --dataset <DATASET>

The argument <PLATFORMS> can be:

  • kafka,
  • redpanda,
  • rabbitmq and/or
  • pulsar.

The argument <DATASET> can be:

  • github,
  • art-blocks,
  • movielens or
  • amazon-books.

The script starts the chosen streaming platforms in Docker containers, and you will see messages from the chosen dataset being consumed.

You can then connect with Memgraph and stream the data into the database by running:

docker-compose up <DATASET>-memgraph

For example, if you choose Kafka as a streaming platform and art-blocks for your dataset, you should run:

python3 start.py --platforms kafka --dataset art-blocks

If you are a Windows user and the command above doesn't work, try replacing python3 with python.

Next, in a new terminal window, run:

docker-compose up art-blocks-memgraph
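
The two commands above follow a fixed pattern, so they can be assembled programmatically. The snippet below is only a minimal sketch: the helper `build_commands` and the constants `PLATFORMS` and `DATASETS` are hypothetical names that are not part of this repository, and the sketch handles a single platform only, because the exact syntax for passing several platforms is not documented here. It rejects platform and dataset names that are not listed above and returns both commands as argument lists.

```python
import sys

# Values copied from the lists above; the names PLATFORMS and DATASETS
# are hypothetical and not part of the repository.
PLATFORMS = {"kafka", "redpanda", "rabbitmq", "pulsar"}
DATASETS = {"github", "art-blocks", "movielens", "amazon-books"}


def build_commands(platform, dataset, python=sys.executable):
    """Return (start_command, memgraph_command) as argument lists.

    Hypothetical helper: it only assembles the two commands shown in
    this README and does not run anything.
    """
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    start = [python, "start.py", "--platforms", platform, "--dataset", dataset]
    memgraph = ["docker-compose", "up", f"{dataset}-memgraph"]
    return start, memgraph


if __name__ == "__main__":
    start, memgraph = build_commands("kafka", "art-blocks", python="python3")
    print(" ".join(start))
    print(" ".join(memgraph))
```

By default the sketch uses `sys.executable`, the interpreter that is currently running, which sidesteps the python3 versus python difference on Windows. The returned lists can be passed to `subprocess.run` from the repository root, each in its own terminal or process.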

📜 References

There's no documentation yet, but it's coming soon! Throw us a star to keep up with upcoming changes.

License: MIT License


Languages

Python 96.8%, Dockerfile 2.3%, Shell 0.9%