Wittline / optimizing-public-transportation

Streaming event pipeline around Apache Kafka and its ecosystem. Using public data from the Chicago Transit Authority we will construct an event pipeline around Kafka that allows us to simulate and display the status of train lines in real time.

Monitoring the Status of Public Transportation with Apache Kafka

We will build a streaming event pipeline around Kafka and its ecosystem that allows us to simulate and display the status of train lines in real time, using public data from the Chicago Transit Authority.

Data Source

Public data from Chicago Transit Authority

Architecture

[Architecture diagram]

How to run the project with Docker

  • Install Docker Desktop on Windows. It also installs Docker Compose, which lets you run multi-container applications.
  • Install Git Bash for Windows. Once it is installed, open Git Bash and clone this repository to get the docker-compose.yaml file and the other files needed.

Dependencies

  • Kafka
  • Zookeeper
  • Schema Registry
  • REST Proxy
  • Kafka Connect
  • KSQL
  • Kafka Connect UI
  • Kafka Topics UI
  • Schema Registry UI
  • Postgres

The docker-compose file does not run your code; it starts the dependencies listed above. To start them, navigate to the starter directory containing docker-compose.yaml and run the following commands in Git Bash:

$> cd starter
$> docker-compose up

Starting zookeeper          ... done
Starting kafka0             ... done
Starting schema-registry    ... done
Starting rest-proxy         ... done
Starting connect            ... done
Starting ksql               ... done
Starting connect-ui         ... done
Starting topics-ui          ... done
Starting schema-registry-ui ... done
Starting postgres           ... done

You will see a large amount of text print out in your terminal and continue to scroll. This is normal! This means your dependencies are up and running.

To check the status of your environment, you may run the following command at any time from a separate terminal instance:

$> docker-compose ps

            Name                          Command              State                     Ports
-----------------------------------------------------------------------------------------------------------------
starter_connect-ui_1           /run.sh                         Up      8000/tcp, 0.0.0.0:8084->8084/tcp
starter_connect_1              /etc/confluent/docker/run       Up      0.0.0.0:8083->8083/tcp, 9092/tcp
starter_kafka0_1               /etc/confluent/docker/run       Up      0.0.0.0:9092->9092/tcp
starter_ksql_1                 /etc/confluent/docker/run       Up      0.0.0.0:8088->8088/tcp
starter_postgres_1             docker-entrypoint.sh postgres   Up      0.0.0.0:5432->5432/tcp
starter_rest-proxy_1           /etc/confluent/docker/run       Up      0.0.0.0:8082->8082/tcp
starter_schema-registry-ui_1   /run.sh                         Up      8000/tcp, 0.0.0.0:8086->8086/tcp
starter_schema-registry_1      /etc/confluent/docker/run       Up      0.0.0.0:8081->8081/tcp
starter_topics-ui_1            /run.sh                         Up      8000/tcp, 0.0.0.0:8085->8085/tcp
starter_zookeeper_1            /etc/confluent/docker/run       Up      0.0.0.0:2181->2181/tcp, 2888/tcp, 3888/tcp
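
As a complement to docker-compose ps, the sketch below checks from Python whether each service accepts TCP connections on its published host port. The port numbers are copied from the listing above; the function names port_open and unreachable_services are illustrative and not part of this repository, and the sketch assumes the services are published on localhost.

```python
import socket

# Host ports as published in the `docker-compose ps` listing above.
SERVICE_PORTS = {
    "zookeeper": 2181,
    "kafka0": 9092,
    "schema-registry": 8081,
    "rest-proxy": 8082,
    "connect": 8083,
    "connect-ui": 8084,
    "topics-ui": 8085,
    "schema-registry-ui": 8086,
    "ksql": 8088,
    "postgres": 5432,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def unreachable_services(host="localhost", ports=SERVICE_PORTS):
    """Return the sorted names of services whose port does not accept connections."""
    return sorted(name for name, port in ports.items() if not port_open(host, port))
```

Calling unreachable_services() returns an empty list when every port accepts a connection. An open port only shows that the container is listening, not that the service inside has finished starting.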

Connecting to Services in Docker Compose

Now that your project’s dependencies are running in Docker Compose, we’re ready to get the project itself up and running.

Windows users only: you must first install librdkafka-dev in your WSL Linux distribution. Run the following command in your Ubuntu terminal:

sudo apt-get install librdkafka-dev -y

Stopping Docker Compose and Cleaning Up

When you are ready to stop Docker Compose you can run the following command:

$> docker-compose stop
Stopping starter_postgres_1           ... done
Stopping starter_schema-registry-ui_1 ... done
Stopping starter_topics-ui_1          ... done
Stopping starter_connect-ui_1         ... done
Stopping starter_ksql_1               ... done
Stopping starter_connect_1            ... done
Stopping starter_rest-proxy_1         ... done
Stopping starter_schema-registry_1    ... done
Stopping starter_kafka0_1             ... done
Stopping starter_zookeeper_1          ... done

If you would also like to remove the containers to reclaim disk space, along with the volumes containing your data, run:

$> docker-compose rm -v
Going to remove starter_postgres_1, starter_schema-registry-ui_1, starter_topics-ui_1, starter_connect-ui_1, starter_ksql_1, starter_connect_1, starter_rest-proxy_1, starter_schema-registry_1, starter_kafka0_1, starter_zookeeper_1
Are you sure? [yN] y
Removing starter_postgres_1           ... done
Removing starter_schema-registry-ui_1 ... done
Removing starter_topics-ui_1          ... done
Removing starter_connect-ui_1         ... done
Removing starter_ksql_1               ... done
Removing starter_connect_1            ... done
Removing starter_rest-proxy_1         ... done
Removing starter_schema-registry_1    ... done
Removing starter_kafka0_1             ... done
Removing starter_zookeeper_1          ... done

Running the producer

cd producers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python simulation.py

Running the Faust Stream Processing Application

cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
faust -A faust_stream worker -l info

Running the KSQL Creation Script

cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python ksql.py
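
The contents of ksql.py are not shown here, so the following is only a sketch of how such a script might submit a statement to the KSQL server's REST endpoint, assuming the server is reachable on host port 8088 as in the docker-compose ps listing. The helper names build_ksql_request and run_ksql, and the SHOW TOPICS; statement in the usage note, are illustrative assumptions rather than code from this repository.

```python
import json
import urllib.request

# Assumption: KSQL server published on localhost:8088, per the compose listing.
KSQL_URL = "http://localhost:8088"


def build_ksql_request(statement, base_url=KSQL_URL):
    """Build (but do not send) a POST request for the KSQL /ksql endpoint."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ksql",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


def run_ksql(statement, base_url=KSQL_URL):
    """Send the statement to the server and return the decoded JSON response."""
    request = build_ksql_request(statement, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Docker Compose services running, run_ksql("SHOW TOPICS;") would send the statement and return the server's JSON reply. Keeping request construction separate from sending makes the payload easy to inspect without a running server.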

Running the consumer

cd consumers
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python server.py

Languages

Python 94.3%, HTML 5.7%