Real-time-Geospatial-analysis-of-transport-data

Overview | Requirements | User guide | Data preprocessing | Dashboard | Contribution

Overview

This project is a data processing pipeline that implements an end-to-end solution for real-time geospatial analysis of transport data, with visualization in Kibana. It was built to demonstrate what can be done with modern open-source technologies, especially Apache Spark, Kafka and Elasticsearch.

Requirements

You need to install the following technologies to run the project:

Spark (spark-3.3.0-bin-hadoop3)
Kafka (kafka_2.12-3.2.0)
Elasticsearch (elasticsearch-7.17.0)
Kibana (kibana-7.17.0)
Note that you need an API key to stream the data; you can request one at: https://api-v3.mbta.com/login
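
As a hedged illustration of how the key is used, the sketch below builds (but does not send) a streaming request to the MBTA V3 /vehicles endpoint. It assumes the key is passed in the x-api-key header and that streaming is requested with Accept: text/event-stream; the helper name and the optional route filter are hypothetical, and producer.py may build its request differently.

```python
import urllib.parse
import urllib.request

MBTA_BASE_URL = "https://api-v3.mbta.com"


def build_vehicle_stream_request(api_key, route=None):
    """Build, without sending, a streaming request for MBTA vehicle positions.

    Hypothetical helper: producer.py may build its request differently.
    """
    url = f"{MBTA_BASE_URL}/vehicles"
    if route is not None:
        # filter[route] restricts the stream to one route, e.g. "Red"
        url += "?" + urllib.parse.urlencode({"filter[route]": route})
    headers = {
        "x-api-key": api_key,  # assumption: the key is sent as a header
        "Accept": "text/event-stream",  # request Server-Sent Events
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    request = build_vehicle_stream_request("YOUR_API_KEY", route="Red")
    print(request.full_url)
```

Opening the stream would then be urllib.request.urlopen(request) followed by reading the response line by line; that step needs network access and a valid key.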

User guide

Make sure all the requirements are installed with the listed versions.

Start ZooKeeper (from the Kafka installation directory)

bin/zookeeper-server-start.sh config/zookeeper.properties

Start Kafka

bin/kafka-server-start.sh config/server.properties

Add your API key to producer.py, then run it.

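The README does not show what producer.py sends, so the following is only a sketch under stated assumptions: it parses the Server-Sent Events that the MBTA stream emits (reset, add, update, remove) and flattens each vehicle into a dict whose keys match the index mapping used later. The function names, the use of ingestion time for timestamp, and the renaming of the API's updated_at attribute to update_at are all assumptions.

```python
import json
from datetime import datetime, timezone


def parse_sse(lines):
    """Yield (event_name, decoded_json) pairs from Server-Sent Events text lines."""
    event, data = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if event is not None and data:
                yield event, json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        # id:, retry: and comment lines (starting with ":") are ignored.


def _related_id(resource, name):
    """Return the id of a JSON:API relationship (route, stop, trip) or None."""
    relationship = (resource.get("relationships") or {}).get(name) or {}
    data = relationship.get("data")
    return data.get("id") if data else None


def to_document(event, resource):
    """Flatten one vehicle resource into a dict shaped like the test index."""
    attrs = resource.get("attributes") or {}
    lat, lon = attrs.get("latitude"), attrs.get("longitude")
    has_position = lat is not None and lon is not None
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # assumed: ingestion time
        "event": event,
        "type": resource.get("type"),
        "bearing": attrs.get("bearing"),
        "current_status": attrs.get("current_status"),
        "current_stop_sequence": attrs.get("current_stop_sequence"),
        "label": attrs.get("label"),
        "latitude": lat,
        "longitude": lon,
        # geo_point accepts an object with "lat" and "lon" keys
        "coordinates": {"lat": lat, "lon": lon} if has_position else None,
        "speed": attrs.get("speed"),
        "update_at": attrs.get("updated_at"),  # assumed source attribute name
        "route": _related_id(resource, "route"),
        "stop": _related_id(resource, "stop"),
        "trip": _related_id(resource, "trip"),
    }


def documents_from_event(event, payload):
    """'reset' carries a list of vehicles, 'add'/'update' a single one."""
    if event == "remove":
        return []  # assumed: a removal carries only the vehicle id, no position
    resources = payload if isinstance(payload, list) else [payload]
    return [to_document(event, r) for r in resources]
```

Each document could then be serialized with json.dumps(doc).encode("utf-8") and published to the Kafka topic that consumer_es.py reads; the topic name is not given in this README.
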
Start Elasticsearch

cd <elasticsearch-install-dir>
./bin/elasticsearch

Start Kibana

cd <kibana-install-dir>
./bin/kibana

Create an index named test with the following mapping (for example from the Kibana Dev Tools console):

PUT test
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "event": { "type": "text" },
      "type": { "type": "text" },
      "bearing": { "type": "text" },
      "current_status": { "type": "text" },
      "current_stop_sequence": { "type": "text" },
      "label": { "type": "text" },
      "latitude": { "type": "double" },
      "longitude": { "type": "double" },
      "coordinates": { "type": "geo_point" },
      "speed": { "type": "text" },
      "update_at": { "type": "date" },
      "route": { "type": "text" },
      "stop": { "type": "text" },
      "trip": { "type": "text" },
      "total_distance": { "type": "float" }
    }
  }
}
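
The same index can also be created from Python instead of the Dev Tools console. A minimal sketch using only the standard library, assuming Elasticsearch listens on the default http://localhost:9200 with security disabled; the function name is hypothetical, and longitude is typed as double, like latitude.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: default local address, security disabled

# Field types for the test index; longitude is typed as double, like latitude.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "event": {"type": "text"},
            "type": {"type": "text"},
            "bearing": {"type": "text"},
            "current_status": {"type": "text"},
            "current_stop_sequence": {"type": "text"},
            "label": {"type": "text"},
            "latitude": {"type": "double"},
            "longitude": {"type": "double"},
            "coordinates": {"type": "geo_point"},
            "speed": {"type": "text"},
            "update_at": {"type": "date"},
            "route": {"type": "text"},
            "stop": {"type": "text"},
            "trip": {"type": "text"},
            "total_distance": {"type": "float"},
        }
    }
}


def build_create_index_request(index="test", body=None, es_url=ES_URL):
    """Build, without sending, the PUT /<index> request that creates the index."""
    payload = INDEX_MAPPING if body is None else body
    return urllib.request.Request(
        f"{es_url}/{index}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the request to urllib.request.urlopen sends it. If the index already exists, Elasticsearch rejects the request with HTTP 400 (resource_already_exists_exception), which urlopen raises as urllib.error.HTTPError.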

Run consumer_es.py with spark-submit:

spark-submit --master local --driver-memory 2g --executor-memory 1g --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0,org.elasticsearch:elasticsearch-spark-30_2.12:7.17.0 consumer_es.py
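
The --packages coordinates in this command have to agree with the installed versions: _2.12 is the Scala binary version that the default spark-3.3.0-bin-hadoop3 build uses, and elasticsearch-spark-30 is the Elasticsearch connector for Spark 3.x. As a hypothetical convenience (not part of the project), the sketch below assembles the same spark-submit arguments from the versions listed under Requirements, so an upgrade only touches one place.

```python
import shlex

# Versions from the Requirements section; change them here in one place.
SPARK_VERSION = "3.3.0"
SCALA_BINARY_VERSION = "2.12"
ES_VERSION = "7.17.0"


def spark_packages(include_es=True):
    """Comma-separated Maven coordinates for spark-submit --packages."""
    packages = [
        f"org.apache.spark:spark-sql-kafka-0-10_{SCALA_BINARY_VERSION}:{SPARK_VERSION}"
    ]
    if include_es:
        # Elasticsearch connector built for Spark 3.x
        packages.append(
            f"org.elasticsearch:elasticsearch-spark-30_{SCALA_BINARY_VERSION}:{ES_VERSION}"
        )
    return ",".join(packages)


def spark_submit_args(script, include_es=True):
    """Argument list equivalent to the spark-submit commands in this guide."""
    return [
        "spark-submit",
        "--master", "local",
        "--driver-memory", "2g",
        "--executor-memory", "1g",
        "--packages", spark_packages(include_es),
        script,
    ]


if __name__ == "__main__":
    print(shlex.join(spark_submit_args("consumer_es.py")))
```

subprocess.run(spark_submit_args("consumer_es.py"), check=True) would launch the job, assuming spark-submit is on the PATH.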

Run calcul_distance.py (this command only adds the Kafka connector):

spark-submit --master local --driver-memory 2g --executor-memory 1g --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 calcul_distance.py
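
The README does not describe how calcul_distance.py computes total_distance. Assuming it is the great-circle distance accumulated between consecutive positions of the same vehicle, the haversine formula is the usual choice. The sketch below is plain Python rather than Spark; the function names and the 6371 km mean Earth radius are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the project may use another value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def total_distance_km(points):
    """Sum of leg distances along a trajectory of (lat, lon) pairs in time order."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))
```

In the streaming job, consecutive positions would have to be paired per vehicle and ordered by update_at before applying this; that part is not shown.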

Dashboard

The Kibana dashboard shows a map of the trajectories travelled by the vehicles.
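
To check outside Kibana that the index holds a usable trajectory, one can query a single trip's positions ordered by time and turn them into a GeoJSON LineString. A minimal sketch, assuming the test index, the field names from the mapping, and coordinates indexed as {"lat": ..., "lon": ...} objects; the helper names are hypothetical. Because trip is mapped as text rather than keyword, the query uses a phrase match instead of an exact term lookup.

```python
def trajectory_query(trip_id, size=1000):
    """Search body returning one trip's positions, oldest first."""
    return {
        "size": size,
        "query": {"match_phrase": {"trip": trip_id}},
        "sort": [{"update_at": {"order": "asc"}}],
        "_source": ["coordinates", "update_at"],
    }


def hits_to_linestring(response):
    """Turn a search response into a GeoJSON LineString ([lon, lat] order)."""
    coords = []
    for hit in response["hits"]["hits"]:
        point = hit["_source"].get("coordinates")
        if point:  # skip documents without a position
            coords.append([point["lon"], point["lat"]])
    return {"type": "LineString", "coordinates": coords}
```

The query body can be pasted into Dev Tools as GET test/_search, or POSTed to http://localhost:9200/test/_search.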

Contribution

Imen Azzouz
Toumi Mohamed Amine
