dhruvmanila / taxi_availability_eda

An exploratory data analysis on taxi availability in Singapore

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Taxi Availability EDA

An exploratory data analysis on taxi availability in Singapore

Dataset: Extracted from the API endpoint (https://api.data.gov.sg/v1/transport/taxi-availability) The below function was used to collect the data and store in a CSV file taxi_availability.csv.

The Notebook is too large to be uploaded on GitHub, it is available on Google Colab: https://colab.research.google.com/drive/1nw-RLUhurcgXEY73Vq4sXpHnyb6qj1uJ?usp=sharing

import csv
import datetime

import requests

# https://data.gov.sg/dataset/taxi-availability
API_URL = "https://api.data.gov.sg/v1/transport/taxi-availability"


def collect_data():
    # The data we receive is on the nearest datetime from the requested datetime. If we
    # start from minute 0, we might get the data from the previous day resulting in the
    # data started from hour 23 to hour 23 the next day. This could be problematic when
    # either sorting/showing the data.
    #
    # The workaround would be to start roughly a minute later for the day we would like
    # to collect the data from.
    start = datetime.datetime(year=2021, month=8, day=1, hour=0, minute=1, second=0)
    end = datetime.datetime(year=2021, month=8, day=1, hour=23, minute=59, second=59)
    delta = datetime.timedelta(minutes=10)

    with open("taxi_availability.csv", "w") as csvfile:
        fieldnames = ["timestamp", "coordinate", "longitude", "latitude"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        while start < end:
            print(f"Collecting data on {start}...")
            response = requests.get(API_URL, params={"date_time": start.isoformat()})
            data = response.json()
            features = data["features"][0]
            timestamp = features["properties"]["timestamp"]
            for coordinate in features["geometry"]["coordinates"]:
                writer.writerow(
                    {
                        "timestamp": timestamp,
                        "coordinate": coordinate,
                        "longitude": coordinate[0],
                        "latitude": coordinate[1],
                    }
                )
            start += delta

Number of available taxis per hour

available taxis per hour histogram

  • The number of taxis available are less in the morning and increases in the afternoon and then decreases back again during the midnight.
  • As the number of people commuting increases, the supply for the taxis increases with it.

Time series map of available taxis by the hour

static time series map of available taxis

The above is a static image of the animation showcasing the available taxis at every hour on a given day. To look at the animation, please refer to the Google Colab notebook.

Number of available taxis by the region

taxis available in regions

  • The number of taxis available is more in the following areas:
    • Downtown
    • Airport
    • Tourist spots such as Museum, Zoo, Nature park, etc.
  • There are various hotspots all around the city where the availability is more suggesting a common spot for taxis.

Number of available taxis per hour by the region

Number of available taxis by the region for hour 0

taxis available in regions for hour 0

Number of available taxis by the region for hour 1

taxis available in regions for hour 1

This is a more granular analysis for number of taxis available by region per hour. For images for other time, please refer to the Google Colab notebook.

Routes taken by the available taxis on an average day

most routes taken by available taxis

  • This showcases all the busy roads taken by the taxis during an average day.
  • Most of the roads are all the main ones going through the important parts of the city namely the Downtown, Airport, University, Parks, Golf course etc.

About

An exploratory data analysis on taxi availability in Singapore

License:MIT License