anushadatta / NYC-Taxi-Trip-Duration

🚕 Predicting NYC Taxi Trip Duration with machine learning.

Home Page: https://youtu.be/nVWGlsx3vGg

NYC Taxi Trip Duration

New York City’s 12,779 yellow medallion taxicabs comprise a $1.8 billion industry serving about 240 million passengers a year. Data on the city’s cabs attracts a broad audience because of their central role in transportation and their prominence in Manhattan traffic. Understanding what drives taxi trip durations, and being able to predict them, could offer valuable insights to city planners and to New Yorkers, which is what makes this problem worth studying.

The Kaggle competition “New York City Taxi Trip Duration” is based on 2016 NYC Yellow Cab trip record data originally published by the NYC Taxi and Limousine Commission (TLC). The competition asks participants to build a model that predicts the total ride duration of taxi trips in New York City. The problem statement is therefore: determine the best predictors of NYC taxi trip duration, and build a multivariate trip duration predictor.

Model Performance

Final Model: XGBoost model with K-fold Cross Validation

Result (Kaggle Public Leaderboard): RMSE 0.37356, 79th position (top 6.3%)

Result (Kaggle Private Leaderboard): RMSE 0.37112, 116th position (top 9.2%)
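For context on how to read these scores: the competition leaderboard is scored with root mean squared logarithmic error (RMSLE), which is the same as RMSE computed on `log1p`-transformed durations. The sketch below shows that computation in plain Python; the function name `rmsle` and the sample values are illustrative assumptions, not taken from the repository.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Made-up trip durations in seconds, for illustration only.
actual_durations = [455.0, 663.0, 2124.0]
predicted_durations = [500.0, 600.0, 1800.0]
print(rmsle(actual_durations, predicted_durations))
```

Because the error is taken in log space, the metric penalizes relative rather than absolute mistakes, so being off by a minute on a five-minute trip costs more than being off by a minute on an hour-long one.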

Set Up

PREREQUISITES

  • Ensure Python 3 and pip are installed and added to your system environment variables (PATH).

  • The following packages are required: pandas, numpy, seaborn, matplotlib, xgboost, and sklearn. Install any that are missing with pip install <package_name>. Note that sklearn is installed under the name scikit-learn, and that datetime is part of the Python standard library and needs no installation.
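One way to confirm the prerequisites before opening the notebook is to check which packages are importable. The snippet below is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list uses import names (so `sklearn` rather than its pip name `scikit-learn`).

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["pandas", "numpy", "seaborn", "matplotlib", "xgboost", "sklearn"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Anything reported as missing can then be installed with pip as described above.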

INSTRUCTIONS TO RUN CODE

Team

  • Anusha Datta
  • Amrita Ravishankar
  • Atrik Das
  • Divyesh Mundhra
  • Mehul Kumar

About

License: MIT License


Languages

Language: Jupyter Notebook 100.0%