varun-jalandery / Grab-AI-Challenge-Safety-Daryl-Ang

Introduction

  • The aim of this challenge is to identify dangerous trips, keyed by bookingID, from their telematics data.
  • My approach is to use a supervised machine learning algorithm.
  • The algorithm uses Euclidean distance to compare each trip against the training data and determine whether the trip is dangerous.

Methodology

  1. Read the training data into a dataframe, along with all telematics data associated with it.
  2. Store each trip as a "trip" object in a dictionary, keyed by bookingID.
  3. For each trip in the raw data, compare it against every trip in the training data and retrieve the most similar one based on Euclidean distance (i.e. the training point closest to the raw trip; a smaller distance means a higher similarity).
    • Each telematics property of a trip is reduced to its average value over the trip.
    • Euclidean distance is calculated as √( (x1-x2)^2 + (y1-y2)^2 + ... ), where x, y, ... are the averaged telematics properties and the subscripts 1 and 2 denote the two trips being compared.
  4. If the highest similarity is above a threshold, the trip is deemed similar to a dangerous trip and therefore has a high probability of being dangerous as well.
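
The steps above can be sketched in plain Python as follows. This is a minimal illustration and not the notebook's actual code: the function names, the conversion of distance into a similarity score via 1 / (1 + distance), and the rule that a trip is only flagged when its nearest training trip is labelled dangerous are all assumptions made for the example.

```python
import math
from collections import defaultdict


def average_features(rows):
    """Collapse per-reading rows into one mean feature vector per bookingID.

    rows: iterable of (booking_id, [feature values]) tuples.
    """
    sums, counts = {}, defaultdict(int)
    for booking_id, values in rows:
        if booking_id not in sums:
            sums[booking_id] = [0.0] * len(values)
        sums[booking_id] = [s + v for s, v in zip(sums[booking_id], values)]
        counts[booking_id] += 1
    return {b: [s / counts[b] for s in total] for b, total in sums.items()}


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    # Assumption: map distance into (0, 1], so identical vectors score 1.0.
    return 1.0 / (1.0 + euclidean(a, b))


def classify(trip_vec, training, labels, threshold):
    """Return (flagged, nearest bookingID, similarity) for one raw trip.

    training: {booking_id: averaged vector}; labels: {booking_id: 0 or 1}.
    Assumption: flag only if the nearest training trip is labelled dangerous
    and its similarity is above the threshold.
    """
    best_id = max(training, key=lambda b: similarity(trip_vec, training[b]))
    best_sim = similarity(trip_vec, training[best_id])
    flagged = bool(labels[best_id] == 1 and best_sim > threshold)
    return flagged, best_id, best_sim
```

Because the properties have different units and scales, it may be worth normalising each one before computing distances; the sketch leaves that out.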

How to use script

  1. The script should be placed in the same directory as the training data and raw data.
  2. Training data should be placed in a "Training File" folder and raw data in a "Raw Data" folder.
  3. Run all cells in the script's Jupyter Notebook.
  4. The final output will be written to the same directory.
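
For reference, the folder layout above could be read with something like the sketch below. It assumes the data is stored as CSV files, which this README does not state, and the helper names load_csv_rows and load_inputs are hypothetical; the notebook itself may load the files differently.

```python
import csv
from pathlib import Path


def load_csv_rows(folder):
    """Read every *.csv file in `folder` into one list of dict rows."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def load_inputs(base_dir="."):
    """Load training and raw rows from the two folders next to the script."""
    base = Path(base_dir)
    training = load_csv_rows(base / "Training File")
    raw = load_csv_rows(base / "Raw Data")
    return training, raw
```

All values come back as strings from csv.DictReader, so numeric columns would need converting before any distance calculation.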

Conclusion

  • Given more training data, the model should be better able to identify other dangerous trips.
  • More research could be conducted to find a more suitable threshold.
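
One possible way to explore the threshold, sketched here under assumptions rather than taken from the notebook, is to hold out some labelled trips, score each by its highest similarity, and pick the candidate threshold with the best F1 score. The function names and the choice of F1 as the metric are illustrative only.

```python
def f1_at(scored, threshold):
    """F1 score when trips with similarity above `threshold` are flagged.

    scored: list of (similarity, true_label) pairs, where true_label is 1
    for a dangerous trip and 0 otherwise.
    """
    tp = sum(1 for s, y in scored if s > threshold and y == 1)
    fp = sum(1 for s, y in scored if s > threshold and y == 0)
    fn = sum(1 for s, y in scored if s <= threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scored, candidates):
    """Return the candidate threshold with the highest F1 on `scored`."""
    return max(candidates, key=lambda t: f1_at(scored, t))
```

The held-out trips should not be part of the training data used for the comparison, otherwise every trip would match itself with maximum similarity.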
