Williamdst / The-Subway-Challenge

The objective of the project is to use graph theory to determine a set of potential paths that can be used to set a Guinness World Record in the Subway Challenge.


Data Science = Solving Problems = Happiness

The Subway Challenge

Denzel S. Williams
Springboard Data Science Track '21

0. Inspiration

I'm sort of an efficiency junkie and I have always enjoyed logistics, optimization, and basically anything that involves how people/things move. I honestly have no clue how I came across the Subway Challenge, but I see it as a massive logistics problem and I COULDN'T RESIST tackling it.

Also I want to win a world record.

I. Introduction

To set a record in the Subway Challenge, a participant must navigate the entire New York City Subway system (network) in the shortest time possible. The challenge requires competitors to stop at all 472 stations in the network, and no person currently holds that record [1]. The most recent record of 21 hours, 28 minutes, and 14 seconds was set on July 22, 2016 by Matthew Ahn for the 469-Station Challenge [2]. Aside from beginning at Far Rockaway-Mott Avenue and ending at Flushing-Main Street, the route and methodology he used to beat the record are unknown.

The goal of this project is to use graph theory to determine a set of paths that could potentially be used to beat the current record.

II. Understanding the Problem

To solve this problem, a graph representation of the subway system needs to be constructed. The system can be modeled as a weighted undirected graph, where the weight on each edge is the time it takes to get from one station to the next. Since you can travel in both directions on each line, the direction is not needed (there is one station that is an exception). The actual map of the system needs to be translated into nodes and edges; to simplify this translation, the Late-Night Subway Service map is used. In the late-night subway map all stations are served, though not all lines run; most lines run local, making all stops. The late-night map is a starting point for attempting the challenge. It cannot be used as is to beat the challenge because it is only valid from 00:00 to 06:00 every day. The results of the late-night map will tell you what to do, but not how to do it.
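A minimal sketch of the weighted undirected graph described above, using NetworkX. The station names and travel times are made-up placeholders, not values from the real edge list, and `build_subway_graph` is a hypothetical helper rather than code from the repository.

```python
import networkx as nx

# Placeholder rows of (station, neighboring station, minutes between them).
# These are illustrative values only, not real timetable data.
SAMPLE_EDGES = [
    ("Station A", "Station B", 4.0),
    ("Station B", "Station C", 6.5),
    ("Station B", "Station D", 3.0),
]


def build_subway_graph(rows):
    """Build an undirected graph whose edge weights are travel times in minutes."""
    graph = nx.Graph()
    for station_a, station_b, minutes in rows:
        graph.add_edge(station_a, station_b, weight=minutes)
    return graph


graph = build_subway_graph(SAMPLE_EDGES)

# Time needed to ride every edge exactly once, ignoring any doubling back.
total_minutes = graph.size(weight="weight")
```

Because the graph is undirected, `graph["Station A"]["Station B"]` and `graph["Station B"]["Station A"]` refer to the same edge and the same weight. If two lines run in parallel between the same pair of stations, an `nx.MultiGraph` would be needed instead.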

Once 06:00 hits, all trains are running and express routes take effect. For example, the late-night A-Train stops at certain stations that it skips during the day. In the day, the A-Train is an express train, and staying on it for the entire line wouldn't take you to every stop. At some point you would have to get off and transfer to the local C-Train to check off the stops that the A-Train skips. This is the main reason why the objective is to determine a set of paths and not just a single path. The only weight the program understands is the time between stations; it doesn't understand that switching trains is expensive. Every time you get off a train you must wait for the next one to arrive, which adds to the overall time. Therefore, the program can only return a set of potential options that a human would then need to filter through and plan out.

III. Modeling the MTA Subway System

The bulk of the work is translating the map into nodes and edges and saving them as CSV files that the program can understand. Like any route-inspection style problem, the Subway Challenge is about decision making: specifically, what you are going to do at junctions, that is, stations where you can transfer to a different line or, in the graph-theoretical sense, nodes with degree greater than 2. Of the 472 stations in the system, there are only 79 junctions, which I call "decision stations".
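As a rough illustration of the degree-based definition above, the sketch below counts how many edges touch each station in an edge-list CSV and keeps the ones with degree greater than 2. The column names (`start`, `end`, `minutes`) and the rows are assumptions made for the example; the repository's actual CSV layout may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical edge list; the column names and values are assumed for illustration.
EDGES_CSV = """start,end,minutes
Stop A,Stop B,5
Stop B,Stop C,7
Stop B,Stop D,4
Stop D,Stop E,6
"""


def station_degrees(csv_text):
    """Count how many edges touch each station."""
    degrees = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        degrees[row["start"]] += 1
        degrees[row["end"]] += 1
    return degrees


def decision_stations(degrees):
    """Return the stations with degree greater than 2, sorted by name."""
    return sorted(name for name, degree in degrees.items() if degree > 2)


degrees = station_degrees(EDGES_CSV)
junctions = decision_stations(degrees)
```

In this toy network only Stop B qualifies: Stop D has degree 2, so it is a pass-through station with no decision to make.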

IV. Modifying a Prepackaged Solution

In 2017, Andrew Brooks tackled a similar problem, which he solved using the NetworkX 2.0 library [3]. Thankfully, he packaged his solution into the postman_problems package. With this package, you can plug in your own network and solve the Chinese Postman Problem (CPP). Unfortunately, the Subway Challenge isn't a typical CPP. The postman always wants to return to his vehicle, so the CPP finds a path that ends where it began. The Subway Challenge has no such requirement; the sole condition is to traverse every edge at least once. Andrew's postman_problems package solves the CPP as is, so plugging in the subway network wouldn't work: the forced return to the starting station would always produce a sub-optimal solution. However, with a little bit of network theory, the NetworkX 2.5 update, and some tweaks to his package, I was able to build on his work to solve the problem.
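One way to picture the open-ended variant is sketched below. This is not the code from the modified package. It is a minimal illustration that assumes a connected `nx.Graph` with `weight` attributes in minutes and NetworkX 2.5 or later. The idea: leave the chosen start and end nodes unpaired, pair up every other odd-degree node at minimum total cost, duplicate the edges along those pairings (the double backs), and then ask NetworkX for an Eulerian path. The function name `open_postman_route` and the toy network are hypothetical.

```python
import itertools

import networkx as nx


def open_postman_route(graph, start, end):
    """Return (edges, total_minutes) for a walk covering every edge from start to end."""
    odd = [node for node, degree in graph.degree() if degree % 2 == 1]
    if start == end or start not in odd or end not in odd:
        raise ValueError("start and end must be two different odd-degree nodes")
    to_pair = [node for node in odd if node not in (start, end)]

    # Complete graph on the remaining odd nodes. Shortest-path times are negated
    # so that a maximum-weight matching is the cheapest possible pairing.
    pairs = nx.Graph()
    for u, v in itertools.combinations(to_pair, 2):
        minutes = nx.dijkstra_path_length(graph, u, v, weight="weight")
        pairs.add_edge(u, v, weight=-minutes)
    matching = nx.max_weight_matching(pairs, maxcardinality=True)

    # Duplicate the edges along each matched shortest path (the double backs).
    augmented = nx.MultiGraph(graph)
    for u, v in matching:
        path = nx.dijkstra_path(graph, u, v, weight="weight")
        for a, b in zip(path, path[1:]):
            augmented.add_edge(a, b, weight=graph[a][b]["weight"])

    # Only start and end still have odd degree, so an Eulerian path exists.
    edges = list(nx.eulerian_path(augmented, source=start))
    return edges, augmented.size(weight="weight")


# Toy network: a trunk S - J1 - J2 - T with dead-end spurs X and Y.
toy = nx.Graph()
toy.add_weighted_edges_from(
    [("S", "J1", 5), ("J1", "J2", 3), ("J2", "T", 4), ("J1", "X", 2), ("J2", "Y", 6)]
)
route, total_minutes = open_postman_route(toy, "S", "T")
```

In the toy network the two spurs have to be ridden out and back, so the walk uses 7 edge traversals for the 5 distinct edges. Running the function once for every start-end pair of odd-degree nodes would give a set of candidate routes rather than a single answer.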

Installing My Cloned Package

pip install git+https://github.com/Williamdst/postman_problems.git

V. The Routes

Of the 79 decision stations, 58 were odd-degree nodes, resulting in 1653 start-end configurations. To store all of the configurations and their stats, a simple SQLite database was integrated into the program.
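The 1653 figure is what you get by choosing an unordered start-end pair from the 58 odd-degree nodes. A connected graph has an open Eulerian path only when exactly two nodes have odd degree, and those two nodes are its endpoints, which is presumably why the candidates are restricted to odd-degree nodes. The check below uses placeholder node ids and a hypothetical helper name.

```python
import itertools
import math

ODD_DEGREE_NODES = 58  # count reported in this section


def start_end_configurations(node_count):
    """Number of unordered (start, end) pairs that can be drawn from node_count nodes."""
    return math.comb(node_count, 2)


# The same count by explicit enumeration over placeholder node ids.
pairs = list(itertools.combinations(range(ODD_DEGREE_NODES), 2))
configurations = start_end_configurations(ODD_DEGREE_NODES)
```

Both approaches give 58 * 57 / 2 = 1653 configurations.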

Figure 1. Entity Relationship Diagram of the Database

If you never had to double back and could teleport to whatever station you needed, traversing each of the 104 edges EXACTLY one time would take about 14.7 hours (884 minutes). The rest of the time is spent going back over edges you already traveled; in Matthew Ahn's case that was nearly 7 hours. The columns used to pick a route are distance_walked and distance_doublebacked. edges_walked isn't a major concern because what matters is which edges you had to double back over. You can't claim that a route with 150 edges_walked is better than a 151-edge route, because a single edge it doubles back over may be the worst edge in the network.
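The "nearly 7 hours" figure can be checked with a few lines of arithmetic, using the 884 minutes quoted above and the record time of 21 hours, 28 minutes, and 14 seconds from the Introduction. The variable names are only illustrative, and the difference lumps together everything that is not a first pass over an edge.

```python
# Time to ride each of the 104 edges exactly once, as quoted above.
ONCE_THROUGH_MINUTES = 884

# Matthew Ahn's record time from the Introduction, split into hours, minutes, seconds.
RECORD_H, RECORD_M, RECORD_S = 21, 28, 14

once_through_hours = ONCE_THROUGH_MINUTES / 60
record_hours = RECORD_H + RECORD_M / 60 + RECORD_S / 3600

# Everything beyond a single pass over each edge.
extra_hours = record_hours - once_through_hours
```

Rounded to two decimals, that is 14.73 hours for the single pass, 21.47 hours for the record, and 6.74 hours in between.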

The node that appeared in 8 of the 10 top routes, either as the start or the end station, was 416 Wakefield-241 St (the last stop of the 2 train). More interesting still, every node paired with it was also an extreme station, meaning it sits at the end of a line. Those extremes were aggressively extreme: not only were they at the end of a line, but they were at the end of lines that had no transfer opportunities and took over 15 minutes to reach. The route that Matthew took started and ended at two very aggressive extremes, and the path that contained those two extremes took 21.06 hours (the 37th-ranked route).

The "Best" Routes

Picking out the best route isn't as straightforward as querying the database, finding the path with minimal distance, and following the directions. Remember, the program doesn't understand the cost of excessive transfers, that some transfers provide shortcuts, or that the network topology isn't static. The one major insight that can be used to filter out routes is that aggressively extreme stations are where you want to start and where you want to end, which leaves only about 10 stations to choose from (45 configurations). The steps for the best routes aren't listed in this report because each route has over 145 steps, but there is a Describe-Route.sql file in the repository that contains the query for listing all the steps of any path. The properties of the most interesting paths are shown in the table below:

| Route | Start Station | Stop Station | Time (Hrs) | Route Rank |
|---|---|---|---|---|
| Gold Route | Wakefield-241 St (2-Train) | Woodlawn (4-Train) | 20.65 | 1 |
| Silver Route | Wakefield-241 St (2-Train) | Norwood-205 St (D-Train) | 20.66 | 2 |
| Bronze Route | Wakefield-241 St (2-Train) | Pelham Bay Park (6-Train) | 20.7 | 3 |
| The Worst Route | Sutphin Blvd-Archer Av-JFK Airport (E-Train) | Coney Island-Stillwell Av (D-Train) | 22.35 | 1653 |
| Matthew Ahn's Route | Far Rockaway-Mott Av (A-Train) | Flushing-Main St (7-Train) | 21.06 | 37 |
| The Route I May Implement | Wakefield-241 St (2-Train) | Far Rockaway-Mott Av (A-Train) | 20.75 | 4 |
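The filtering idea described above (only keep routes that start and end at chosen extreme stations, then sort by time) could be sketched against SQLite as follows. The table name, column names, and the `fastest_routes` helper are assumptions for illustration only; the real database uses its own schema, with columns such as distance_walked and distance_doublebacked. The rows are the start station, stop station, and time values from the table above.

```python
import sqlite3

# Start station, stop station, and time in hours, copied from the table above.
ROUTES = [
    ("Wakefield-241 St", "Woodlawn", 20.65),
    ("Wakefield-241 St", "Norwood-205 St", 20.66),
    ("Wakefield-241 St", "Pelham Bay Park", 20.7),
    ("Wakefield-241 St", "Far Rockaway-Mott Av", 20.75),
    ("Far Rockaway-Mott Av", "Flushing-Main St", 21.06),
    ("Sutphin Blvd-Archer Av-JFK Airport", "Coney Island-Stillwell Av", 22.35),
]


def fastest_routes(rows, allowed_stations, limit=3):
    """Return the quickest routes whose start and end are both in allowed_stations."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, invented for this sketch.
    conn.execute("CREATE TABLE routes (start_station TEXT, end_station TEXT, hours REAL)")
    conn.executemany("INSERT INTO routes VALUES (?, ?, ?)", rows)
    marks = ",".join("?" for _ in allowed_stations)
    query = (
        "SELECT start_station, end_station, hours FROM routes "
        f"WHERE start_station IN ({marks}) AND end_station IN ({marks}) "
        "ORDER BY hours LIMIT ?"
    )
    params = list(allowed_stations) * 2 + [limit]
    result = conn.execute(query, params).fetchall()
    conn.close()
    return result


extremes = ["Wakefield-241 St", "Woodlawn", "Far Rockaway-Mott Av", "Flushing-Main St"]
shortlist = fastest_routes(ROUTES, extremes, limit=2)
```

A query like this only narrows the list. Transfer waits and daytime express patterns still have to be judged by hand, as explained above.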

Acknowledgements

I want to give a shoutout to my mentor Devin Cavagnaro for teaching me about agile project management. Without the iterative approach of agile management the project would have been thrown in the trash.

Big thanks to Andrew Brooks for his work and the well-put-together postman_problems package; I couldn't have completed the project without it. And once again, thanks to Alexandre Sanlim for the repository of awesome badges.


License:MIT License

