Weather & Metro Analysis
Using the KMA (Korea Meteorological Administration) weather dataset and the Seoul Metro dataset
Visualization
Sinchon station
Jamsil station
Gangnam station
Each station shows a very fixed pattern, with little variation.
Variables
- Number of passengers getting on/off between 17:00 and 23:00
- Mean nighttime temperature, humidity, and rainfall
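As a minimal sketch, the variables above could be flattened into one feature vector per day. The input layout here is hypothetical (it is not the Seoul Metro or KMA schema): hourly on/off counts keyed by hour 17 to 22, plus a list of nighttime weather readings that are averaged.

```python
from statistics import mean

# Hypothetical input layout (an assumption, not the real dataset schema):
#   on_counts[h] / off_counts[h]: passengers getting on/off during hour h
#   night_weather: list of (temperature_c, humidity_pct, rainfall_mm) readings
HOURS = range(17, 23)  # 17:00~23:00 split into six hourly bins


def build_features(on_counts, off_counts, night_weather):
    """Flatten one day's boarding and weather data into a single feature vector."""
    features = [on_counts.get(h, 0) for h in HOURS]    # 6 "getting on" counts
    features += [off_counts.get(h, 0) for h in HOURS]  # 6 "getting off" counts
    temps, hums, rains = zip(*night_weather)
    features += [mean(temps), mean(hums), mean(rains)]  # 3 nighttime means
    return features


on = {17: 1200, 18: 2100, 19: 1800, 20: 1500, 21: 1300, 22: 1100}
off = {17: 900, 18: 1600, 19: 2400, 20: 1900, 21: 1000, 22: 700}
weather = [(24.0, 60.0, 0.0), (22.0, 70.0, 1.5)]
x = build_features(on, off, weather)
print(len(x), x[-3:])
```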
Target
- Ridership of the last subway train at Sinchon station
Model
- Gradient boosting
- Random forest
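The slides do not say which library or settings were used. To show the mechanism only, here is a minimal from-scratch sketch of gradient boosting for regression with squared-error loss and depth-1 trees (stumps); it is not the authors' model, and random forest is not sketched. Each round fits a stump to the current residuals and adds a shrunken copy of it to the prediction.

```python
# Minimal gradient-boosting regressor (squared-error loss, depth-1 trees).
# Illustrative sketch only; not the model configuration used in the slides.

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:
        # No valid split (e.g. all rows identical): predict the mean residual.
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    base = sum(y) / len(y)  # start from the mean target
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For L = (y - F)^2 / 2 the negative gradient is simply the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_gbm(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Toy data: one feature, target jumps from 0 to 10 at x = 5.
X = [[x] for x in range(10)]
y = [0.0] * 5 + [10.0] * 5
model = fit_gbm(X, y)
print(round(predict_gbm(model, [2]), 2), round(predict_gbm(model, [7]), 2))
```

The learning rate shrinks each stump's contribution, so the fit improves gradually over rounds instead of jumping to the residuals in one step; in practice a library implementation with deeper trees would be used.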
Result
- RMSE: 351
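RMSE is the square root of the mean squared prediction error, so it is in the same units as the target; if the target is a passenger count, an RMSE of 351 means a typical error on the order of a few hundred passengers, with large misses weighted more heavily. A minimal sketch:

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


print(rmse([0, 0, 0, 0], [2, 2, 2, 2]))  # 2.0
```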
Feature Importance
As you can see, the number of incoming passengers between 7 P.M. and 8 P.M. has the highest importance for estimating last-train usage.
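The slide does not say how importance was computed; tree ensembles commonly report impurity-based importance. As a model-agnostic alternative (swapped in here, not necessarily what the slide used), permutation importance measures how much the error grows when one feature column is shuffled. The toy model below is hypothetical and ignores its second feature, so that feature's importance comes out as zero.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric=mse, n_repeats=5, seed=0):
    """Mean increase in error when each feature column is shuffled.

    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)  # fixed seed for reproducible shuffles
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(metric(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical model: uses feature 0 and ignores feature 1."""
    return 3.0 * row[0]


X = [[i, (i * 7) % 10] for i in range(20)]
y = [3.0 * row[0] for row in X]
imp = permutation_importance(toy_predict, X, y)
print(imp)
```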
Clustering by boarding pattern
by "Getting on" pattern
by "Getting off" pattern
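The slides do not name the clustering method. As a hypothetical sketch, k-means could be run on each station's hourly counts normalized to proportions, so that clusters reflect the shape of the pattern rather than total volume. Farthest-first initialization is used here to keep the result deterministic; the station names and counts are invented for illustration.

```python
def normalize(profile):
    """Scale hourly counts to proportions so clustering compares shape, not volume."""
    total = sum(profile)
    return [v / total for v in profile] if total else [0.0] * len(profile)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    # Farthest-first initialization: deterministic and spreads the initial centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical "Getting off" counts per hour, 17:00~23:00.
stations = {
    "evening_peak_a": [100, 200, 600, 500, 300, 100],
    "evening_peak_b": [110, 190, 620, 480, 310, 90],
    "early_peak_a": [500, 400, 200, 100, 50, 30],
    "early_peak_b": [520, 380, 210, 90, 60, 20],
}
names = list(stations)
labels, _ = kmeans([normalize(stations[n]) for n in names], k=2)
print(dict(zip(names, labels)))
```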
Discussion
- Stations located in hot places can be distinguished using only their "Getting on/off" patterns
- Classification might be possible: hot / not-hot
- The analysis should be extended to subway Lines 5 to 8
- Further research might be fun: the gray area between hot and not-hot areas
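If hot / not-hot labels were available, one simple way to test the classification idea is a nearest-centroid classifier on normalized hourly profiles. This is a hypothetical sketch: the training profiles are invented, and which pattern shape counts as "hot" here is an assumption for illustration only.

```python
def normalize(profile):
    total = sum(profile)
    return [v / total for v in profile] if total else [0.0] * len(profile)


def train_nearest_centroid(profiles, labels):
    """Average the normalized profiles of each class into one centroid per class."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(normalize(profile))
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def classify(centroids, profile):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    q = normalize(profile)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(q, centroids[label])),
    )


# Hypothetical labeled stations (hourly "Getting off" counts, 17:00~23:00).
train_profiles = [
    [100, 200, 600, 500, 300, 100],
    [110, 190, 620, 480, 310, 90],
    [500, 400, 200, 100, 50, 30],
    [520, 380, 210, 90, 60, 20],
]
train_labels = ["hot", "hot", "not-hot", "not-hot"]
model = train_nearest_centroid(train_profiles, train_labels)
print(classify(model, [90, 210, 580, 520, 290, 110]))  # hot
print(classify(model, [480, 420, 190, 110, 40, 35]))   # not-hot
```

Stations whose profiles sit roughly equidistant from both centroids would be natural candidates for the gray area mentioned above.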