Traffic Flow Prediction with Vehicle Trajectories

This repository is a PyTorch implementation of the model TrGNN in the paper Traffic Flow Prediction with Vehicle Trajectories accepted at AAAI 2021 (pending release).

Trajectory transition

TrGNN model architecture

Requirements

PyTorch=0.4.1
Python=3.7
numpy=1.16.5
scipy=1.3.1
pandas=0.23.4
geopy=1.20.0
networkx=2.1
statsmodels=0.9.0 (optional; for VAR only)
scikit-learn=0.21.3 (optional; for RF only)
folium=0.10.0 (optional; for visualization only)

Pipeline

RawTaxiData --- map matching ---> ParsedTaxiData
ParsedTaxiData --- trajectory.py ---> recovered_trajectory_df
recovered_trajectory_df --- trajectory_transition.py ---> trajectory_transition
recovered_trajectory_df --- flow.py ---> flow
road_list & road_graph --- train_model.py ---> road_adj
trajectory_transition & flow & road_adj --- train_model.py ---> TrGNN

Dataset description

1. Road Network

The list of road segments are indexed in file data/road_list.csv:

road_id
103103595
103103090
...

The road graph is constructed with NetworkX and saved in GML format in data/road_graph.gml. Each node represents a road segment (label: road_id, length: in km), and each directed edge represents the adjacency between to road segments (weight: exponential decay of distance).

2. Trajectories

Trajectories after map matching (refer to Hidden Markov Map Matching) are saved at data/ParsedTaxiData_YYYYMMDD.csv:

vehicle_id	time	matched_road_id
EEEEEEE	14/03/2016 00:00:01	103064895
AAAAAAA	14/03/2016 00:00:10	103105500
...	...	...

Note: Files in the data folder contain dummy data for demo purpose. Real data have not been published due to confidentiality.

Prepare input

1. Trajectory cleansing

python trajectory.py -d 20160314 >> log/trajectory0314.log

Results are saved at data/recovered_trajectory_df_20160314_20160314.csv:

vehicle_id	trajectory_id	time	road_id	scenario
EEEEEEE	0	14/03/2016 00:00:01	103064895	0.1
EEEEEEE	0	14/03/2016 00:00:13	103064811	3.1
...	...	...	...	...

Similarly for other dates.

Note: The scenario column is for reference only (as documented in trajectory.py) and can be ignored.

2. Flow aggregation

python flow.py -d 20160314 -i 15 >> log/flow0314.log

Flows are aggregated in 15-minute intervals, and are saved at data/flow_20160314_20160314.csv:

	road_id_0	road_id_1	...
14/03/2016 00:00:00	33	67	...
14/03/2016 00:15:00	21	89	...
...	...	...	...

Similarly for other dates.

Baseline approaches (optional)

Run the following commands for baseline approaches.

# Historical Average. Modify `start_date` and `end_date` in code. 
# Note: The dataset should cover more than 14 days.
python baseline.py -m HA

# Moving Average. Run on demo dataset for demo purpose.
python baseline.py -m MA -D demo

# Vector Auto-Regression. 5-hop neighborhood. Run on demo dataset for demo purpose.
# statsmodels package is required.
python baseline.py -m VAR -H 5 -D demo

# Random Forest. 5-hop neighborhood. 100 trees. Run on demo dataset for demo purpose.
# Note: It takes longer to run RF as it trains one model for each road segment separately.
# scikit-learn package is required.
python baseline.py -m RF -H 5 -n 100 -D demo

The test results of the baseline approaches above are saved at result/MODEL_Y_true.pkl (ground truth results), and result/MODEL_Y_pred.pkl (predicted results).

For Diffusion Convolutional Recurrent Neural Network, refer to its PyTorch implementation.

TrGNN

1. Trajectory transition

(Optional) Run the following command for one single date. Similarly for other dates.

python trajectory_transition.py -d1 20160314 -d2 20160314 >> log/transition0314.log

The result is a tensor of shape 96 (# 15-minute intervals of day), 2404 (# road segments), 2404 (# road segments) and is saved at data/trajectory_transition_20160314_20160314.pkl.

Run the following command for the training period.

python trajectory_transition.py -d1 START_DATE -d2 END_DATE >> log/transition0314.log

2. Train and test TrGNN

# Train and test TrGNN. Run with GPU. Run on demo dataset for demo purpose.
python train_model.py -m TrGNN -D demo
# Train and test TrGNN-. Run with GPU. Run on demo dataset for demo purpose.
python train_model.py -m TrGNN- -D demo

Trained models are saved at model/[MODEL]_[TIMESTAMP]_[EPOCH]epoch.cpt whenever the validation MAE breaks through.

The test results are saved at are saved at result/[MODEL]_[TIMESTAMP]_Y_true.pkl (ground truth results), and result/[MODEL]_[TIMESTAMP]_[EPOCH]epoch_Y_pred.pkl (predicted results). Results are of shape # test intervals, # road segments.

3. Experimental result

We run this repository on SG-TAXI dataset (not released) and evaluation results are summarized in the paper (pending release).

4. Visualization (Optional)

Refer to the second half (commented out) in utils.py for displaying road segments, road network, and vehicle trajectories. folium package is required.

Citation

(pending)

ninahug / TrGNN