# traffic-control_RL (Discrete Action Space)
Reinforcement-learning-based traffic control.
## Prerequisite
- python 3.7.9 or above
- pytorch 1.7.1 or above
- tensorboard 2.0.0 or above
## How to use
- Check the traffic condition (throughput) with a plain simulation:
  `python ./Experiment/run.py simulate`
- Train with the DQN algorithm (default device: cpu):
  `python ./Experiment/run.py train --gpu False`
- To use another algorithm (ppo, super_dqn, REINFORCE, a2c):
  `python ./Experiment/run.py train --algorithm ppo`
- Check the RL performance of the FRAP-based model ([FRAP paper](https://arxiv.org/abs/1905.04722)):
  `python ./Experiment/run.py train --model frap`
  Not yet verified to learn well (prototype).
- Check the results with TensorBoard:
  `tensorboard --logdir ./Experiment/training_data`
  Hyperparameters are saved as JSON, and the model is saved in the `./Experiment/training_data/[time you run]/model` directory; see the loading sketch below.
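A minimal sketch of loading a saved run. The run-directory and file names below are hypothetical stand-ins for illustration, not the repo's confirmed layout:

```python
import json
import torch

# Hypothetical run directory; real names follow the time you launched training.
run_dir = "./Experiment/training_data/2021-01-01_12-00-00"

# File names are assumptions for illustration only.
with open(f"{run_dir}/hyperparameter.json") as f:
    hparams = json.load(f)
state_dict = torch.load(f"{run_dir}/model/model.pt", map_location="cpu")
```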
- Replay a trained model:
  `python ./Experiment/run.py test --replay_name /replay_data in training_data dir/ --replay_epoch NUM`
## New version of Learning Process (Discrete)
### NxN intersection
#### Single Agent DQN
- Experiment
  - Acts every 160 s (depends on COMMON_PERIOD)
  - Controls the length of each phase in the intersection system
- Agent
  - Traffic light system (intersection)
- State (see the sketch below)
  - Vehicle movement demand (FRAP only) or queue length (2 spaces per inEdge, 8 spaces total), recorded at the end of each phase: 4 phases => 32 spaces
    - Each vehicle count is divided by the maximum number of vehicles in an edge (normalization)
  - Phase length (with 4 phases, 4 spaces)
    - Each (up, right, left, down) length is divided by the maximum period (normalization)
  - Searching method: (1) before a phase ends, receive the inflow vehicles
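A minimal sketch of how such a 36-dimensional state (32 queue spaces + 4 phase-length spaces) could be assembled; the function and inputs are hypothetical, not the repo's actual code:

```python
import numpy as np

def build_state(queue_per_phase, phase_lengths, max_vehicles, max_period):
    # queue_per_phase: shape (4, 8), queue lengths recorded at the end of each phase.
    queues = np.asarray(queue_per_phase, dtype=np.float32).reshape(-1) / max_vehicles
    # phase_lengths: (up, right, left, down) green durations, normalized by max period.
    phases = np.asarray(phase_lengths, dtype=np.float32) / max_period
    return np.concatenate([queues, phases])  # 32 + 4 = 36-dimensional state
```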
- Action (per COMMON_PERIOD of the intersection; see the sketch below)
  - A tuple of +/- adjustments over the phases (13 actions)
  - Adjusts the length of each phase
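The count of 13 is consistent with one construction: add +Δ to one phase and -Δ to another (4 × 3 = 12 ordered pairs) plus a no-change action, which keeps the total cycle length constant. A sketch under that assumption (the step size Δ is hypothetical):

```python
from itertools import permutations

N_PHASES = 4
DELTA = 5  # hypothetical adjustment step in seconds

ACTIONS = [(0,) * N_PHASES]            # action 0: keep every phase as-is
for i, j in permutations(range(N_PHASES), 2):
    a = [0] * N_PHASES
    a[i], a[j] = +DELTA, -DELTA        # lengthen phase i, shorten phase j
    ACTIONS.append(tuple(a))

assert len(ACTIONS) == 13              # 12 swap pairs + 1 no-op
```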
- Reward (see the sketch below)
  - Max-pressure control theory
  - Penalty if a phase exceeds its maximum length
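A sketch of such a reward, assuming the penalty is a flat cost per phase over its maximum length (the exact penalty form is an assumption):

```python
def reward_with_penalty(pressure, phase_lengths, max_phase_length, penalty=1.0):
    r = -pressure  # max-pressure control: minimize pressure
    # Assumed penalty form: flat cost for each phase exceeding its maximum length.
    r -= penalty * sum(1 for p in phase_lengths if p > max_phase_length)
    return r
```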
#### Decentralized DQN
- Experiment
  - Acts every 160 s (depends on COMMON_PERIOD)
  - Controls the length of each phase in the intersection system
- Agents (see the sketch below)
  - Traffic light systems (intersections)
  - Each has its own offset value
  - Each updates itself asynchronously (according to its offset and the COMMON_PERIOD value)
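A sketch of the asynchronous schedule this implies: each agent acts at its own offset plus multiples of COMMON_PERIOD (agent names and offsets below are hypothetical):

```python
COMMON_PERIOD = 160  # seconds

offsets = {"tl_A": 0, "tl_B": 40, "tl_C": 80}  # hypothetical per-agent offsets

def agents_due(sim_time):
    # An agent updates at sim_time = offset + k * COMMON_PERIOD (k = 0, 1, ...).
    return [name for name, off in offsets.items()
            if sim_time >= off and (sim_time - off) % COMMON_PERIOD == 0]
```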
- State
  - Queue length (2 spaces per inEdge, 8 spaces total)
    - Each vehicle count is divided by the maximum number of vehicles in an edge (normalization, TODO)
  - Phase length (with 4 phases, 4 spaces)
    - Each (up, right, left, down) length is divided by the maximum period (normalization)
  - Searching method: (1) before a phase ends, receive the total number of inflow vehicles
- Action (per COMMON_PERIOD of the intersection; see the sketch below)
  - A tuple of +/- adjustments over the phases (13 actions)
  - Phase lengths change within fixed minimum and maximum bounds
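A sketch of applying the chosen +/- tuple under those bounds (the bound values are hypothetical):

```python
MIN_PHASE, MAX_PHASE = 10, 80  # hypothetical bounds in seconds

def apply_action(phase_lengths, action):
    # Add the chosen +/- tuple, clamping each phase into [MIN_PHASE, MAX_PHASE].
    return [min(MAX_PHASE, max(MIN_PHASE, p + d))
            for p, d in zip(phase_lengths, action)]
```

Note that clamping at a bound can change the total cycle length; how the repo handles that case is not specified here.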
- Next state (see the sketch below)
  - For the agent, the next state is given after 160 s.
  - For the environment, the state is updated every 1 s.
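A sketch of this two-timescale loop, where `env` and `agent` are hypothetical stand-ins for the SUMO wrapper and the DQN agent:

```python
COMMON_PERIOD = 160  # agent decision interval in seconds

def run_episode(env, agent, sim_duration=3600):
    state = env.reset()
    for t in range(1, sim_duration + 1):
        env.step()                        # environment advances every 1 s
        if t % COMMON_PERIOD == 0:        # agent sees a transition every 160 s
            action = agent.act(state)
            env.apply(action)
            next_state, reward = env.observe(), env.reward()
            agent.store(state, action, reward, next_state)
            state = next_state
```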
- Reward (see the sketch below)
  - Max-pressure control theory (reward = -pressure = -(inflow - outflow))
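A sketch of that reward, reading pressure as total upstream (inflow) minus downstream (outflow) vehicle counts:

```python
def pressure_reward(inflow_counts, outflow_counts):
    # reward = -pressure = -(inflow - outflow)
    return -(sum(inflow_counts) - sum(outflow_counts))
```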
## New version of Learning Process (Continuous)
### NxN intersection, in the Experiment/ directory
#### Decentralized DDPG
- Experiment
  - Acts every 160 s (COMMON_PERIOD)
  - Controls the phase lengths
- Agent
  - Traffic light system (intersection)
- State
  - Vehicle movement demand (FRAP only) or queue length (2 spaces per inEdge, 8 spaces total)
    - Each vehicle count is divided by the maximum number of vehicles in an edge (normalization)
  - Phase length (with 4 phases, 4 spaces)
    - Each (up, right, left, down) length is divided by the maximum period (normalization)
- Action (per COMMON_PERIOD of the intersection; see the sketch below)
  - Demand of each phase (here, 4 phases) -> multi-agent
  - Between two phases, there is a 3-second all-yellow phase.
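One plausible mapping (an assumption, not the repo's confirmed scheme) from the actor's continuous per-phase demands to phase lengths: subtract the 3-second all-yellow intervals from COMMON_PERIOD and split the remaining green time with a softmax:

```python
import numpy as np

COMMON_PERIOD = 160
N_PHASES = 4
YELLOW = 3  # all-yellow time between consecutive phases (from the spec above)

def demands_to_phase_lengths(demands):
    d = np.asarray(demands, dtype=np.float64)
    green_budget = COMMON_PERIOD - N_PHASES * YELLOW  # 160 - 4*3 = 148 s of green
    w = np.exp(d - d.max())                           # softmax over actor outputs
    w /= w.sum()
    return w * green_budget                           # continuous phase lengths (s)
```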
- Reward
  - Max-pressure control theory
  - Penalty if a phase exceeds its maximum length
## Old version of Learning Process
3x3 intersection with single-agent and multi-agent systems, in the Discrete/ directory.
### How to use
- Check the traffic condition (throughput) with a plain simulation:
  `python ./Discrete/run.py simulate`
- Train with the DQN algorithm (default device: cpu):
  `python ./Discrete/run.py train --gpu False`
- To use another algorithm (ppo, REINFORCE, a2c):
  `python ./Discrete/run.py train --algorithm ppo --gpu False`
- Check the RL performance of the FRAP-based model ([FRAP paper](https://arxiv.org/abs/1905.04722)):
  `python ./Discrete/run.py train --model frap`
  Not yet verified to learn well (prototype).
- Check the results with TensorBoard:
  `tensorboard --logdir ./Discrete/training_data`
  Hyperparameters are saved as JSON, and the model is saved in the `training_data/model` directory.
### Learning Process
- Agent: traffic light system (intersection)
- State (see the sketch below)
  - Vehicle movement demand (2 spaces per inEdge, 8 spaces total)
  - Phase (4 or 8 spaces), chosen by `--phase [4 or 8]`
  - 12 or 16 state spaces in total
  - FRAP model only -> 16 state spaces
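A sketch of how the 12- or 16-dimensional state could be composed, assuming the phase portion is a one-hot encoding (an assumption):

```python
import numpy as np

def build_old_state(demand, phase_index, n_phases):
    # demand: 8 values (2 per inEdge); the phase one-hot adds 4 or 8 spaces,
    # giving 12 (--phase 4) or 16 (--phase 8) state spaces in total.
    phase_onehot = np.zeros(n_phases, dtype=np.float32)
    phase_onehot[phase_index] = 1.0
    return np.concatenate([np.asarray(demand, dtype=np.float32), phase_onehot])
```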
- Action (see the sketch below)
  - Phase (4 or 8 choices), every 20 s
  - After each action, the all-yellow light turns on for 5 s
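A sketch of that control cycle; `env` and `agent` are hypothetical stand-ins for the SUMO wrapper and the policy:

```python
GREEN_TIME = 20   # each chosen phase is held for 20 s
YELLOW_TIME = 5   # followed by a 5 s all-yellow transition

def control_loop(env, agent, n_decisions):
    state = env.observe()
    for _ in range(n_decisions):
        phase = agent.act(state)   # choose one of the 4 or 8 phases
        env.set_phase(phase)
        env.run(GREEN_TIME)
        env.set_all_yellow()
        env.run(YELLOW_TIME)
        state = env.observe()
```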
## Utils
### gen_tllogic.py
`python /path/to/repo/util/gen_tllogic.py --file [xml]`
### graphcheck.py
`python /path/to/repo/util/graphcheck.py file_a file_b --type [edge or lane] --data speed`
- Check the result with TensorBoard:
  `tensorboard --logdir tensorboard`