docwza / sumolights

SUMO adaptive traffic signal control - DQN, DDPG, Webster's, Max-pressure, Self-Organizing Traffic Lights

ddpg questions

zzchuman opened this issue

Hello Wade Genders, PhD. I have read your paper and run your code, and I have a question that I hope you can answer.

I have learned that the DDPG algorithm suits continuous action spaces. But in traffic signal control, the action is defined as a discrete choice: 0, 1, 2, 3 for phase selection, or 0, 1 for changing or keeping the current phase.

In your paper, the proposed action space for the adaptive traffic signal controller is the duration of the next green phase in seconds. I guess the actor network outputs a decimal number that you then round. Is that right?

So, could you explain how the DDPG actions are set up?

Hello zzchuman,

You are correct. The DDPG implemented in this repository outputs a real number in the range [-1.0, 1.0] from the final tanh output neuron.
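To illustrate why a tanh output neuron stays within that range, here is a minimal Python sketch. The pre-activation values are made up for illustration and are not taken from the repository's network.

```python
import math

# Hypothetical pre-activation values of the actor's final neuron
# (illustrative only, not taken from the repository).
pre_activations = [-10.0, -1.0, 0.0, 0.5, 10.0]

# tanh squashes any real input into the interval [-1, 1].
actions = [math.tanh(z) for z in pre_activations]

for z, a in zip(pre_activations, actions):
    print(f"pre-activation {z:+.1f} -> action {a:+.4f}")
```

However large or small the input, the resulting action never leaves [-1.0, 1.0], which is what makes a fixed rescaling to seconds possible.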

Example

Assume the maximum green phase duration is 25 seconds, the minimum green phase duration is 5 seconds, and the DDPG's output is 0.25.

The next green phase duration is computed with the following equation:

next green phase duration = (0.25 * (25 - 5)) + ((25 + 5) / 2) = 5 + 15 = 20 seconds

Essentially, the DDPG's action is the deviation from the midpoint between the maximum and minimum green durations.

The function that performs this computation can be found here.
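The mapping from the reply can be sketched in Python as follows. This is an illustrative sketch, not the repository's code: both function names are hypothetical. The first function applies the equation exactly as written in the reply. Taken literally, that equation maps an output of 1.0 to 35 seconds and an output of -1.0 to -5 seconds, which fall outside the stated 5 to 25 second bounds. The second function is an assumed variant that scales by half the range so that [-1.0, 1.0] maps exactly onto [minimum, maximum]; whether the repository's function does this is not confirmed here.

```python
def green_duration_as_stated(action, g_min, g_max):
    """Equation as written in the reply: action * range + midpoint."""
    midpoint = (g_max + g_min) / 2.0
    return action * (g_max - g_min) + midpoint


def green_duration_half_range(action, g_min, g_max):
    """Hypothetical variant: action * half-range + midpoint.

    Maps action = -1 to g_min and action = +1 to g_max.
    """
    midpoint = (g_max + g_min) / 2.0
    half_range = (g_max - g_min) / 2.0
    return action * half_range + midpoint


# Worked example from the reply: min 5 s, max 25 s, DDPG output 0.25.
print(green_duration_as_stated(0.25, 5, 25))   # 20.0
print(green_duration_half_range(0.25, 5, 25))  # 17.5
```

Either way, the action is a signed offset from the midpoint of 15 seconds; the two versions differ only in how far a given output moves the duration away from it.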