ddpg questions
zzchuman opened this issue
Hello Wade Genders, PhD. I have read your paper and run your code, and I have a question. Could you answer it?
I have learned that the DDPG algorithm suits continuous action spaces. But in the traffic signal control field, the action is defined as 0, 1, 2, 3 (phase selection) or 0, 1 (change the phase or keep it).
In your paper, the proposed action space for the adaptive traffic signal controller is the duration of the next green phase in seconds. I guess you define the actor network to output a decimal and then round it. Right?
So, could you teach me how the actions are set up for the DDPG algorithm?
Hello zzchuman,
You are correct. The DDPG implemented in this repository outputs a real number in the range [-1.0, 1.0] from the final tanh output neuron.
Example
For example, assume the maximum green phase duration is 25 seconds, the minimum green phase duration is 5 seconds, and the DDPG's output is 0.25.
The next green phase duration is computed with the following equation:
next green phase duration = (0.25 * (25 - 5)) + ((25 + 5) / 2) = 5 + 15 = 20 seconds
Essentially, the DDPG's action is the deviation from the midpoint between the maximum and minimum green durations.
The function that performs this computation can be found here.
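As a rough illustration of the mapping described above, here is a minimal Python sketch. It is not the repository's function: the names `scale_action`, `scale_action_half_range`, and `to_green_seconds` are hypothetical, and the rounding and clipping step is an assumption based on the question in this thread. Note that with the equation exactly as written, an output of 1.0 would give 35 seconds, which is above the stated 25 second maximum, so the sketch also shows a variant that scales by half the range and therefore stays within [min, max]. Which of the two the repository actually uses should be checked against the linked function.

```python
def scale_action(action, g_min, g_max):
    """Apply the equation from the reply: action * (max - min) + midpoint."""
    midpoint = (g_max + g_min) / 2.0
    return action * (g_max - g_min) + midpoint


def scale_action_half_range(action, g_min, g_max):
    """Assumed variant: scale by half the range so [-1, 1] maps onto [g_min, g_max]."""
    midpoint = (g_max + g_min) / 2.0
    half_range = (g_max - g_min) / 2.0
    return action * half_range + midpoint


def to_green_seconds(duration, g_min, g_max):
    """Assumed post-processing: round to whole seconds, then clip to the bounds."""
    return int(min(max(round(duration), g_min), g_max))


if __name__ == "__main__":
    g_min, g_max = 5, 25
    # The worked example from the reply: tanh output of 0.25.
    print(scale_action(0.25, g_min, g_max))
    # The extreme output 1.0 under both variants.
    print(scale_action(1.0, g_min, g_max))
    print(scale_action_half_range(1.0, g_min, g_max))
    # Clipping keeps an out-of-range duration within the bounds.
    print(to_green_seconds(scale_action(1.0, g_min, g_max), g_min, g_max))
```

Either way, the discrete-looking decision (how many whole seconds of green) comes from a single continuous actor output, which is why DDPG fits this action definition.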