Simple_PPO: should the value of the last state be 0?
YingxiaoKong opened this issue · comments
YingxiaoKong commented
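Hi Morvan, I read through your code and remembered that our instructor said in class that the Q-value of the last state should be 0. I have looked at a few other people's implementations, and some of them also set the return of the last state to 0. Is this actually necessary?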
Dr-Tuski commented
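The Q-value has to be assigned something at termination, doesn't it? Otherwise how would the program proceed?
For a real problem, choose the value according to its actual physical meaning; it does not have to be 0.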
Morvan commented
Yes, the value to assign should be decided by the actual situation being simulated.
YingxiaoKong commented
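There is a reward at termination, but it is the reward of the state just before the last one, not of the last state itself. state(T-1) receives that reward; state(T) is the terminal state and has no next state. Rewards are given based on the next state, so the last state gets no reward and therefore no cumulative reward, which is why its Q-value should be 0. Does this reasoning sound right?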
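To make the argument concrete, here is a minimal sketch of how the return targets for a finished episode could be computed backward from the terminal state. The function name `discounted_returns` and the parameter `terminal_value` are illustrative assumptions, not names taken from the Simple_PPO code.

```python
def discounted_returns(rewards, gamma, terminal_value=0.0):
    """Compute the discounted return for each step of a finished episode.

    rewards[t] is the reward received for the transition out of state t.
    terminal_value is the value assigned to the terminal state; it is 0
    here because no further reward can be collected after termination.
    """
    returns = []
    running = terminal_value  # assumed value of the terminal state
    for r in reversed(rewards):
        running = r + gamma * running  # G_t = r_t + gamma * G_{t+1}
        returns.append(running)
    returns.reverse()  # restore chronological order
    return returns


# Hypothetical three-step episode whose last transition pays 10.
print(discounted_returns([1.0, 1.0, 10.0], gamma=0.5))  # [4.0, 6.0, 10.0]
```

The final reward still shows up in the target of state(T-1); only the terminal state itself contributes nothing.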
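On April 21, 2020, at 8:30 PM, Dr-Tuski <notifications@github.com> wrote:
The Q-value has to be assigned something at termination, doesn't it? Otherwise how would the program proceed?
For a real problem, choose the value according to its actual physical meaning; it does not have to be 0.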
Morvan commented
That is one way to understand it. However, some environments never end, in which case there is no last state at all.
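One common way to handle both cases is sketched below: seed the backward pass with 0 when the episode really terminated, and with the critic's estimate of the next state when the rollout was merely cut off. The names `batch_targets`, `value_fn`, and `toy_value_fn` are hypothetical, and the constant critic stands in for a learned value network; this is not presented as the exact code in Simple_PPO.

```python
def batch_targets(rewards, next_state, terminated, value_fn, gamma):
    """Compute return targets for a batch of consecutive transitions.

    If the batch ended in a true terminal state, the seed value is 0.
    Otherwise the environment continues, so the seed is the critic's
    estimate of the state that follows the last transition.
    """
    running = 0.0 if terminated else value_fn(next_state)
    targets = []
    for r in reversed(rewards):
        running = r + gamma * running
        targets.append(running)
    targets.reverse()
    return targets


def toy_value_fn(state):
    """Hypothetical critic that returns a constant estimate."""
    return 8.0


# Same rewards, different endings.
print(batch_targets([1.0, 1.0], None, True, toy_value_fn, gamma=0.5))   # [1.5, 1.0]
print(batch_targets([1.0, 1.0], None, False, toy_value_fn, gamma=0.5))  # [3.5, 5.0]
```

In an environment with no end, only the second branch is ever used, so the question of a terminal value does not come up.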
YingxiaoKong commented
Yes, thank you!
On April 22, 2020, at 12:10 AM, Morvan <notifications@github.com> wrote:
That is one way to understand it. However, some environments never end, in which case there is no last state at all.