datawhalechina/easy-rl Issues
Typo roundup
Updated (1): PPO implementation: why take the log of the probability?
Closed (2): /chapter14/chapter14
Updated: Error in Equation (9.3) of "9.3 Advantage Actor-Critic"
Closed (3): How to run the demo program on a Linux server?
Updated (2): PPO for continuous action spaces
Closed (2): Question about the derivation of the conditional total expectation formula
Closed (1): How was the print edition made?
Closed (1): DQN algorithm question
Closed: the version of numpy
Closed: Question about the DDPG algorithm in the book
Updated: Problem with the DDPG implementation
Closed: Should the label at the lower left of Figure 6.8 be "action value (Q)"?
Closed (1): Two possible bugs in DuelingDQN.ipynb
Updated: Add references
Closed (1): 4.3 REINFORCE: Monte Carlo policy gradient
Closed (1): Typos
Closed (2): Can a PDF of the latest version be released?
Closed (2): value_iteration does not converge?
Updated (1): Where is the companion code?
Closed (6): Content errata?
Closed (3): SAC code issue
Closed (2): Chapter 5 errata
Closed (1): Edit problem in Chapter 3
Closed (1): Is the annotation in Figure 4.10 of Chapter 4 wrong?
Closed (1): 1.7.1 Gym example now returns more values
Closed (3): The update functions of DoubleDQN and DQN appear to be identical
Closed (1): Spelling mistake
Updated (1): MonteCarlo code error
Updated (1): PPO advantage calculation
Closed (1): Tutorial Notebook broken (Colab)
Closed (1): Could you provide the versions of the main libraries used in the code?
Closed (2): Will MARL algorithms be added later?
Closed (1): Q-learning error
Closed (1): The conda environment needs to be switched to python==3.8
Closed (1): Is a .py file missing from the common folder?
Closed (2): DQN code error
Closed (1): Empirical-mean question in "3.3.1 Monte Carlo Policy Evaluation"
Closed (3): Typos
Closed (1): TD3 target policy smoothing does not match the original paper's description
Closed (1): PPO state issue
Closed (1): E-book image annotation issue
Closed (1): Memory usage higher than expected
Updated: Why do I keep getting "common.utils import failed"?