Reinforcement Learning: Theory and Python Implementation (强化学习：原理与Python实现)

The world's first reinforcement learning tutorial book with accompanying TensorFlow 2 code

The first printed algorithms book with accompanying TensorFlow 2 code

Side-by-side TensorFlow 2 and PyTorch 1 code is now available

Supporting materials in English can be found here.

See here for code, errata updates, and more.

Features of This Book

This book covers reinforcement learning theory and its Python implementation.

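Complete theory: The whole book uses one consistent mathematical framework to present the theoretical foundations of reinforcement learning rigorously, and the main theorems are proved. The chapters build on each other step by step and cover all mainstream reinforcement learning algorithms, including non-deep algorithms such as eligibility traces and deep algorithms such as soft actor-critic.
Rich examples: The reinforcement learning algorithms are implemented with Python 3.10, Gym 0.24, and TensorFlow 2 / PyTorch 1, and run on your favorite operating system (Windows, macOS, or Linux). All implementations follow one consistent convention and are small and lightweight. Chapters 1 to 9 come with implementations of their algorithms; their environments need only the minimal installation of Gym and run on computers without a GPU. Chapters 10 to 12 present several popular comprehensive case studies that cover the full installation of Gym and custom extensions, and they run on a computer with an ordinary GPU.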

TensorFlow 2 and PyTorch 1 Side-by-Side Code

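The deep reinforcement learning parts of the book now include side-by-side implementations in TensorFlow 2 and PyTorch 1. Both versions correspond strictly to the pseudocode in the text. They differ only in the agent implementation, while the program structure and the agent parameters are identical. The .ipynb files are in the notebooks folder and the HTML pages are in the html folder; both formats have the same content.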
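Because the two versions differ only in the agent, the code that drives an episode can stay independent of the deep learning framework. The sketch below illustrates that separation under stated assumptions: `ToyChainEnv`, `RandomAgent`, and `play_episode` are hypothetical names, not the book's code, and the toy environment merely imitates the Gym 0.24 interface (`reset()` returns an observation; `step()` returns `(observation, reward, done, info)`) so that the example runs without Gym installed.

```python
import random
from typing import Any, Tuple


class ToyChainEnv:
    """Hypothetical 1-D chain that imitates the Gym 0.24 API:
    reset() -> observation, step(action) -> (observation, reward, done, info)."""

    def __init__(self, length: int = 5):
        self.length = length
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        # action 1 moves right, action 0 moves left (the left end is a wall)
        self.position = max(0, self.position + (1 if action == 1 else -1))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1
        return self.position, reward, done, {}


class RandomAgent:
    """Hypothetical agent. A TensorFlow 2 or PyTorch 1 agent would expose the
    same reset/step/close methods and differ only inside step()."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.mode = 'train'

    def reset(self, mode: str = 'train') -> None:
        self.mode = mode

    def step(self, observation: Any, reward: float, done: bool) -> int:
        # a learning agent would update itself here using (observation, reward, done)
        return self.rng.choice([0, 1])

    def close(self) -> None:
        pass


def play_episode(env, agent, mode: str = 'train', max_steps: int = 1000):
    """Run one episode; nothing in this loop depends on TensorFlow or PyTorch."""
    observation, reward, done = env.reset(), 0.0, False
    agent.reset(mode=mode)
    episode_reward, elapsed_steps = 0.0, 0
    while True:
        # the agent also sees the final transition (done=True) before the loop exits
        action = agent.step(observation, reward, done)
        if done or elapsed_steps >= max_steps:
            break
        observation, reward, done, _ = env.step(action)
        episode_reward += reward
        elapsed_steps += 1
    agent.close()
    return episode_reward, elapsed_steps


episode_reward, elapsed_steps = play_episode(ToyChainEnv(length=5), RandomAgent(seed=0))
print(f'reward={episode_reward:.1f}, steps={elapsed_steps}')
```

Swapping in a framework-specific agent would not touch `play_episode`, which is one way to keep the program structure identical across the two versions.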

The code has been verified with Python 3.10, Gym 0.24, TensorFlow 2, and PyTorch 1. Please report any errors you find.

QQ Groups

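QQ group: 722846914 (errata and bug reports can be posted here; for other questions, please search Google before asking, as the group owner and administrators do not provide free consulting)
Multi-task group: 696984257 (not a beginners' group; topics are multi-task RL, meta-RL, lifelong RL, and transfer RL; do not post errata or bug reports here, and please search Google before asking)
About the verification question for joining a group: because of a QQ bug, verification may fail even when you enter the correct answer. Retrying on another device, switching input methods, or trying again another day may solve the problem. If the answer contains English letters, please mind the capitalization.
The QQ groups listed in the preface of the Chinese edition (935702193, 243613392, and 948110103) are full and no longer accept new members. Thank you for your understanding.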

Reinforcement Learning: Theory and Python Implementation

The First Reinforcement Learning Tutorial Book with One-to-One Mapped TensorFlow 2 and PyTorch 1 Implementations

Check here for code, exercise answers, and more.

Features

This is a tutorial book on reinforcement learning that covers both the theory and its Python implementation.

Theory: Starting from a uniform mathematical framework, this book derives the theory and algorithms of reinforcement learning, covering all major algorithms, including eligibility traces and soft actor-critic.
Practice: Every chapter is accompanied by high-quality implementations based on Python 3.10, Gym 0.24, and TensorFlow 2 / PyTorch 1. All code is compatible with Windows, Linux, and macOS and can run on a laptop.

Please email me if you are interested in publishing this book in other languages. The English edition will be published by Springer Nature.

All code is saved as .ipynb files in the directory "notebooks" and as .html files in the directory "html".

Chapter	Environment & Closed-Form Policy	Agent
2	CliffWalking-v0	Bellman
3	FrozenLake-v1	DP
4	Blackjack-v1	MC
5	Taxi-v3	SARSA, ExpectedSARSA, QL, DoubleQL, SARSA(λ)
6	MountainCar-v0	SARSA, SARSA(λ), DQN tf torch, DoubleDQN tf torch, DuelDQN tf torch
7	CartPole-v0	VPG tf torch, VPGwBaseline tf torch, OffPolicyVPG tf torch, OffPolicyVPGwBaseline tf torch
8	Acrobot-v1	QAC tf torch, AdvantageAC tf torch, EligibilityTraceAC tf torch, PPO tf torch, NPG tf torch, TRPO tf torch, OffPAC tf torch
9	Pendulum-v1	DDPG tf torch, TD3 tf torch
10	LunarLander-v2	SQL tf torch, SAC tf torch, SACwA tf torch
10	LunarLanderContinuous-v2	SACwA tf torch
11	BipedalWalker-v3	ES, ARS
12	PongNoFrameskip-v4	CategoricalDQN tf torch, QR-DQN tf torch, IQN tf torch
13	BernoulliMAB-v0	UCB
13	GaussianMAB-v0	UCB
14	TicTacToe-v0	AlphaZero tf torch
15	HumanoidBulletEnv-v0	BehaviorClone tf torch, GAIL tf torch
16	Tiger-v0	VI
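Chapter 2 pairs CliffWalking-v0 with a Bellman agent. As a hedged illustration of the underlying idea, not the book's implementation, the sketch below evaluates a fixed policy on a hypothetical two-state MDP by iterating the Bellman expectation equation v = r + γPv until the values stop changing. The function name `evaluate_policy` and the example rewards and transition probabilities are assumptions made for illustration.

```python
from typing import List


def evaluate_policy(rewards: List[float], transitions: List[List[float]],
                    gamma: float, tol: float = 1e-10,
                    max_iters: int = 100000) -> List[float]:
    """Solve v = r + gamma * P v by fixed-point iteration.

    rewards[s]: expected one-step reward in state s under the fixed policy.
    transitions[s][t]: probability of moving from s to t under that policy.
    For gamma < 1 the Bellman expectation operator is a gamma-contraction in
    the max norm, so the iteration converges to the unique solution.
    """
    n = len(rewards)
    v = [0.0] * n
    for _ in range(max_iters):
        new_v = [rewards[s] + gamma * sum(transitions[s][t] * v[t] for t in range(n))
                 for s in range(n)]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v


# Hypothetical MDP: state 0 earns reward 1 and always moves to state 1;
# state 1 earns nothing and moves to state 0 or stays with probability 0.5 each.
rewards = [1.0, 0.0]
transitions = [[0.0, 1.0],
               [0.5, 0.5]]
v = evaluate_policy(rewards, transitions, gamma=0.9)
print([round(x, 4) for x in v])  # [3.7931, 3.1034]
```

Solving the two equations by hand gives v(0) = 11/2.9 and v(1) = 9/2.9, which matches the iterated result; for small state spaces one could instead solve the linear system (I - γP)v = r directly.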