hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

Home Page: http://stable-baselines.readthedocs.io/

TensorFlow 2.0 support?

heron1 opened this issue

I'm not able to get this library working with TensorFlow 2. A simple "from stable_baselines import PPO2" results in ModuleNotFoundError: No module named 'tensorflow.contrib'.

My Google searches only seem to conclude that tf.contrib doesn't work with TF2. I've tried this with both tensorflow-gpu 2.0.0-beta0 and the alpha, under Python 3.7 on Windows 10.
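
For reference, pinning TensorFlow to a 1.x release is a possible workaround (a sketch, assuming a TF 1.x install is acceptable for your setup):

```python
# Possible workaround (sketch): pin TensorFlow to a 1.x release first, e.g.
#   pip install "tensorflow<2.0"        # or: pip install tensorflow-gpu==1.14.0
import tensorflow as tf
print(tf.__version__)  # should report a 1.x version

from stable_baselines import PPO2  # imports cleanly only under TF 1.x
```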

Version 3 is now online: https://github.com/DLR-RM/stable-baselines3

Hello,

Good that you ask; I wanted to open an issue on that anyway.
Short answer: Stable-Baselines is developed and tested against TF 1.x (> 1.5.0) only for now,
so it is highly probable that it won't work with TF2.

Longer answer:
With TF 2.0, a lot of breaking changes are introduced, which will require quite a lot of work to adapt to the new API (we don't have much time, so we will need some help on that ;)).

At first, because we don't want to break people's code, I would favor a "transition phase" where Stable-Baselines works with both TF1 and TF2 (see the migration guide); at some point, it would then make sense to remove all the compatibility code for TF1.

The main problem is that the code uses different TF functions that achieve the same behavior (e.g. tf.contrib.layers, or even a custom declaration of variables in a2c.utils), and some are already deprecated (e.g. code in ACKTR). This may break models saved with previous SB versions, so we will need to write some "glue" code to allow the transition.
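
For illustration, one such change (a sketch run under TF 1.x, not the actual migration plan): the deprecated tf.contrib.layers.fully_connected call has a rough core-API equivalent in tf.layers.dense, which TF 2 still exposes under tf.compat.v1.layers:

```python
import tensorflow as tf  # TF 1.x

x = tf.placeholder(tf.float32, shape=(None, 8))

# Deprecated contrib-style call used in parts of the codebase:
# h = tf.contrib.layers.fully_connected(x, 64, activation_fn=tf.nn.relu)

# Rough core-API equivalent (tf.layers in TF 1.x, tf.compat.v1.layers in TF 2):
h = tf.layers.dense(x, 64, activation=tf.nn.relu)
```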

I would also first raise the minimum TensorFlow version (from 1.5.0 to > 1.13, in order to have access to the compat module).
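
A minimal import-time version guard could look like this (a sketch; the 1.13.0 floor is an assumption following the proposal above, adjust to whatever minimum is agreed on):

```python
from distutils.version import LooseVersion

import tensorflow as tf

# Sketch: refuse to import on TF versions that predate the tf.compat module.
if LooseVersion(tf.__version__) < LooseVersion("1.13.0"):
    raise ImportError(
        "stable-baselines requires tensorflow >= 1.13.0, "
        "but found {}".format(tf.__version__)
    )
```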

The main question is the roadmap: when do we want to do that, do we consider it urgent, or should we wait?
For that, I need the opinion of the other maintainers: @hill-a, @erniejunior, @AdamGleave?

EDIT: the draft repo is here: https://github.com/Stable-Baselines-Team/stable-baselines-tf2 (PPO and TD3 included for now)

Since TensorFlow 2.0 is still in beta, it probably makes sense to wait a bit before starting in earnest, in case the API changes further.

I don't have a good sense of how difficult it would be to maintain TensorFlow 1 & 2 compatibility. It sounds useful but I would also be OK with making a breaking release of Stable Baselines, and backporting important fixes to the TensorFlow 1 branch for some period of time.

I've now had a little experience working with TensorFlow 2. The new API seems much improved and I'd be keen on switching. However, it looks like a significant effort to upgrade, and maintaining TF 1 and 2 compatibility seems more difficult than I first thought.

You can still access the v1 API in TensorFlow 2 via tf.compat.v1, but there is a process-global setting for TensorFlow v2 behavior (e.g. eager execution) via tf.enable_v2_behavior. The only way I can see to support both versions is to make Stable Baselines work whether tf.enable_v2_behavior() or tf.disable_v2_behavior() has been called, and let the user decide. This changes a lot (e.g. eager vs. non-eager execution), so I'd be surprised if it's possible, but it's worth a try.
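
Concretely, the "let the user decide" idea might look like this under a TF 2 install (a sketch; the tf.compat.v1 calls exist, but whether the whole library survives the switch is exactly the open question):

```python
import tensorflow as tf  # TF 2.x

# User opts back into TF1-style graph execution for the whole process.
tf.compat.v1.disable_v2_behavior()

# Classic graph-mode code then keeps working:
x = tf.compat.v1.placeholder(tf.float32, shape=(None, 4))
y = 2.0 * x
with tf.compat.v1.Session() as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}))
```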

OK, in that case, maybe it is worth making all the breaking changes at once (for v3.0)?
The user API would stay the same, but we would need to write some scripts to port saved models from v2 to v3.

Also, I think we need to choose between eager mode (PyTorch-like, if I understand correctly) and normal (graph) mode?

Would love to contribute here, but my knowledge of ML is too limited to help with development. As an end user, I'd just like to note that what attracted me to this project was the beauty and simplicity with which Stable Baselines transformed the OpenAI library. Perhaps if the eager execution of TF 2.0 could be embraced with the same philosophy of simplicity, that would benefit end users trying to fully understand the code? Although if the efficiency sacrifices are too great, then of course never mind.

If we switch to eager mode, then it would be better to rebuild the library mostly from scratch, and I would definitely love to do it in PyTorch.
The main issue is both the amount of work and the risk of introducing bugs.

My understanding is that the recommended style for TF 2.x is to write things as small functions and compose them into bigger ones. Development and debugging can then use eager execution, and for deployment the user can apply the tf.function decorator, at which point TensorFlow uses AutoGraph (https://www.tensorflow.org/beta/guide/autograph) to convert the function into a graph when it is first called. This gives the usability of eager with the performance of graph execution when needed.
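
A toy example of that style (a sketch of the TF 2 idiom, not Stable-Baselines code):

```python
import tensorflow as tf  # TF 2.x

w1 = tf.Variable(tf.random.normal((4, 16)))
w2 = tf.Variable(tf.random.normal((16, 2)))

def mlp(x):
    # Small eager-friendly building block: a two-layer perceptron.
    return tf.matmul(tf.nn.relu(tf.matmul(x, w1)), w2)

@tf.function  # traced into a graph by AutoGraph on first call
def forward(x):
    return mlp(x)

print(forward(tf.random.normal((1, 4))))  # runs as a compiled graph
```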

Switching to this style would certainly be a big rewrite, though.

> If we switch to eager mode, then it would be better to rebuild the library mostly from scratch, and I would definitely love to do it in PyTorch.
> The main issue is both the amount of work and the risk of introducing bugs.

If so, I suppose I'll have to migrate to PyTorch. For my part, I'm quite familiar with the low-level TensorFlow APIs. Anyway, let's see if the future version is easier to use and modify. Thank you, guys.

I'd like to re-raise araffin's point that upgrading the minimum TensorFlow version from 1.5.0 (Jan '18) to 1.14.0 (Jun '19) is a necessary intermediate step, and could be broken off as a separate issue.

I suspect few users/developers are actually using 1.5.0 in their daily work, so it's a little fuzzy which version new contributions should target. Also, the mess of deprecation warnings different people see is probably very different (I am personally on 1.14.0).

NB: The release notes for 1.14 say "This is the first 1.x release containing the compat.v2 module. This module is required to allow libraries to publish code which works in both 1.x and 2.x. After this release, no backwards incompatible changes are allowed in the 2.0 Python API."

> upgrading the minimum TensorFlow version from 1.5.0 (Jan '18) to 1.14.0 (Jun '19) is a necessary intermediate step, and could be broken off as a separate issue.

The plan is to raise it progressively: it will be 1.8.0 soon (see #428) to match the Docker image version (which is used in the continuous integration tests).

Then, I would favor a jump to 1.14.0 to support both TF versions.

As a side note, I recall that for versions higher than 1.8.0 there is already an issue with Keras custom policies (cf. #220).

After thinking more about the problem: if we have a baselines-common package (see #503), it makes more sense to have two versions (one for TF1, one for TF2), so we can switch directly to the new paradigm without intermediate compatibility code.
This would also allow us to make the cleanups and breaking changes that have been waiting for a while.
I feel that having a version working with both would just delay the needed refactoring.

We can also take a look at openai#978 for inspiration.

I'm in favour of not trying to maintain both TF1 and TF2 compatibility. This seems extremely challenging and ultimately not that useful.

A baselines-common package would help, but I don't think it's necessary. We could just have a tf2 branch, and when it's stable, flip it to master. It seems OK to obsolete the old branch: if wanted, we could occasionally backport some fixes.

There'll be some annoying work merging changes to the common code during that time, but it doesn't seem like that big a deal.

Just wanted to check if/when we can use this with TF2?

> Just wanted to check if/when we can use this with TF2?

To know that, you should look at this milestone (https://github.com/hill-a/stable-baselines/milestone/6) and this PR: #542.

As Stable-Baselines is maintained in our free time, we don't have a fixed release date. Currently we are at the discussion phase, so it is not possible to use it with TF2 right now.

As an update (and as mentioned in #576), I started a draft repo here: https://github.com/Stable-Baselines-Team/stable-baselines-tf2

It has for now:

  • TD3 implemented
  • PPO implemented for both discrete and continuous actions
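
Assuming the draft keeps the familiar Stable-Baselines API (an assumption; check the repo's README for the actual entry points), usage might look like:

```python
# Hypothetical usage, assuming the tf2 draft mirrors the current API
# (module and class names in the draft repo may differ):
import gym
from stable_baselines import PPO2

env = gym.make("CartPole-v1")
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
```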

FYI:
we are having a discussion with the Stable Baselines maintainers about its future, and we need your opinion.

As a big change is required anyway, which framework would you prefer as the base for v3.0?

We have a poll on Twitter, ending in one week:
https://mobile.twitter.com/araffin2/status/1223310856471138306

Closing this issue in favor of #733.

Linking the TF2 support repo here: #984