facebookresearch / Pearl

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.

Examples

fakerybakery opened this issue

Hi,
Will you be providing any examples for real-world implementations of Pearl?
Thank you for creating this amazing project!

Hi there, thanks for reaching out! We are working on adding more examples/tutorials after our NeurIPS presentation! Will close this task once the examples are in.

Hi, one question, do you think there's any possibility of using this on LLMs?

Also, will your NeurIPS presentation be available on YouTube or another free online platform?

Hi, one question, do you think there's any possibility of using this on LLMs?

In principle, yes, especially if the LLM is implemented in PyTorch.

Also, will your NeurIPS presentation be available on YouTube or another free online platform?

Unfortunately it does not look like NeurIPS makes videos available. I am checking to see if slides can be made available.

Also, will your NeurIPS presentation be available on YouTube or another free online platform?

We will make the slides available on the Pearl website next week or so. Thanks.

@fakerybakery wanted to add a bit of clarification on LLM support. We don't officially support LLMs in the current beta version yet, but in principle you could build an interim solution. Depending on whether you'd like to use language or tokens as the action space, you would need to integrate a Hugging Face tokenizer or a transformer/language representation module into both the history and action representation modules. If you need to fine-tune the representations, you would also need to have these models' parameters tracked by the policy learner.

We will try to add language-based action and observation support at some point in the future as well. Hope this helps.
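To make that a bit more concrete, here is a minimal, hypothetical sketch of what such an interim action representation could look like. It assumes (this is not Pearl's official API for LLMs) that an action representation module can be any torch.nn.Module mapping a batch of tokenized actions to fixed-size embeddings; the class name, vocabulary size, and embedding dimension below are illustrative only, and the token ids are assumed to come from an external tokenizer such as one from Hugging Face.

# Hypothetical sketch only: a token-based action representation module.
# Assumes Pearl can accept any torch.nn.Module that maps a batch of
# token-id tensors to fixed-size embeddings; not an official Pearl API.
import torch
import torch.nn as nn


class TokenActionRepresentationModule(nn.Module):
    def __init__(self, vocab_size: int, embedding_dim: int = 64) -> None:
        super().__init__()
        # Token ids are produced outside this module, e.g. by a
        # Hugging Face tokenizer applied to each candidate action text.
        self.embedding = nn.Embedding(vocab_size, embedding_dim)

    def forward(self, actions: torch.Tensor) -> torch.Tensor:
        # actions: (batch_size, sequence_length) tensor of token ids.
        # Mean-pool the token embeddings into one vector per action.
        return self.embedding(actions).mean(dim=1)

As noted above, if these representations need to be fine-tuned, the module's parameters would also have to be tracked by the policy learner's optimizer.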

Any progress on examples, beyond lots of promises?

@paapu88 Stay tuned. As promised, the first set of tutorials will come out this week.

Our NeurIPS presentation slides are now shared on the repo front page. Please check them out. The first set of examples will be released tomorrow, with more coming in January 2024.

Excellent, the presentation is very good. Unfortunately, there are not that many examples in it.
Below is my poor man's attempt (which roughly covers steps 1 and 2 of the list below) to illustrate the sort of basic example I was looking for:

  1. take some Gymnasium environment, or better yet a Gymnasium-derived environment (like https://github.com/Farama-Foundation/HighwayEnv)
  2. optimize the agent with deep Q-learning and save it
  3. load the trained agent and run a demo with it in the environment
""" 
copy pasted from 
https://github.com/facebookresearch/Pearl?tab=readme-ov-file#quick-start

with small modifications for training,

NOTE: this environment is such that it is ok to go out of box, only falling pole is penalized.

"""


from pearl.pearl_agent import PearlAgent
from pearl.action_representation_modules.one_hot_action_representation_module import (
    OneHotActionTensorRepresentationModule,
)
from pearl.policy_learners.sequential_decision_making.deep_q_learning import (
    DeepQLearning,
)
from pearl.replay_buffers.sequential_decision_making.fifo_off_policy_replay_buffer import (
    FIFOOffPolicyReplayBuffer,
)
from pearl.utils.instantiations.environments.gym_environment import GymEnvironment
from pearl.utils.functional_utils.train_and_eval.online_learning import online_learning

import torch
import matplotlib.pyplot as plt
import numpy as np


# Create the CartPole environment with on-screen rendering.
env = GymEnvironment("CartPole-v1", render_mode="human")
observation, action_space = env.reset()

# DQN agent with a one-hot action representation and a FIFO replay buffer.
agent = PearlAgent(
    policy_learner=DeepQLearning(
        state_dim=4,  # CartPole observations have 4 dimensions
        action_space=action_space,
        hidden_dims=[64, 64],
        training_rounds=20,
        action_representation_module=OneHotActionTensorRepresentationModule(
            max_number_actions=action_space.n
        ),
    ),
    replay_buffer=FIFOOffPolicyReplayBuffer(10_000),
)

# Experiment: train online for 10,000 environment steps.
number_of_steps = 10000
record_period = 1000

info = online_learning(
    agent=agent,
    env=env,
    number_of_steps=number_of_steps,
    print_every_x_steps=1000,
    record_period=record_period,
    learn_after_episode=True,
)

# Save and plot the recorded returns (note: this file contains only the
# returns, not the trained agent's weights).
torch.save(info["return"], "CartPole-DQN-return.pt")
plt.plot(record_period * np.arange(len(info["return"])), info["return"], label="DQN")
plt.legend()
plt.show()

# Step 3 is still open: how do I save the trained agent and reload it
# later to run a demo? (One possible approach is sketched below.)
# model = ???
# model.load_state_dict(torch.load("CartPole-DQN-return.pt"))
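For step 3, here is one possible approach as a minimal sketch, not an official Pearl recipe: since a PearlAgent wraps ordinary PyTorch modules, you could try pickling the whole agent with torch.save and reloading it for a greedy demo rollout. Whether the agent object is picklable in the current release, and whether act(exploit=True) is the right way to disable exploration during the demo, are assumptions here.

# Minimal sketch for step 3 (assumption: PearlAgent and its submodules
# are picklable, so the entire agent can be serialized in one file).
torch.save(agent, "CartPole-DQN-agent.pt")

# Later, e.g. in a separate demo script:
loaded_agent = torch.load("CartPole-DQN-agent.pt")
demo_env = GymEnvironment("CartPole-v1", render_mode="human")

observation, action_space = demo_env.reset()
loaded_agent.reset(observation, action_space)
done = False
while not done:
    # exploit=True is assumed to select greedy actions for the demo.
    action = loaded_agent.act(exploit=True)
    action_result = demo_env.step(action)
    loaded_agent.observe(action_result)
    done = action_result.done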

As promised, the first tutorial is out for a recommender system environment. https://github.com/facebookresearch/Pearl/tree/main?tab=readme-ov-file#tutorials

More tutorials will come next year. Merry Christmas all! I'll close this issue for now and feel free to open other tasks if you have any other questions. Thanks!