tcbegley / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.

Don't persist iteration number in model checkpoints

tcbegley opened this issue

Currently, reward model training starts at the iteration number recorded in the transformer checkpoint, which is confusing. Instead, the reward model training loop should count iterations from 0 when the reward model is being trained from scratch, regardless of how many iterations the transformer was trained for. If the reward model is itself being loaded from a checkpoint, training should resume from that reward model checkpoint's iteration number.
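
A minimal sketch of the proposed behaviour, assuming hypothetical checkpoint dictionaries with an `"iteration"` key; the function and argument names are illustrative, not the library's actual API:

```python
def get_start_iteration(reward_model_checkpoint=None):
    """Return the iteration to start reward model training from.

    The transformer checkpoint's iteration count is ignored entirely; only
    a reward model checkpoint (if one is being loaded) determines the
    starting iteration.
    """
    if reward_model_checkpoint is not None:
        # Resuming reward model training: continue from its own counter.
        return reward_model_checkpoint.get("iteration", 0)
    # Training the reward model from scratch: always start at 0,
    # no matter how many iterations the transformer was trained for.
    return 0
```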