kvablack / ddpo-pytorch

DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support

Code logics, thanks

junyongyou opened this issue · comments

I am a bit confused by the logic in the training script.

The script defines a "new" unet as pipeline.unet, passes unet.parameters() to the optimizer, and finally computes the loss from unet. Am I right that this new unet is the one that gets updated?

However, pipeline.unet is the model that should be updated, and I can observe that the unet in the pipeline is indeed updated, not the new unet.

Can anybody tell me why this new unet needs to be defined? Could we just use something like this:

optimizer(pipeline.unet.parameters(), ...)
noise_pred = pipeline.unet(...)
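
To make the question concrete, here is a minimal sketch of the pattern as I understand it. It assumes the script does a plain assignment (unet = pipeline.unet). ToyPipeline and the nn.Linear layer are hypothetical stand-ins that I made up for illustration, not the repo's code.

```python
import torch
from torch import nn


class ToyPipeline:
    """Hypothetical stand-in for the real pipeline; it only holds a `unet`."""

    def __init__(self):
        # A tiny linear layer plays the role of the UNet (assumption).
        self.unet = nn.Linear(4, 4)


torch.manual_seed(0)
pipeline = ToyPipeline()

# Plain assignment binds a second name to the same module; no copy is made.
unet = pipeline.unet
same_object = unet is pipeline.unet

# Snapshot the weights as seen through the pipeline before training.
before = pipeline.unet.weight.detach().clone()

# Optimize through the "new" name only.
optimizer = torch.optim.SGD(unet.parameters(), lr=0.1)
loss = unet(torch.ones(2, 4)).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Check whether the step is visible through the pipeline attribute.
pipeline_changed = not torch.equal(before, pipeline.unet.weight)
names_agree = torch.equal(unet.weight, pipeline.unet.weight)
print(same_object, pipeline_changed, names_agree)
```

In this sketch both names refer to one module object, so a step taken through unet is also visible through pipeline.unet. If the real script wraps the module instead of assigning it directly, the behavior could differ.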

Thank you very much.