rinongal / textual_inversion


can't reproduce the results

andorxornot opened this issue

commented

hi! I trained LDM with three images and the token "container":



training lasted a few hours and the loss jumps around, but I got exactly the same result as without training:

the config is loaded correctly. are there any logs besides the loss?

What text are you using for inference?
Unless you changed the config, the placeholder word for your concept is *, so your sentences should be of the form: "a photo of *" (and not "a photo of a container")

commented

yeah, I used the "a photo of *" prompt, but still got a generic container

Can you please:
(1) Post your full inference command?
(2) Check your logs folder images to see if the samples_scaled_gs images look like your input data?

commented

hm, \logs\images...\testtube\version_0\media is empty for me, there are no images

train:

python main.py --data_root ./images \
               --base ./configs/latent-diffusion/txt2img-1p4B-finetune.yaml \
               -t \
               -n run_01 \
               --actual_resume ./models/ldm/text2img-large/model.ckpt \
               --init_word container \
               --gpus 0

inference:

python scripts/txt2img.py --ddim_eta 0.0 \
                          --n_samples 3 \
                          --n_iter 2 \
                          --scale 10.0 \
                          --ddim_steps 50 \
                          --embedding_path ./logs/images2022-08-23T21-03-11_run_01/checkpoints/embeddings.pt \
                          --ckpt ./models/ldm/text2img-large/model.ckpt \
                          --prompt "a photo of *"

The images should be in your ./logs/images2022-08-23T21-03-11_run_01/images/ directory.
Either way, when you run txt2img, try to run with:
--embedding_path ./logs/images2022-08-23T21-03-11_run_01/checkpoints/embeddings_gs-5xxx.pt where 5xxx is whatever checkpoint you have there which is closest to 5k.
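For example, something like this should tell you whether any intermediate outputs were written at all (directory names are copied from your command above, so adjust them if your run folder is named differently):

# quick check that the run wrote images and checkpoints
ls -R ./logs/images2022-08-23T21-03-11_run_01/images/
ls ./logs/images2022-08-23T21-03-11_run_01/checkpoints/
# you should see samples_scaled_gs-* style images in the first listing and
# embeddings_gs-*.pt files in the second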

fyi - I've got it working and I'm very impressed - I'm interested to know how to boost the quality / dimensions of the output... I'll have to dig into the docs.
HOW TO

I trained on a folder of his (Gregory Crewdson's) photos with the init word "cinematic":
python main.py --base configs/latent-diffusion/txt2img-1p4B-finetune.yaml -t \
               --actual_resume ../stable-diffusion/models/ldm/text2img-large/model.ckpt \
               -n leavanny_attempt_one --gpus 0, \
               --data_root "/home/jp/Downloads/ImageAssistant_Batch_Image_Downloader/www.google.com/gregory_crewdson_-_Google_Search" \
               --init_word=cinematic
(I gave up at 10,000 training iterations.)

I can then prime it with

 photo of * 
 pixelart of * 
 watercolor of * 

python scripts/txt2img.py --ddim_eta 0.0 \
                          --n_samples 8 \
                          --n_iter 2 \
                          --scale 10.0 \
                          --ddim_steps 50 \
                          --embedding_path /home/jp/Documents/gitWorkspace/textual_inversion/logs/gregory_crewdson_-_Google_Search2022-08-24T23-09-43_leavanny_attempt_one/checkpoints/embeddings_gs-9999.pt \
                          --ckpt_path ../stable-diffusion/models/ldm/text2img-large/model.ckpt \
                          --prompt "pixelart of *"

(output images: a-photo-of-*, pixelart-of-*, watercolor-of-*)

@johndpope Glad to see some positive results 😄
Regarding quality / dimensions: I'm still working on the Stable Diffusion port which will probably help with that. At the moment inversion is working fairly well, but I'm having some trouble finding a 'sweet spot' where editing (by reusing * in new prompts) works as expected. It might require moving beyond just parameter changes.

As a temporary alternative, you should be able to just invert these results into the stable diffusion model and let it come up with new variations at a higher resolution (using just 'a photo of *').
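If you want to try that route, a rough sketch would be something like the following -- note that the SD config name and checkpoint path here are my assumptions about your local setup, not verified paths:

# sketch only: same inversion entry point, but pointed at a Stable Diffusion
# checkpoint and the SD finetune config (both paths are placeholders)
python main.py --base configs/stable-diffusion/v1-finetune.yaml -t \
               --actual_resume ./models/sd/sd-v1-4.ckpt \
               -n sd_run_01 --gpus 0, \
               --data_root ./my_training_images \
               --init_word cinematic
# then generate as before, with --prompt "a photo of *" and the new embeddings checkpoint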

Hi, when I train the embedding and run the generation command, I can obtain samples that share some high-level similarity with my training inputs, but they still look quite different in the details (far less similar than the demo images in the paper). Given that the reconstruction is perfect, is there a way to control the variation and make the generated samples look more similar to the inputs? Thanks!

@XavierXiao First of all, just to make sure, you're using the LDM version, yes?

If that's the case, then you have several options:

  1. Re-invert with a higher learning rate (e.g. edit the learning rate in the config to 1.0e-2; there's a rough sketch of this after the list). The higher the learning rate, the higher the image similarity after editing, but more prompts will fail to change the image at all.
  2. Try to re-invert with another seed (using the --seed argument). Unfortunately sometimes the optimization just falls into a bad spot.
  3. Try the same prompt engineering tricks you'd try with text. For example, use the placeholder several times ("a photo of * on the beach. A * on the beach").
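A rough sketch of options 1 and 2 combined -- the base_learning_rate key name is what the LDM-style configs use, so double-check the spelling against your own yaml, and the paths here are placeholders:

# option 1: copy the finetune config and raise the learning rate to 1.0e-2
cp configs/latent-diffusion/txt2img-1p4B-finetune.yaml txt2img-finetune-highlr.yaml
sed -i 's/base_learning_rate:.*/base_learning_rate: 1.0e-02/' txt2img-finetune-highlr.yaml

# option 2: re-run the inversion with the edited config and a different seed
python main.py --base txt2img-finetune-highlr.yaml -t \
               --actual_resume ./models/ldm/text2img-large/model.ckpt \
               -n run_highlr --gpus 0, \
               --data_root ./my_training_images \
               --init_word container \
               --seed 42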

Other than that, as we report in the paper, the results are typically 'best of 16'. There are certainly cases where only 3-4 images out of a batch of 16 were 'good'. And of course, as with all txt2img models, some prompts just don't work.
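For reference, getting a batch of 16 candidates to pick from is just a matter of --n_samples and --n_iter; this mirrors the working command earlier in the thread, with the embedding path as a placeholder:

EMB=./logs/your_run/checkpoints/embeddings_gs-5000.pt   # placeholder -- point at your own checkpoint
python scripts/txt2img.py --ddim_eta 0.0 --n_samples 8 --n_iter 2 --scale 10.0 --ddim_steps 50 \
                          --embedding_path "$EMB" \
                          --ckpt_path ../stable-diffusion/models/ldm/text2img-large/model.ckpt \
                          --prompt "a photo of *"
# 8 samples per iteration x 2 iterations = 16 images to choose the best from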

If you can show me some examples, I could maybe point you towards specific solutions.

Thanks! I am using the LDM version with the default settings from the readme. I will give the things you mentioned a try, especially the LR. Here are some examples: I am trying to invert some images from MVTec for industrial quality inspection, and I attached the inputs (some capsules) and generated samples at 5k steps. Does this look reasonable? The inputs have very little variation (they look very similar to each other); could that be the cause?

(attached images: inputs_gs-005000_e-000050_b-000000, samples_gs-005000_e-000050_b-000000)

The one on the right is more or less what I'd expect to get. If you're still having bad results during training, then seed changes etc. probably won't help. Either increase LR, or have a look at the output images and see if there's still progress, in which case you can probably just train for more time.

I'll try a run myself and see what I can get.

@andorxornot This is what I get with your data:

Training outputs (@5k):

(image: samples_scaled_gs-005000_e-000131_b-000022)

Watercolor painting of *:

(image: watercolor-painting-of-*)

A photo of * on the beach:

(image: a-photo-of-*-on-the-beach)

@XavierXiao I cropped out and trained on these 2 samples from your image:
(images: Picture1, Picture2)

Current outputs @4k steps with default parameters:

(image: samples_scaled_gs-004000_e-000160_b-000000)

If you're using the default parameters but only 1 GPU, the difference might be down to the LDM training script automatically scaling the LR by the number of GPUs and the batch size. In that case your effective LR is half of mine, which could explain the gap. Can you try training with double the LR and let me know if that improves things? If so, I might need to disable this scaling by default / add a warning to the readme.
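To spell out the scaling I mean (the base LR here is just a placeholder value -- the real number is whatever base_learning_rate your config sets):

# assumed behaviour of the LDM-style training script:
#   effective_lr = base_learning_rate * n_gpus * batch_size (* accumulate_grad_batches, if set)
python -c "base_lr, bs = 5.0e-3, 4; print('1 GPU :', base_lr * 1 * bs); print('2 GPUs:', base_lr * 2 * bs)"
# -> 0.02 on one GPU vs 0.04 on two, i.e. half the effective LR on a single GPU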

Thanks for the reply. I am using two GPUs, so that shouldn't be the issue. I tried a larger LR, but it is hard to say whether it brings improvements. I can obtain results similar to yours. Obviously the resulting images are less realistic than the trash container examples earlier in this thread, so maybe the input images are less familiar to the LDM model.

Some possibly unrelated things:

  1. I got the following warning after every epoch, is that expected?
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
  2. In the first epoch I got the following warning:
home/.conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 20. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.

I use the default config with bs=4, and I have 8 training images. Not sure what caused this.

  3. In one overnight run, it seems that if we don't manually kill the process, it will run for 1000 epochs, which is the PyTorch Lightning maximum. So the max_steps = 6100 setting is not taking effect?

@XavierXiao Warnings should both be fine.
max_step: It should be working. I'll look into it.
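If the tokenizers warning gets noisy, it can be silenced with the environment variable the warning itself suggests, and you can double-check which step limit the run is actually picking up:

# purely cosmetic: disables tokenizers parallelism and with it the fork warning
export TOKENIZERS_PARALLELISM=false

# see where (and whether) a max_steps limit is set in the configs you're using
grep -rn "max_steps" configs/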

commented

thanks for your tests! it seems that on my machine I had to raise the LR by a factor of ten

@andorxornot Well, if everything's working now, feel free to close the issue 😄 Otherwise let me know if you need more help

@rinongal I think I'm having a similar issue, but I'm not familiar with the learning rate format in the config, so I'm not sure how to increase it.

EDIT: I noticed I'm getting "RuntimeWarning: You are using LearningRateMonitor callback with models that have no learning rate schedulers. Please see documentation for configure_optimizers method." (rank_zero_warn) from PyTorch Lightning's lr_monitor.py.

Also, this is using the stable diffusion v1_finetune.yaml, and my samples_scaled images all just look like noise at, and well after, 5000 global steps. The loss is pretty much stuck at 1.0 or 0.99.

I'll create a new issue if need be.

@XodrocSO I think it might be worth a new issue, but when you open it could you please:

  1. Check the input and reconstruction images in your log directory to see that they look fine.
  2. Paste the config file you're using and let me know if you're using the official repo or some re-implementation and whether you changed anything else.
  3. Upload an example of your current samples_scaled results.

Hopefully that will be enough to get started on figuring out the problem :)

commented

@andorxornot Would it be convenient for you to share your images?

@XavierXiao Regarding the capsule example you posted above (inputs_gs-005000 / samples_gs-005000): how did the images you generated later turn out?