iuliaturc / detextify

Remove text from AI-generated images

Don't use jpeg, use png to avoid lossy...

scruffynerf opened this issue · comments

in detextify/inpainter.py#L129
the temporary file is a JPEG; it should be a PNG to avoid a lossy conversion, especially at 512x512...
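A minimal sketch of writing the intermediate image to a temporary PNG instead of a JPEG, using Pillow (the helper name is hypothetical, for illustration, and is not detextify's actual code):

```python
import tempfile

from PIL import Image


def save_lossless_temp(image: Image.Image) -> str:
    """Write `image` to a temporary PNG file and return its path.

    PNG is lossless, so pixels survive the round trip exactly; a JPEG
    would introduce compression artifacts around sharp text edges.
    (Hypothetical helper, sketched for illustration.)
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    image.save(tmp.name, format="PNG")
    tmp.close()
    return tmp.name
```

A PNG saved this way reloads with byte-identical pixel data, which matters when the in-painter's output is compared or composited against the original tile.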

I was passing in 512x512 PNGs and wondering why I was getting worse results back.

Oh, it might depend on the method... I realize I was looking at the Replicate code, not the local one, but the same principle applies. I'm playing with adding steps too... 50 might not be enough to get the image back to original quality.

huggingface/diffusers#1368
says it's the strength being too high. Changing that to 0.3 works well. Adding num_inference_steps=100 (or whatever) isn't working, though, and I'm unsure why. If I change the default in the diffusers Python module, that does work (so it's using the default but not taking an argument it should).

100 steps is better and 200 is better still, but of course that's 2x or 4x slower. The result is closer to the original image (so when the text box overlaps half a head, it attempts to put the head back, etc.).
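Part of the confusion may be how `strength` and `num_inference_steps` interact: in diffusers' img2img-style pipelines, as I understand the timestep logic, only about `int(num_inference_steps * strength)` denoising steps are actually run on the input image. A standalone sketch of that arithmetic (mirroring, not importing, the library code):

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    # Mirrors the timestep bookkeeping in diffusers' img2img-style
    # pipelines: strength controls how far into the noise schedule the
    # input image is pushed, so only a fraction of the requested steps
    # are actually executed.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    return init_timestep


# With strength=0.3, asking for 100 steps runs only ~30 denoising
# steps, which is why the output stays close to the original image.
```

So raising `num_inference_steps` and lowering `strength` pull in opposite directions: more total steps are scheduled, but a smaller fraction of them is applied to the image.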

I'm also trying "empty flat background, solid color, no text, blank" as a prompt, since I found the model was having lots of "oh, let me get creative here..." moments.

@scruffynerf Thanks a lot for looking into this!

  • Indeed, the conversion to .jpeg happens for ReplicateSDInpainter only, so it doesn't explain the discrepancy.
  • Regarding strength / number of inference steps -- I'm not convinced this would fix it either; edges seem just as visible after 50 steps as they are after 300 (though the in-painted patches themselves look crisper, of course).

This fix makes the edges less jarring though (by in-painting the text boxes only, not the entire tile).
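Building the in-painting mask from the detected text boxes rather than the whole tile can be sketched with Pillow like this; the helper name and the `(x0, y0, x1, y1)` box format are assumptions for illustration, not detextify's actual API:

```python
from PIL import Image, ImageDraw


def text_box_mask(size, boxes):
    """Build an in-painting mask: white inside the text boxes
    (regions to regenerate), black everywhere else (regions to keep).
    `boxes` is a list of (x0, y0, x1, y1) tuples in pixel coordinates.
    (Hypothetical helper, sketched for illustration.)
    """
    mask = Image.new("L", size, 0)
    draw = ImageDraw.Draw(mask)
    for x0, y0, x1, y1 in boxes:
        draw.rectangle((x0, y0, x1, y1), fill=255)
    return mask
```

Passing a mask like this to the in-painting call leaves everything outside the text boxes untouched, so any seams appear only at the box edges rather than at the tile boundaries.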