PeterL1n / BackgroundMattingV2

Real-Time High-Resolution Background Matting

Why is foreground prediction necessary?

max810 opened this issue

Hello,
First of all, good job with the paper! Nicely written and explains a lot of concepts pretty well.
However, I am still a little puzzled about why predicting the foreground (or, in this case, the foreground residual) is necessary in the pipeline.
Consider this example from the demo:
[image: example frame from the demo]

For the composition (the final step) - why do we use the pixels from the upsampled foreground and not from the original image? They should be identical anyway, because we explicitly train the coarse foreground prediction to replicate the pixels of the original image inside the alpha mask region (formula 2):
[image: formula 2 from the paper]

A possible answer is mentioned in Issue#19, but it's unclear to me what the "background color spill onto partial-opacity hairs and edges" looks like and how the foreground prediction branch mitigates this issue.

I would greatly appreciate an explanation and/or just a side-by-side comparison of 2 images (original vs predicted foreground).

Thank you in advance!

The foreground is only equal to the source on regions where alpha = 1. But for semitransparent regions, it is not, because part of the original background will leak through. These regions are usually hair, silhouette, and motion blur.
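To make the spill concrete, here is a minimal compositing sketch (variable names like `fgr`, `pha`, `new_bgr` are illustrative, not the repo's exact API). Compositing with the predicted foreground keeps the old background out, whereas compositing with the original image re-blends a fraction of it wherever 0 < alpha < 1:

```python
import torch

def composite(fgr, pha, new_bgr):
    # fgr, new_bgr: [B, 3, H, W] float tensors in [0, 1]
    # pha:          [B, 1, H, W] float tensor  in [0, 1]
    return pha * fgr + (1 - pha) * new_bgr

# If we used the original image I instead of the predicted foreground F',
# then since I = a*F + (1 - a)*B_old, the composite
#   pha * I + (1 - pha) * new_bgr
# still contains roughly a * (1 - a) * B_old wherever 0 < a < 1
# (hair, edges, motion blur) -- that is the background color spill.
```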

But won't the foreground be learned to match the original pixels for all alpha > 0, not just alpha = 1?

No, it won't. The dataset provides a ground-truth foreground F and alpha a. We composite them onto a background to synthesize the source input I = aF + (1 - a)B. The model predicts a foreground F' and an alpha a'. The loss on F' is computed against the ground-truth F, not against I.
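Spelled out in code, this is roughly the following (a minimal sketch with illustrative names, not the repository's actual training script):

```python
import torch

def synthesize_input(true_fgr, true_pha, bgr):
    # I = a*F + (1 - a)*B : the composited source the network actually sees.
    return true_pha * true_fgr + (1 - true_pha) * bgr

def foreground_loss(pred_fgr, true_fgr, true_pha):
    # L1 loss on the predicted foreground, restricted to alpha > 0 regions,
    # against the dataset's extracted foreground (background already removed),
    # not against the composited source I.
    mask = (true_pha > 0).float()
    return torch.abs((pred_fgr - true_fgr) * mask).mean()
```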

Oh, so we use the properly extracted foregrounds from the datasets, and the model directly learns to remove the background in those situations you described (hair strands, motion blur, etc.). I missed that, sorry.

Thanks for the explanation!