lmurmann / multi_illumination


When are you going to publish the trained model?

Haoyanlong opened this issue

@lmurmann Hello, I want to work on the illumination research topic. I would like to know when you are going to publish the model. Thank you very much!

Thanks for reaching out. I pushed commit 1417542 with the model and evaluation script for illumination estimation. I am also working on releasing the relighting model, but I still need a bit more time to clean up and package the code.

@lmurmann, I have downloaded the dataset from your project site! I want to train the illumination estimation model, and the probes in the input images need to be masked, but I don't know how to mask the probe in the input image. Is it masked according to the materials? Could you give me some advice? Thank you very much!
image

There is some extra metadata in the JSON files stored with each scene (~/.multilum/<scene>/meta.json).

If the file is not there, you can download it from http://data.csail.mit.edu/multilum/<scene>/meta.json.
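If you want to script that download, here is a minimal sketch assuming the third-party requests library; the fetch_meta helper name and default path are just illustrative:

```python
import os
import requests

def fetch_meta(scene, root=os.path.expanduser("~/.multilum")):
    """Return the local path of a scene's meta.json, downloading it if missing."""
    path = os.path.join(root, scene, "meta.json")
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        url = "http://data.csail.mit.edu/multilum/{}/meta.json".format(scene)
        r = requests.get(url)
        r.raise_for_status()
        with open(path, "w") as f:
            f.write(r.text)
    return path
```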

This file should contain entries like this:

```json
{
  ...
  "bounding_box": {
    "x": 938.034985474631,
    "y": 2817.43052160101,
    "w": 975.3908405594467,
    "h": 948.096621403985
  },
  "boundary_points": [
    {
      "x": 1463.6340801792003,
      "y": 2820.1783656529014
    },
    {
      "x": 1692.8230026207625,
      "y": 2879.8437901088173
    },
    ...
```

for "gray" ball and "chrome" ball. The "boundary points" are hand-annotated points on the silhouette (around 10 points per ball). The "bounding_box" is a tight fitting axis-aligned bounding box. Coordinates are in pixels in range [0, 6000)x[0, 4000).

@lmurmann, I have trained the single-image illumination estimation model. The images are as follows (input, prediction, ground truth). I don't know how to render a virtual object composited into the scene. Could you teach me? Thank you very much!
image
image
image

Rendering objects into the images of our dataset is a bit difficult since you don't know the scene's geometry or camera pose. We only have a single viewpoint per scene, so it is generally not possible to infer these values.

I would suggest you start with an existing AR application. Searching for something like "open source AR toolkit" brings up plenty of hits that look good. Or you can build your own by following a tutorial; searching for "opencv AR tutorial" should give some good results. Building it yourself might take a while, but it is a great learning experience.

Once you have a basic AR system up and running, you can plug in the illumination estimation network and use the illumination prediction to improve the shading of virtual objects.

I hope these pointers are helpful!

@lmurmann Hello, I have trained the single-image illumination estimation model. The L2 loss curve (finetuned) is as follows:
image
It is about 0.01459 on the training set and 0.01897 on the test set.
The test images are as follows:
1. Input
image
2. Ground truth
image
3. Prediction
image
Could you tell me your test results?
I also have another question: can the trained model be used to render objects in video? Is it stable, or will the prediction sway? Thank you very much!

Thanks for your questions.

Regarding stability, I have used the model on video input before and found it quite stable. For a real application, you might want to add a simple filter that smooths out potential variations in the prediction.
If you find that your predictions jump around dramatically, that may be a sign of overfitting. Also, you should try to make the input video look as much like the training data as possible to shrink the domain gap. The model will probably perform better on indoor videos than on outdoor data.
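As a sketch of the "simple filter" idea mentioned above (not part of the released code): an exponential moving average over the per-frame probe predictions, with an illustrative smoothing factor.

```python
import numpy as np

class ProbeSmoother:
    """Exponentially smooth per-frame probe predictions to reduce flicker."""

    def __init__(self, alpha=0.8):
        self.alpha = alpha  # higher alpha = heavier smoothing
        self.state = None

    def __call__(self, probe):
        probe = np.asarray(probe, dtype=np.float32)
        if self.state is None:
            self.state = probe
        else:
            self.state = self.alpha * self.state + (1.0 - self.alpha) * probe
        return self.state
```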

Regarding comparison to our model, you can run the probe_predict/eval.py script and compare its outputs to your predictions. The MSE numbers you report sound pretty good! In addition to MSE, we found it useful to compare other metrics, such as the direction of the center of the light source, since these are independent of the normalization or gamma choices that often have a large impact on MSE.

When comparing the center of the light source to the ground-truth center, our predictions achieved a 26.6° mean angular error.

@lmurmann, could you tell me how to calculate the angular error between the center of the light source and the ground-truth center? On the other hand, I preprocess the input images by adding a black mask and resizing during training, but I see that you preprocess the input with a (512, 512) crop in eval.py. Could you tell me how you preprocess the input images during training? Thank you!

For calculating the center of the light source:
In most cases, you can get an initialization by looking at the maximum image value. From that initialization, I fitted a Gaussian to refine the estimate. This works pretty well, but I had to verify manually that the optimization converged to the correct light source shape in all cases.
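A minimal sketch of that procedure (peak initialization plus an isotropic 2D Gaussian refinement with scipy); the window size, initial sigma, and function name are illustrative rather than the exact implementation used for the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

def light_source_center(probe, win=64):
    """Estimate the light-source center in an HxW luminance probe image."""
    h, w = probe.shape
    y0, x0 = np.unravel_index(np.argmax(probe), probe.shape)

    # Fit an isotropic 2D Gaussian in a window around the brightest pixel.
    ys, ye = max(0, y0 - win), min(h, y0 + win)
    xs, xe = max(0, x0 - win), min(w, x0 + win)
    patch = probe[ys:ye, xs:xe]
    yy, xx = np.mgrid[ys:ye, xs:xe]

    def gauss2d(coords, xc, yc, sigma, amp, offset):
        x, y = coords
        return amp * np.exp(-((x - xc) ** 2 + (y - yc) ** 2) / (2 * sigma ** 2)) + offset

    p0 = (float(x0), float(y0), 5.0, float(patch.max()), float(patch.min()))
    (xc, yc, *_), _ = curve_fit(gauss2d, (xx.ravel(), yy.ravel()),
                                patch.ravel(), p0=p0, maxfev=5000)
    return xc, yc  # verify visually that the fit latched onto the right light
```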

For training the published model, I took the 1500×1000 px images and sampled random 512×512 px crops (the receptive field is half the image height).

I also tested with 256 px crops (a quarter of the image height), but found the performance to be a bit worse, probably due to the lack of context.
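For reference, a minimal sketch of that crop step under those assumptions (1500×1000 inputs, random 512×512 crops); the helper is hypothetical, not the released training code:

```python
import numpy as np

def random_crop(image, size=512, rng=np.random):
    """Take a random size x size crop from an HxWxC image (H, W >= size)."""
    h, w = image.shape[:2]
    top = rng.randint(0, h - size + 1)
    left = rng.randint(0, w - size + 1)
    return image[top:top + size, left:left + size]
```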

@lmurmann, hello, I have tested your model and a model trained from scratch; the results are shown below. I am confused by the difference in the sphere color. Could you give me some advice? Thank you very much!

1. Input
image
2. Ground truth
image
3. Prediction of your trained model
image
4. Prediction of my trained model
image

I also don't know how to normalize the white balance and exposure of the input image using the gray sphere. Thank you very much!

@Haoyanlong Both results look pretty reasonable. Are you sure that 3 was predicted with the published model? Usually our predictions have the slightly yellow color cast of the prediction shown in 4.

Below is some more background information on our data processing, along with advice on auto-exposure and custom white balance.

Auto Exposure

The published data is already exposure-normalized with respect to the gray ball, so there is not much extra work to do. To normalize exposure, we rescale the image intensity so that the mean intensity of the gray ball falls to a constant value (I believe around 0.3 in the .exr files).
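A minimal sketch of that normalization, assuming you have a boolean mask of the gray-ball pixels (e.g. derived from the boundary_points above); the 0.3 target and helper name are illustrative:

```python
import numpy as np

def normalize_exposure(image, gray_mask, target=0.3):
    """Rescale intensities so the mean over the gray-ball pixels hits `target`."""
    mean_gray = image[gray_mask].mean()
    if mean_gray > 0:
        image = image * (target / mean_gray)
    return image
```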

Normalizing on the gray ball can be less than ideal when there are large brightness variations across the image. In such cases, you could re-normalize after extracting training/eval patches. The probe_predict/eval.py script includes the following auto-expose helper function:

```python
import numpy as np

def autoexpose(I):
  """Simple auto-expose helper for arbitrary image patches:
  clip the brightest 10% of pixels, map the lower 90% of pixels to the
  [0, 1] range. You might have to change the 90% threshold depending on
  the application.
  """
  # Normalize by the 90th percentile of the green channel.
  n = np.percentile(I[:, :, 1], 90)
  if n > 0:
    I = I / n
  return I
```

White Balance

Regarding white balance for the illumination estimation application, we rely on the camera-provided white balance and let the raw converter (dcraw) handle white balance for us. Relying on camera white balance generally works since the flash has a known color and so the raw converter simply matches the temperature of the flash.

In case you want to do white balance yourself using the gray ball, a simple approach is to extract the mean values of the red, green, and blue channels over the gray ball, and then rescale the red and blue channels so that they match the mean intensity of the green channel.
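A minimal sketch of that gray-ball white balance, assuming an RGB channel order and a boolean mask of the gray-ball pixels; the helper is illustrative, not code from the repository:

```python
import numpy as np

def white_balance(image, gray_mask):
    """Scale R and B so their gray-ball means match the gray-ball G mean."""
    means = image[gray_mask].mean(axis=0)    # per-channel (R, G, B) means
    balanced = image.astype(np.float32)
    balanced[..., 0] *= means[1] / means[0]  # red -> green
    balanced[..., 2] *= means[1] / means[2]  # blue -> green
    return balanced
```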

Just pushed the code and trained model for the relighting task in 11674f6.

Hello, @lmurmann!
Thanks for your article and the dataset. I would like to ask you about your experiments with the single-image model.
Have you tried upsampling instead of a fully connected layer at the output of your model?

Ok, thanks for your answer :)

Hi Artyom, yes, we have tried that and found that it works as well. With the encoder that we are using, we found that the outputs of the fully connected decoder were a bit sharper, so we went with that instead.


I have a question about reproducing the light probe estimation experiment.
Could you elaborate a bit more on the experimental details?

I just guessed some parameters (below) and trained the model, but it does not seem to be working (it almost gives the average image):

- input/target: LDR RGB images, values in the range [-0.5, 0.5]
- optimizer: Adam, learning rate 1.0e-4
- iterations: 2.0e5
- random 512×512 crops
- loss: MSE (L2)

In Sec. 4.2.1, you mention "Like in Sec 4.1, we work in the log-domain to limit the dynamic range of the network's internal activations", but there seems to be no corresponding description in Sec. 4.1.

From left to right: input, output (no auto exposure), target
input_output_target

Hi Naoto, your configuration generally looks good. You should be able to train a working version of the illumination prediction model without converting to the log domain.

If the parameters that you mentioned don't work in your setup, I would try SGD with a 1e-3 step size and an L1 loss. Generally, we were able to train models with a variety of hyperparameters, and convergence should be better than in the screenshots you posted. For us, hyperparameter tuning mostly improved the sharpness of the predictions, but the general direction of the illumination should be correct for most training runs.
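For concreteness, a minimal sketch of that fallback configuration in PyTorch (SGD at 1e-3 with an L1 loss); `model` and `train_loader` are placeholders for your own network and 512×512-crop data pipeline, not the released code:

```python
import torch
import torch.nn as nn

def train(model, train_loader, num_iters=200000, device="cuda"):
    """Train with the suggested fallback settings: SGD(lr=1e-3) and L1 loss."""
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    criterion = nn.L1Loss()
    it = 0
    while it < num_iters:
        for crops, probes in train_loader:  # 512x512 crops, probe targets
            crops, probes = crops.to(device), probes.to(device)
            optimizer.zero_grad()
            loss = criterion(model(crops), probes)
            loss.backward()
            optimizer.step()
            it += 1
            if it >= num_iters:
                break
    return model
```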


@Haoyanlong The L2 loss curve you posted above barely decreases; is that result normal?
Also, when I compute the L2 loss on the test set with the weights provided by the author, the value is about 0.9, while your L2 loss is much smaller. Is there any difference between your setup and the author's?