lmurmann / multi_illumination


When are you going to publish the trained model?

Haoyanlong opened this issue

@lmurmann Hello, I want to work on the illumination research topic. I would like to know when you are going to publish the model. Thank you very much!

Thanks for reaching out. I pushed commit 1417542 with the model and evaluation script for illumination estimation. I am also working on releasing the relighting model, but I still need a bit more time to clean up and package the code.

@lmurmann, I have downloaded the dataset from your project site! I want to train the illumination estimation model, and the probes in the input images need to be masked, but I don't know how to mask the probe in the input image. Is it masked according to the materials? Could you give me some advice? Thank you very much!
image

There is some extra metadata in the JSON files stored with each scene (~/.multilum/<scene>/meta.json).

If the file is not there, you can download it from http://data.csail.mit.edu/multilum/<scene>/meta.json.
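If you want to script that download, here is a minimal sketch assuming the third-party requests library; the fetch_meta helper name and default path are just illustrative:

```python
import os
import requests

def fetch_meta(scene, root=os.path.expanduser("~/.multilum")):
    """Return the local path of a scene's meta.json, downloading it if missing."""
    path = os.path.join(root, scene, "meta.json")
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        url = "http://data.csail.mit.edu/multilum/{}/meta.json".format(scene)
        r = requests.get(url)
        r.raise_for_status()
        with open(path, "w") as f:
            f.write(r.text)
    return path
```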

This file should contain entries like this:

```json
{
  ...
  "bounding_box": {
    "x": 938.034985474631,
    "y": 2817.43052160101,
    "w": 975.3908405594467,
    "h": 948.096621403985
  },
  "boundary_points": [
    {
      "x": 1463.6340801792003,
      "y": 2820.1783656529014
    },
    {
      "x": 1692.8230026207625,
      "y": 2879.8437901088173
    },
    ...
```

for "gray" ball and "chrome" ball. The "boundary points" are hand-annotated points on the silhouette (around 10 points per ball). The "bounding_box" is a tight fitting axis-aligned bounding box. Coordinates are in pixels in range [0, 6000)x[0, 4000).

@lmurmann, I have trained the single-image illumination estimation model. The images are as follows (input, prediction, ground truth). I don't know how to render a virtual object composited into the scene. Could you teach me? Thank you very much!
image
image
image

Rendering objects into the images of our dataset is a bit difficult since you don't know the scene's geometry or camera pose. We only have a single viewpoint per scene, so it is generally not possible to infer these values.

I would suggest you start with an existing AR application. Searching for something like "open source AR toolkit" brings up plenty of hits that look good. Or you can build your own by following a tutorial; searching for "opencv AR tutorial" should give some good results. Building it yourself might take a while, but it is a great learning experience.

Once you have a basic AR system up and running, you can plug in the illumination estimation network and use the illumination prediction to improve the shading of virtual objects.

I hope these pointers are helpful!

@lmurmann Hello, I have trained the single-image illumination estimation model. The L2 loss curve (finetuned) is as follows:
image
It is about 0.01459 on the training set and 0.01897 on the test set.
The test images are as follows:
1. Input
image
2. Ground truth
image
3. Prediction
image
Could you tell me your test results?
I also have another question: can the trained model be used to render objects in video? Is it stable, or will the prediction sway? Thank you very much!

Thanks for your questions.

Regarding stability, I have used the model on video input before and found it quite stable. For a real application, you might want to add a simple filter that smooths out potential variations in the prediction.
If you find that your predictions jump around dramatically, that may be a sign of overfitting. Also, you should try to make the input video look as much like the training data as possible to shrink the domain gap. The model will probably perform better on indoor videos than on outdoor data.
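As a sketch of the "simple filter" idea mentioned above (not part of the released code): an exponential moving average over the per-frame probe predictions, with an illustrative smoothing factor.

```python
import numpy as np

class ProbeSmoother:
    """Exponentially smooth per-frame probe predictions to reduce flicker."""

    def __init__(self, alpha=0.8):
        self.alpha = alpha  # higher alpha = heavier smoothing
        self.state = None

    def __call__(self, probe):
        probe = np.asarray(probe, dtype=np.float32)
        if self.state is None:
            self.state = probe
        else:
            self.state = self.alpha * self.state + (1.0 - self.alpha) * probe
        return self.state
```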

Regarding comparison to our model, you can run the probe_predict/eval.py script and compare its outputs to your predictions. The MSE numbers you report sound pretty good! In addition to MSE, we found it useful to compare other metrics, such as the direction of the center of the light source, since these are independent of the normalization or gamma choices that often have a large impact on MSE.

When comparing the center of the light source to the ground-truth center, our predictions achieved a 26.6° mean angular error.

@lmurmann, could you tell me how to calculate the angular error between the center of the light source and the ground-truth center? On the other hand, I preprocess the input images by adding a black mask and resizing during training, but I see that you preprocess the input with a (512, 512) crop in eval.py. Could you tell me how you preprocess the input images during training? Thank you!

For calculating the center of the light source:
In most cases, you can get an initialization by looking at the maximum image value. From that initialization, I fitted a Gaussian to refine the estimate. This works pretty well, but I had to verify manually that the optimization converged to the correct light source shape in all cases.
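A minimal sketch of that procedure (peak initialization plus an isotropic 2D Gaussian refinement with scipy); the window size, initial sigma, and function name are illustrative rather than the exact implementation used for the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

def light_source_center(probe, win=64):
    """Estimate the light-source center in an HxW luminance probe image."""
    h, w = probe.shape
    y0, x0 = np.unravel_index(np.argmax(probe), probe.shape)

    # Fit an isotropic 2D Gaussian in a window around the brightest pixel.
    ys, ye = max(0, y0 - win), min(h, y0 + win)
    xs, xe = max(0, x0 - win), min(w, x0 + win)
    patch = probe[ys:ye, xs:xe]
    yy, xx = np.mgrid[ys:ye, xs:xe]

    def gauss2d(coords, xc, yc, sigma, amp, offset):
        x, y = coords
        return amp * np.exp(-((x - xc) ** 2 + (y - yc) ** 2) / (2 * sigma ** 2)) + offset

    p0 = (float(x0), float(y0), 5.0, float(patch.max()), float(patch.min()))
    (xc, yc, *_), _ = curve_fit(gauss2d, (xx.ravel(), yy.ravel()),
                                patch.ravel(), p0=p0, maxfev=5000)
    return xc, yc  # verify visually that the fit latched onto the right light
```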

For training the published model, I took the 1500×1000 px images and sampled random 512×512 px crops (the receptive field is half the image height).

I also tested with 256 px crops (a quarter of the image height), but found the performance to be a bit worse, probably due to the lack of context.
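For reference, a minimal sketch of that crop step under those assumptions (1500×1000 inputs, random 512×512 crops); the helper is hypothetical, not the released training code:

```python
import numpy as np

def random_crop(image, size=512, rng=np.random):
    """Take a random size x size crop from an HxWxC image (H, W >= size)."""
    h, w = image.shape[:2]
    top = rng.randint(0, h - size + 1)
    left = rng.randint(0, w - size + 1)
    return image[top:top + size, left:left + size]
```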

@lmurmann, hello, I have tested your model and a model trained from scratch; the results are shown below. I am confused by the difference in the sphere color. Could you give me some advice? Thank you very much!

1. Input
image
2. Ground truth
image
3. Prediction of your trained model
image
4. Prediction of my trained model
image

I also don't know how to normalize the white balance and exposure of the input image using the gray sphere. Thank you very much!

@Haoyanlong Both results look pretty reasonable. Are you sure that 3 was predicted with the published model? Usually our predictions have the slightly yellow color cast of the prediction shown in 4.

Below is some more background information on our data processing, along with advice on auto-exposure and custom white balance.

Auto Exposure

The published data is already exposure-normalized with respect to the gray ball, so there is not much extra work to do. To normalize exposure, we rescale the image intensity so that the mean intensity of the gray ball falls to a constant value (I believe around 0.3 in the .exr files).
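A minimal sketch of that normalization, assuming you have a boolean mask of the gray-ball pixels (e.g. derived from the boundary_points above); the 0.3 target and helper name are illustrative:

```python
import numpy as np

def normalize_exposure(image, gray_mask, target=0.3):
    """Rescale intensities so the mean over the gray-ball pixels hits `target`."""
    mean_gray = image[gray_mask].mean()
    if mean_gray > 0:
        image = image * (target / mean_gray)
    return image
```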

Normalizing on the gray ball can be less than ideal when there are large brightness variations across the image. In such cases, you could re-normalize after extracting training/eval patches. The probe_predict/eval.py script includes the following auto-expose helper function:

```python
import numpy as np

def autoexpose(I):
  """Simple auto-expose helper for arbitrary image patches:
  clip the brightest 10% of pixels, map the lower 90% of pixels to the
  [0, 1] range. You might have to change the 90% threshold depending on
  the application.
  """
  # Normalize by the 90th percentile of the green channel.
  n = np.percentile(I[:, :, 1], 90)
  if n > 0:
    I = I / n
  return I
```

White Balance

Regarding white balance for the illumination estimation application, we rely on the camera-provided white balance and let the raw converter (dcraw) handle white balance for us. Relying on camera white balance generally works since the flash has a known color and so the raw converter simply matches the temperature of the flash.

In case you want to do white balance yourself using the gray ball, a simple approach is to extract the mean values of the red, green, and blue channels over the gray ball, and then rescale the red and blue channels so that they match the mean intensity of the green channel.
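A minimal sketch of that gray-ball white balance, assuming an RGB channel order and a boolean mask of the gray-ball pixels; the helper is illustrative, not code from the repository:

```python
import numpy as np

def white_balance(image, gray_mask):
    """Scale R and B so their gray-ball means match the gray-ball G mean."""
    means = image[gray_mask].mean(axis=0)    # per-channel (R, G, B) means
    balanced = image.astype(np.float32)
    balanced[..., 0] *= means[1] / means[0]  # red -> green
    balanced[..., 2] *= means[1] / means[2]  # blue -> green
    return balanced
```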

Just pushed the code and trained model for the relighting task in 11674f6.

Hello, @lmurmann!
Thanks for your article and the dataset. I would like to ask you about your experiments with the single-image model.
Have you tried upsampling instead of a fully connected layer at the output of your model?

Ok, thanks for your answer :)

Hi Artyom, yes, we have tried that and found that it works as well. With the encoder that we are using, we found that the outputs of the fully connected decoder were a bit sharper, so we went with that instead.


I have a question about reproducing the light probe estimation experiment.
Could you elaborate a bit more on the experimental details?

I just guessed some parameters (below) and trained the model, but it does not seem to be working (it almost gives the average image):

- input/target: LDR RGB images, values in the range [-0.5, 0.5]
- optimizer: Adam, learning rate 1.0e-4
- iterations: 2.0e5
- random 512×512 crops
- loss: MSE (L2)

In Sec. 4.2.1, you mention "Like in Sec 4.1, we work in the log-domain to limit the dynamic range of the network's internal activations", but there seems to be no corresponding description in Sec. 4.1.

From left to right: input, output (no auto exposure), target
input_output_target

Hi Naoto, your configuration generally looks good. You should be able to train a working version of the illumination prediction model without converting to the log domain.

If the parameters that you mentioned don't work in your setup, I would try SGD with a 1e-3 step size and an L1 loss. Generally, we were able to train models with a variety of hyperparameters, and convergence should be better than in the screenshots you posted. For us, hyperparameter tuning mostly improved the sharpness of the predictions, but the general direction of the illumination should be correct for most training runs.
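For concreteness, a minimal sketch of that fallback configuration in PyTorch (SGD at 1e-3 with an L1 loss); `model` and `train_loader` are placeholders for your own network and 512×512-crop data pipeline, not the released code:

```python
import torch
import torch.nn as nn

def train(model, train_loader, num_iters=200000, device="cuda"):
    """Train with the suggested fallback settings: SGD(lr=1e-3) and L1 loss."""
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    criterion = nn.L1Loss()
    it = 0
    while it < num_iters:
        for crops, probes in train_loader:  # 512x512 crops, probe targets
            crops, probes = crops.to(device), probes.to(device)
            optimizer.zero_grad()
            loss = criterion(model(crops), probes)
            loss.backward()
            optimizer.step()
            it += 1
            if it >= num_iters:
                break
    return model
```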


@Haoyanlong The L2 loss curve you posted above barely decreases; is that result normal?
Also, when I compute the L2 loss on the test set with the weights provided by the author, the value is about 0.9, while your L2 loss is much smaller. Is there any difference between your setup and the author's?