dougsm / ggcnn

Generative Grasping CNN from "Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach" (RSS 2018)

Depth Input processing in paper

kendj-staff opened this issue · comments

Dear Doug,
When I read the paper, I had some questions about how the input depth images are processed. I have searched on the internet, but I could not find an understandable answer.
I am a newbie to deep learning and robotics, so my questions may be a little silly; I hope you don't mind.
Could you help me answer these questions? Thanks a lot!

Background: As you mentioned in your paper, you subtract the mean of each depth image, centring its values around 0 to provide depth invariance.
Question 1: I am a little confused about why we need to provide depth invariance.
Question 2: What would happen if we didn't do this step (provide depth invariance)?
Question 3: In the run_ggcnn.py file, at line 111, we calculate the depth using the code below:
depth_center = depth_center[:10].mean() * 1000.0
My question is: why is the scale set to 1000.0? Which factors decide this scale, the camera or something else? If I want to run this code on my own robot and camera, should I change this scale?

Thanks a lot for your answer!!

Hi @kendj-staff, I hope I can answer your questions:

1/2. This is mostly for the benefit of the neural network. For training and inference it is important that the input values are close to zero, and in most tasks you would normalise the data by subtracting the mean and dividing by the standard deviation. Here we subtract the mean to zero-centre the data, so that the distance from the camera to the object doesn't matter. Otherwise, the same scene viewed from 1 m away would give a very different output to the same scene viewed from 1.5 m away, for example (see the first sketch below).
3. I believe that is just scaling from metres to millimetres (see the second sketch below).
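
To make the depth-invariance point in 1/2 concrete, here is a minimal sketch of mean subtraction on a depth image. It is not the exact preprocessing code from this repo; `normalise_depth` is a hypothetical helper and the arrays are made up. The assertion shows that two views of the same scene, taken from different distances, become identical after zero-centring.

```python
import numpy as np

def normalise_depth(depth):
    # Subtract the mean so only relative depth remains; the absolute
    # camera-to-scene distance is removed from the input.
    return depth - depth.mean()

# The same scene viewed from 1.0 m and 1.5 m differs only by a
# constant offset, which mean subtraction removes entirely:
scene = np.random.rand(300, 300) * 0.1   # hypothetical relative depths (m)
near = scene + 1.0                       # camera at 1.0 m
far = scene + 1.5                        # camera at 1.5 m
assert np.allclose(normalise_depth(near), normalise_depth(far))
```

Without this step, the network would have to learn a separate response for every camera height, since the raw input values shift with distance.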
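
And a sketch of the unit conversion from Question 3, under the assumption that your camera driver reports depth in metres (the readings below are invented). If your driver already returns millimetres, as some RGB-D drivers do, the `* 1000.0` factor would not be needed; so yes, check your camera's depth units before reusing this line.

```python
import numpy as np

# Hypothetical depth readings in metres for the closest points in a
# crop of the depth image (the real values come from your camera):
depth_center = np.array([0.512, 0.510, 0.515, 0.509, 0.511,
                         0.514, 0.513, 0.512, 0.510, 0.511])

# Mean of the 10 closest values, converted from metres to millimetres.
depth_center_mm = depth_center[:10].mean() * 1000.0
print(depth_center_mm)   # ~511.7 mm
```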

Oh!! I got it! Your answer is really clear to me. Thank you again for your patient explanation!!