SimronThapa / FSRN-CVPR2020

This codebase implements the system described in the paper: Dynamic Fluid Surface Reconstruction Using Deep Neural Network

Home Page: https://ivlab.cse.lsu.edu/FSRN_CVPR20.html

Questions about your training code / preprocessing

yshhrknmr opened this issue · comments

Hi, this is amazing work. One of my students is trying to run your training code but is struggling. I would be grateful if you could answer the following questions.

In "FSRN-CNN-train.py," the training and test data are loaded as single .npy files, i.e.,

```python
X_train = np.load(dir_references+"X_train{}.npy".format(TRAIN_NUM))
X_train = np.array(X_train)
```

but the datasets are provided as separate files stored in different directories. What preprocessing did you use? For example, did you simply concatenate all the data into a single .npy file? If so, I wonder whether the GPU memory can hold such a large array, and which GPU you used.

My student assumed that the data shapes for training and validation are as follows (where N is the number of samples). Is this understanding correct?

X_train: (N, 128, 128, 6), consisting of "render image (3ch)" and "reference image (3ch)"
y_train: (N, 128, 128, 11), consisting of "depth map (1ch)", "normal map (3ch)", "render image (3ch)", "reference image (3ch)", and "scale (1ch)" (the "render image" and "reference image" are the same as those in X_train)
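To make the assumed layout concrete, this is how my student pictured y_train being assembled; the channel ordering is our guess, not something confirmed by the paper or code.

```python
import numpy as np

# Assumed y_train layout: 1 + 3 + 3 + 3 + 1 = 11 channels per pixel.
N = 2
depth  = np.zeros((N, 128, 128, 1), dtype=np.float32)  # depth map
normal = np.zeros((N, 128, 128, 3), dtype=np.float32)  # normal map
render = np.zeros((N, 128, 128, 3), dtype=np.float32)  # render image
ref    = np.zeros((N, 128, 128, 3), dtype=np.float32)  # reference image
scale  = np.zeros((N, 128, 128, 1), dtype=np.float32)  # scale channel
y_train = np.concatenate([depth, normal, render, ref, scale], axis=-1)
print(y_train.shape)  # (2, 128, 128, 11)
```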

If so, "scale" originally has a shape of (128, 128, 1), but the training code accesses it via slices like `[:,0:1,0:1]` and `[:,0:1,1:2]`. How did you modify the shape of "scale"?
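One interpretation my student considered (purely an assumption): the scale channel is a (N, 128, 128) map in which a few scalar values are stored at fixed pixels, so the slices simply read them back. A small sketch of that reading:

```python
import numpy as np

# Assumed storage scheme: scalar scale values written to pixels (0,0)
# and (0,1) of an otherwise unused (N, 128, 128) plane.
N = 2
scale_map = np.zeros((N, 128, 128), dtype=np.float32)
scale_map[:, 0, 0] = [1.5, 0.5]   # first scalar per sample
scale_map[:, 0, 1] = [2.0, 0.25]  # second scalar per sample
s0 = scale_map[:, 0:1, 0:1]       # shape (N, 1, 1), like [:,0:1,0:1]
s1 = scale_map[:, 0:1, 1:2]       # shape (N, 1, 1), like [:,0:1,1:2]
print(s0.ravel())  # [1.5 0.5]
print(s1.ravel())  # [2.   0.25]
```

Is this the intended scheme, or is "scale" reshaped some other way?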

My student also wondered about the mismatch between the network's default input resolution (128x128) and the resolution of the dataset images (256x256). At which resolution did you train the network?
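If the answer is 128x128, we assume some downsampling step is needed; a minimal sketch using 2x2 average pooling is below (your actual resampling method, if any, is unknown to us).

```python
import numpy as np

def downsample_2x(img):
    """Average-pool an (H, W, C) image by a factor of 2 in each dimension."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

img = np.random.rand(256, 256, 3).astype(np.float32)  # a 256x256 dataset image
small = downsample_2x(img)
print(small.shape)  # (128, 128, 3)
```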