Changing input layer from DenseImageData to HDF5 for the same training set gives different and incorrect segmentation results
bparaj opened this issue · comments
My current dataset has RGB images with two classes to segment. I used the given CamVid prototxt files. When I use the DenseImageData layer as the input layer, training proceeds fine and the per-class accuracy starts improving after about 600 iterations. The segmentation results on the test examples are also good and make sense when verified visually.
I changed the input layer to accommodate HDF5 files as input as follows:
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "dataset/train_data.txt"
    batch_size: 4
  }
}
dataset/train_data.txt holds the paths to the HDF5 files, which contain the same training images as before. I made sure each HDF5 file is less than 2 GB in size. I used the following script to create the HDF5 files.
import cv2
import numpy as np
import h5py

def dump_feature_images_as_hdf5(img_ids, images, masks, out_h5_fname):
    """
    Given two parallel lists containing paths to RGB images and their corresponding masks,
    create a new hdf5 file named out_h5_fname and dump the images and masks.
    img_ids is a list of strings which identify the images.
    """
    nrow, ncol, nchl = 512, 512, 3
    num_per_h5 = len(img_ids)
    # Create hdf5 file with datasets to hold image channels and corresponding masks.
    h5_ftr = h5py.File(out_h5_fname, "w")
    h5_ftr.create_dataset("data", (num_per_h5, nchl, nrow, ncol))
    h5_ftr.create_dataset("label", (num_per_h5, 1, nrow, ncol))
    h5_ftr.create_dataset("img_id", (num_per_h5,), dtype="S11")
    for i, (img_id, img, mask) in enumerate(zip(img_ids, images, masks)):
        # Tensor that holds the RGB channels
        tensor = cv2.imread(img, cv2.IMREAD_COLOR)
        tensor = np.swapaxes(tensor, 0, 2)
        assert tensor.shape == (3, 512, 512)
        # Read mask
        msk = cv2.imread(mask, cv2.IMREAD_GRAYSCALE)
        msk = msk.reshape(1, 512, 512)
        assert msk.shape == (1, 512, 512)
        h5_ftr["data"][i] = tensor
        h5_ftr["label"][i] = msk
        h5_ftr["img_id"][i] = np.string_(img_id)
    h5_ftr.close()
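As a rough check on the 2 GB limit mentioned above, here is a minimal sketch that estimates how many 512 x 512 samples fit in one file. It assumes the datasets are stored as 4-byte floats (h5py's create_dataset uses float32 when no dtype is given), that "2 GB" means 2 GiB, and that HDF5 metadata overhead is negligible. The helper name max_samples_per_file is made up for illustration.

```python
def max_samples_per_file(nchl=3, nrow=512, ncol=512, id_bytes=11,
                         itemsize=4, limit_bytes=2 * 1024**3):
    """Estimate raw bytes per sample and how many samples stay under the limit.

    Assumptions (not from the original script): itemsize=4 for float32,
    limit_bytes=2 GiB, and no HDF5 metadata or chunking overhead.
    """
    data_bytes = nchl * nrow * ncol * itemsize   # "data" entry: 3 x 512 x 512
    label_bytes = 1 * nrow * ncol * itemsize     # "label" entry: 1 x 512 x 512
    per_sample = data_bytes + label_bytes + id_bytes  # plus the "S11" image id
    return per_sample, limit_bytes // per_sample

per_sample, max_n = max_samples_per_file()
print(per_sample)  # 4194315 bytes, roughly 4 MiB per sample
print(max_n)       # 511
```

Under these assumptions, about 500 images per HDF5 file keeps each file below the limit.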
With this setup for training, the per-class accuracy starts showing improvements only after 9000 iterations. When testing with a model trained for 40000 iterations, which showed training per-class accuracies above 0.95 for both classes, the segmentation results are extremely poor.
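My bad! It was because of the way I was storing the masks in the hdf5 file. The image goes through swapaxes(0, 2), which also swaps its rows and columns, but the mask was only reshaped, so the stored masks no longer lined up with the stored images. The following change fixed the issue: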
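# Read mask
msk = cv2.imread(mask, cv2.IMREAD_GRAYSCALE)  # shape (512, 512)
# Add a trailing channel axis (axis=2 for a 2-D array), then swap axes like the image
msk = np.expand_dims(msk, axis=2)  # shape (512, 512, 1)
msk = msk.swapaxes(0, 2)  # shape (1, 512, 512), transposed the same way as the image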
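To see why the reshape was the problem, here is a small sketch on synthetic arrays (a 4 x 4 stand-in for the 512 x 512 images, with one marked pixel). The helper marked_pixel and the array names are made up for illustration. Because the images are square, the shape asserts in the script pass either way, so only the pixel positions reveal the mismatch.

```python
import numpy as np

def marked_pixel(plane):
    """Return (row, col) of the first non-zero entry in a 2-D array."""
    r, c = np.argwhere(plane)[0]
    return int(r), int(c)

h = w = 4  # square, like 512 x 512, so shape checks cannot catch the bug
img = np.zeros((h, w, 3), dtype=np.uint8)  # OpenCV-style H x W x C image
mask = np.zeros((h, w), dtype=np.uint8)    # H x W mask
img[1, 3] = 255   # mark the same pixel (row 1, col 3) in image and mask
mask[1, 3] = 1

data = np.swapaxes(img, 0, 2)                                 # (3, W, H): transposed
label_reshaped = mask.reshape(1, h, w)                        # (1, H, W): not transposed
label_swapped = np.expand_dims(mask, axis=2).swapaxes(0, 2)   # (1, W, H): transposed

print(marked_pixel(data[0]))            # (3, 1)
print(marked_pixel(label_reshaped[0]))  # (1, 3)  -> misaligned with data
print(marked_pixel(label_swapped[0]))   # (3, 1)  -> aligned with data

# Alternative that keeps the row/column order in both arrays.
data_chw = np.transpose(img, (2, 0, 1))  # (3, H, W)
label_chw = mask[np.newaxis, :, :]       # (1, H, W)
print(marked_pixel(data_chw[0]), marked_pixel(label_chw[0]))  # (1, 3) (1, 3)
```

Either pairing works as long as image and mask get the same treatment. The transpose variant has the advantage that the stored arrays are not flipped relative to the original files, which matters if test images are loaded through a different path.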