alexgkendall / caffe-segnet

Implementation of SegNet: A Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-Wise Labelling

Home Page: http://mi.eng.cam.ac.uk/projects/segnet/


Changing input layer from DenseImageData to HDF5 for the same training set gives different and incorrect segmentation results

bparaj opened this issue

My current dataset has RGB images with two classes to segment, and I used the given CamVid prototxt files. With the DenseImageData layer as the input layer, training proceeds fine and the per-class accuracy starts improving after about 600 iterations. The segmentation results on the test examples are also good and make sense when inspected visually.
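For reference, the DenseImageData input layer in the stock CamVid prototxt looks roughly like this (the source path is illustrative; it points to a text file of "image_path label_path" pairs):

layer {
  name: "data"
  type: "DenseImageData"
  top: "data"
  top: "label"
  dense_image_data_param {
    source: "dataset/train_pairs.txt"
    batch_size: 4
    shuffle: true
  }
}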

I changed the input layer to read HDF5 files instead, as follows:

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "dataset/train_data.txt"
    batch_size: 4 
  }
}

dataset/train_data.txt holds the paths to HDF5 files which contain the same training images as before. I made sure each HDF5 file is less than 2 GB in size.
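The list file is plain text with one HDF5 path per line, for example (file names here are hypothetical):

dataset/train_batch_00.h5
dataset/train_batch_01.h5
dataset/train_batch_02.h5

I used the following script to create the HDF5 files.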

import cv2
import numpy as np
import h5py


def dump_feature_images_as_hdf5(img_ids, images, masks, out_h5_fname):
    """
    Given two parallel lists containing paths to RGB images and their corresponding masks,
    create a new HDF5 file named out_h5_fname and dump the images and masks.

    img_ids is a list of strings which identify the images.
    """
    nrow, ncol, nchl = 512, 512, 3
    num_per_h5 = len(img_ids)

    # Create the HDF5 file with datasets to hold image channels and corresponding masks.
    h5_ftr = h5py.File(out_h5_fname, "w")
    h5_ftr.create_dataset("data", (num_per_h5, nchl, nrow, ncol))
    h5_ftr.create_dataset("label", (num_per_h5, 1, nrow, ncol))
    h5_ftr.create_dataset("img_id", (num_per_h5,), dtype="S11")

    for i, (img_id, img, mask) in enumerate(zip(img_ids, images, masks)):
        # cv2.imread returns an (H, W, C) array in BGR channel order.
        tensor = cv2.imread(img, cv2.IMREAD_COLOR)
        # NOTE: swapaxes(0, 2) gives (C, W, H), not (C, H, W): it also swaps the
        # spatial axes. The assert passes only because the images are square.
        tensor = np.swapaxes(tensor, 0, 2)
        assert tensor.shape == (nchl, nrow, ncol)

        # Read mask as a single-channel (H, W) array.
        msk = cv2.imread(mask, cv2.IMREAD_GRAYSCALE)
        # BUG (fixed in the follow-up below): reshape keeps the mask in (1, H, W)
        # orientation, while the image above ended up as (C, W, H), so the data
        # and labels are spatially transposed relative to each other.
        msk = msk.reshape(1, nrow, ncol)
        assert msk.shape == (1, nrow, ncol)

        h5_ftr["data"][i] = tensor
        h5_ftr["label"][i] = msk
        h5_ftr["img_id"][i] = np.string_(img_id)

    h5_ftr.close()
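A minimal driver for this function might look like the following sketch; all paths, the ID scheme, and the chunk size are hypothetical, chosen only to keep each file under 2 GB:

import glob

# Hypothetical parallel lists of image and mask paths; adjust to your layout.
image_paths = sorted(glob.glob("dataset/images/*.png"))
mask_paths = sorted(glob.glob("dataset/masks/*.png"))
# IDs must fit the 11-byte "S11" dtype used above.
img_ids = [p.split("/")[-1].rsplit(".", 1)[0] for p in image_paths]

# 100 samples of 512x512 float32 data plus labels come to roughly
# 400 MB per file, well under 2 GB.
chunk = 100
for start in range(0, len(image_paths), chunk):
    out_fname = "dataset/train_batch_%02d.h5" % (start // chunk)
    dump_feature_images_as_hdf5(
        img_ids[start:start + chunk],
        image_paths[start:start + chunk],
        mask_paths[start:start + chunk],
        out_fname,
    )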

With this setup for training, the per-class accuracy starts improving only after about 9000 iterations. When testing with a model trained for 40000 iterations, which showed per-class training accuracies above 0.95 for both classes, the segmentation results are extremely poor.

My bad! It was because of the way I was storing the masks in the HDF5 file. The following change fixed the issue:

        # Read mask as a single-channel (H, W) array.
        msk = cv2.imread(mask, cv2.IMREAD_GRAYSCALE)
        # Add a trailing channel axis -> (H, W, 1); axis=-1 here because
        # np.expand_dims raises for axis=3 on a 2-D array in modern NumPy.
        msk = np.expand_dims(msk, axis=-1)
        # Swap axes 0 and 2 -> (1, W, H), matching the (C, W, H) orientation
        # that np.swapaxes(tensor, 0, 2) produced for the image data.
        msk = msk.swapaxes(0, 2)
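As a quick sanity check (purely illustrative; the file name is hypothetical), reading a sample back, undoing the axis swap, and overlaying the mask on the image makes a transposed label obvious immediately:

import cv2
import h5py
import numpy as np

with h5py.File("dataset/train_batch_00.h5", "r") as f:
    img = f["data"][0].swapaxes(0, 2)   # (C, W, H) back to (H, W, C) for display
    msk = f["label"][0].swapaxes(0, 2)  # (1, W, H) back to (H, W, 1)

# Paint class-1 pixels red (BGR); a transposed mask will visibly misalign.
overlay = img.astype(np.uint8)
overlay[msk[:, :, 0] > 0] = (0, 0, 255)
cv2.imwrite("overlay_check.png", overlay)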