xu-ji / IIC

Invariant Information Clustering for Unsupervised Image Classification and Segmentation

How to run this code on custom data

shadyatscu opened this issue · comments

Thanks for your great work! I checked the code and found that it's hard-coded for the benchmarks given in the paper. Could I run it on my own custom data? Thanks a lot!

commented

Hi, the code has optional settings that you may not want or need. Each setting is easy to turn on or off using the input arguments to the scripts.

To run on your own dataset, create your own script. For example, create a copy of cluster_sobel_twohead.py inside code/scripts/cluster and replace the call to cluster_twohead_create_dataloaders on line 171 with a call to your own function, which should return 4 dataloaders:

  • dataloaders_head_B (for training the main head): a list of config.num_dataloaders + 1 dataloaders, each containing the main training data for the model. Every PyTorch dataloader is a wrapper for a PyTorch dataset; you can implement your own dataset class with the PyTorch interface. As you can see in the training script, images x are sampled from dataloaders_head_B[0] and images gx are sampled from each of the others, dataloaders_head_B[1..config.num_dataloaders]. When you create the datasets for the latter, if you are simply using a PyTorch dataset class that already exists, set the transform argument to your invariance transforms. If you are using your own dataset class, it should also be able to take a transform, which is executed when yielding its images. config.num_dataloaders corresponds to the number of sample repeats (r in the experiments section of the paper). So if you don't want extra sample repeats, you would just have one gx, and the length of dataloaders_head_B would be 2.
  • dataloaders_head_A (for training the auxiliary head): like dataloaders_head_B, but the dataset wrapped by each dataloader contains all training data for the model, i.e. including noisy or irrelevant images. It is the same as dataloaders_head_B if you have no extra data. Since training alternates between the main and auxiliary heads, dataloaders_head_B and dataloaders_head_A are used alternately.
  • mapping_assignment_dataloader: used for evaluation, i.e. similar to dataloaders_head_B[0]. This is used to find the 1-1 mapping between output clusters and ground truth classes.
  • mapping_test_dataloader: used for evaluation. After the 1-1 mapping is found using mapping_assignment_dataloader, it is used to assess the data contained in mapping_test_dataloader. In the fully unsupervised mode, the images used to train the main head and the images used to test it are the same (except that in testing we need to look at the labels to get accuracy), so mapping_assignment_dataloader = mapping_test_dataloader. (This is not the case for the semi-supervised overclustering setting, where training and test sets are separate.)

In summary, you would only need to change that one function call to cluster_twohead_create_dataloaders. (It should still obey the input settings in config for number of dataloaders, input size, transforms etc - which you can of course also change or remove if redundant.)
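For illustration, here is a minimal sketch of what such a replacement function could look like, assuming an ImageFolder-style dataset. The function name, dataset path, transform parameters and return order are all assumptions; adapt them to your own data and to the script you copied.

```python
# Hypothetical replacement for cluster_twohead_create_dataloaders.
# Dataset path, transforms and return order are assumptions; adapt as needed.
import torch
import torchvision
import torchvision.transforms as T

def custom_create_dataloaders(config):
    base_tf = T.Compose([T.Resize(config.input_sz),
                         T.CenterCrop(config.input_sz),
                         T.ToTensor()])
    # Invariance transforms used to produce gx from x.
    invariance_tf = T.Compose([T.Resize(config.input_sz),
                               T.CenterCrop(config.input_sz),
                               T.ColorJitter(0.4, 0.4, 0.4, 0.125),
                               T.RandomHorizontalFlip(),
                               T.ToTensor()])

    def make_loader(tf):
        ds = torchvision.datasets.ImageFolder(config.dataset_root, transform=tf)
        # shuffle=False keeps all loaders in the same order, so each gx batch
        # lines up with the corresponding x batch.
        return torch.utils.data.DataLoader(ds, batch_size=config.batch_sz,
                                           shuffle=False, drop_last=False)

    # dataloaders_head_B[0] yields x; [1..num_dataloaders] yield gx.
    dataloaders_head_B = ([make_loader(base_tf)] +
                          [make_loader(invariance_tf)
                           for _ in range(config.num_dataloaders)])

    # With no extra/auxiliary data, head A uses the same loaders as head B.
    dataloaders_head_A = dataloaders_head_B

    # Evaluation loaders; identical in the fully unsupervised setting.
    mapping_assignment_dataloader = make_loader(base_tf)
    mapping_test_dataloader = mapping_assignment_dataloader

    return (dataloaders_head_A, dataloaders_head_B,
            mapping_assignment_dataloader, mapping_test_dataloader)
```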

@xu-ji For segmentation, I found that for Potsdam, this line in data.py requires "unlabelled_train", "labelled_train", "labelled_test", but the paper says it's an unsupervised method, which is confusing to me. Could you explain it? Also, I don't have labelled training images, so how can I generate the image pairs for the training and test datasets on my custom segmentation dataset? Thanks.

commented

Images are taken from labelled_train because otherwise you would not be using most of the dataset. Labels are only used for evaluation, to find the 1-1 mapping between output clusters and ground truth classes, not for training.

If you don't have labels, you will train the network but won't be able to quantitatively evaluate it. This means you would be skipping the call to eval.

You don't need labels to generate pairs. You take your whole image, copy it and transform it to create the second image of the pair. The transforms used for Potsdam were jitter and flipping, done here.
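As a hedged illustration (not the exact Potsdam code), a pair can be formed from a single unlabelled image like this; the jitter and flip parameters are placeholders:

```python
# Illustrative sketch of forming one training pair (x, gx).
# Parameters are placeholders, not the exact Potsdam settings.
import torchvision.transforms as T
from PIL import Image

invariance_tf = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.125),
    # Flipping is geometric, so for segmentation the loss has to undo it
    # when pairing pixels between x and gx.
    T.RandomHorizontalFlip(p=0.5),
])
to_tensor = T.ToTensor()

img = Image.open("tile.png").convert("RGB")  # hypothetical unlabelled image
x = to_tensor(img)                   # original image
gx = to_tensor(invariance_tf(img))   # transformed copy; (x, gx) is the pair
```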

Thanks. It's clear to me now. As for the training and test images, what is the ratio between "unlabelled_train", "labelled_train" and "labelled_test"? For example, I have 4500 "unlabelled_train" images, and "labelled_train" is the same as "labelled_test" but only has 500 images; is that OK? Also, did you try depth images only for segmentation?

In this line, for depth images only, do I need to use config.in_channels = 1 + 2 # depth + sobel, using_IR=False, or config.in_channels = 1 # depth only, using_IR=False? Thanks.

commented

If you look at the supplementary material (in /paper), table 6 gives the dataset sizes. Unlabelled train + labelled train = 8550 images, labelled test = 5400 images (so labelled train = 5400, unlabelled train = 8550 - 5400). You should be fine with 500 labelled images. The amount of labels required to find the mapping is very low.

We did not try anything with depth.

Your input channels would be 1. You would not use sobel filtering at all; that is a transform for colour images. Because you are working with depth, you may want to consider different transforms to what we used. Jitter is also an operation intended for colour images. You may want to try salt and pepper noise, and flipping (depending on what your images are about), as your transforms.
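For example, a rough sketch of a salt-and-pepper transform for a single-channel depth tensor (the noise fraction and the flip in the final comment are arbitrary choices, not something we used):

```python
# Hypothetical salt-and-pepper noise for a depth tensor of shape (1, H, W).
import torch

def salt_and_pepper(depth, prob=0.05):
    # prob is the total fraction of corrupted pixels.
    noisy = depth.clone()
    mask = torch.rand_like(depth)
    noisy[mask < prob / 2] = depth.min()       # "pepper": darkest value
    noisy[mask > 1 - prob / 2] = depth.max()   # "salt": brightest value
    return noisy

# With config.in_channels = 1, a pair could be e.g.:
#   x  = depth
#   gx = salt_and_pepper(torch.flip(depth, dims=[2]))  # horizontal flip + noise
```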

Thanks. It's really helpful.

@xu-ji Can your IIC method do instance segmentation?

commented

No. That would require some material addition to the method.

What about model_ind for our custom dataset?
Could you please describe model_ind and --arch?
Is it possible to change input_sz to something like 256 for our data, or not, with these models?

commented

--model_ind is just a name for the experiment, used to create the directory the results are stored in. It can be anything.

--arch is used to select the network architecture. E.g. here.

You could run the scripts on images of 256x256. If you are using your own dataset you will almost certainly need to change the code anyway, e.g. to write your own dataloader. To use our existing architectures, the easiest way is just to resize your images to one of the compatible sizes. For example, for segmentation our Potsdam images were 200x200. You can find these details by looking at the code or in the supplementary material.
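As a quick hedged example of the resize route (the paths and file pattern are hypothetical):

```python
# Resize custom images to a size the existing architectures accept,
# e.g. 200x200 as used for Potsdam.
from pathlib import Path
from PIL import Image

src, dst = Path("my_images"), Path("my_images_200")
dst.mkdir(exist_ok=True)
for p in src.glob("*.png"):
    Image.open(p).resize((200, 200), Image.BILINEAR).save(dst / p.name)
```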

Hello, your work is very useful for me. Thanks a lot!
Could you please tell me how to produce a picture like splash.png after running segmentation_twohead.py? I'm trying to use my own dataset and check the resulting pictures. Do I have to add some code?

commented

If it's your own dataset, it's probably best to write your own script. It's quite simple: just load your saved network, run your data through it, get a prediction per pixel, and map each prediction to a colour.

There are some examples that I used at one point, the render*.py scripts in this dir, where I do exactly this. I use the PIL Image library or matplotlib to turn numpy arrays into images and save to file.
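A rough sketch of such a script; the checkpoint path is hypothetical, and it assumes the network returns a list of per-head outputs, so adapt the loading and forward call to however you saved and built your model:

```python
# Rough sketch: load a trained net, predict a class per pixel, colour and save.
import numpy as np
import torch
from PIL import Image

net = torch.load("best_net.pytorch", map_location="cpu")  # hypothetical path
net.eval()

img = torch.rand(1, 3, 200, 200)  # stand-in for your preprocessed input batch
with torch.no_grad():
    out = net(img)[0]                       # assumed: list of per-head outputs
preds = out.argmax(dim=1)[0].cpu().numpy()  # (H, W) cluster index per pixel

# Map each cluster index to a fixed random colour and save as an image.
palette = np.random.RandomState(0).randint(0, 256, (int(preds.max()) + 1, 3),
                                           dtype=np.uint8)
Image.fromarray(palette[preds]).save("prediction.png")
```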

Thanks a lot for the awesome and useful work!

I was wondering about a couple of things:

  1. Are the main and auxiliary overclustering heads independent of each other? Say, in a fully unsupervised segmentation task where I would want to use the best model from your experiments, which has 2 heads (head_A and head_B), could I just drop the overclustering head head_A from the model and use the resulting net, with only head head_B, as a pretrained net?
  2. Given that the number of ground truth classes in my scenario would be different (say 10 instead of 6), could I simply reinitialize the Conv2d in the IIC head_B (i.e. the main head) from
(head_B): SegmentationNet10aHead(
    (heads): ModuleList(
      (0): Sequential(
        (0): Conv2d(512, 6, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1), bias=False)
        (1): Softmax2d()
      )
    )
)

to

(head_B): SegmentationNet10aHead(
    (heads): ModuleList(
      (0): Sequential(
        (0): Conv2d(512, 10, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1), bias=False)
        (1): Softmax2d()
      )
    )
)

similarly to what is shown here for SqueezeNet?

commented
  1. Yes, inference on head B does not need head A in that the outputs are separately interpretable. (Though they share the same trunk, so not independent.) But which head is actually better for your downstream task may need testing to be known.

  2. If you replace head B in a trained network with any randomly initialized new head, it'll need further training for its outputs to be meaningful, either with the IIC loss or some other relevant objective. But yes, you could do this.
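As a hedged sketch, assuming net is the trained model whose printout is shown above (a single sub-head in head_B), the swap might look like:

```python
# Swap head_B's 1x1 classifier for a freshly initialised 10-class one.
# `net` is assumed to be the trained IIC model printed above.
import torch.nn as nn

def replace_main_head(net, num_new_classes=10):
    old_conv = net.head_B.heads[0][0]  # existing Conv2d(512, 6, ...)
    net.head_B.heads[0] = nn.Sequential(
        nn.Conv2d(old_conv.in_channels, num_new_classes,
                  kernel_size=1, stride=1, padding=1, bias=False),
        nn.Softmax2d(),
    )
    return net  # retrain (IIC loss or another objective) before using the outputs
```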