fcdl94 / MiB

Official code for Modeling the Background for Incremental Learning in Semantic Segmentation https://arxiv.org/abs/2002.00718


The division of the test data set.

ljx97 opened this issue · comments

commented

Hello,
Thanks for your interesting work. I am running your method on other datasets, but in the process I ran into the problem of dataset partitioning. Should all test data be used for every task of incremental learning? How should the test set be divided? Is it possible to split the test set using ade-split.ipynb?

Hi @ljx97!

Well, it depends. In our case, we only evaluated the final step, so we were not interested in each intermediate result.
In other applications, you may divide it depending on the classes, masking all the classes that have not been seen.

Practically, unseen classes are set to 255 in the test evaluation in my code. If you want to split the dataset, I think it is better to follow the overlapped scenario, i.e. to keep all the test images with at least one pixel of a seen class. Alas, the code for doing this is not provided in my implementation.
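To make the masking step concrete, here is a minimal sketch (not from the repository) of mapping unseen classes to the ignore index 255 before computing metrics; `mask_unseen`, `seen_classes`, and `IGNORE` are hypothetical names chosen for illustration:

```python
import torch

IGNORE = 255  # the ignore index used during evaluation


def mask_unseen(label, seen_classes):
    """Return a copy of `label` where every pixel whose class is not
    in `seen_classes` is set to the ignore index."""
    masked = torch.full_like(label, IGNORE)
    for c in seen_classes:
        masked[label == c] = c
    return masked


label = torch.tensor([[0, 1, 7], [2, 255, 7]])
print(mask_unseen(label, seen_classes=[0, 1, 2]))
# class 7 (not yet seen) becomes 255; classes 0, 1, 2 are kept
```

The masked labels can then be fed to the usual mIoU computation, so pixels of future classes simply do not contribute to the score.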

commented

Sorry for replying to you late.

I want to run your code on a new dataset. Would you please tell me how to split it? Did you try splitting a new dataset in the disjoint or overlapped way? I saw in your answers in the closed issues that ADE-Split.ipynb does not handle the disjoint and overlapped settings. I want to know whether you have any method or code to split the dataset.

Thanks a lot.

I think the best option is to adapt the ADE-Split.ipynb if you have a large enough dataset.
Otherwise, go for the overlapped option. A template to split the dataset is the following:

import numpy as np
from tqdm import tqdm

if __name__ == "__main__":
    CLASSES = number_of_classes
    data = your_dataset  # should return (image, label) pairs, with labels as tensors
    overlap = True  # or False for the disjoint setting
    n_steps = number_of_steps
    task_dict = {
        0: [list of classes in step 0],
        1: [list of classes in step 1],
        ...
        n_steps - 1: [list of classes in step n_steps - 1]
    }

    labels_old = []
    for step in range(n_steps):
        labels = task_dict[step]
        if step > 0:
            labels_old = labels_old + task_dict[step - 1]
        labels_cum = labels_old + labels
        indices = np.arange(len(data))

        for i in tqdm(range(len(data))):
            # unique class labels appearing in the image
            img_labels = data[i][1].unique()
            # remove the ignore label
            img_labels = img_labels[img_labels != 255]

            if not overlap:
                # Disjoint: exclude the image if it contains classes outside the
                # cumulative set, or none of the current step's classes
                if not all(x in labels_cum for x in img_labels) \
                        or not any(x in labels for x in img_labels):
                    indices[i] = -1
            else:
                # Overlap: exclude only if the image contains none of the
                # current step's classes
                if not any(x in labels for x in img_labels):
                    indices[i] = -1

        indices = indices[indices != -1]
        print(len(indices))

        # save one index file per step
        np.save(f"split_{step}.npy", indices)
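Once a split file has been saved per step, the corresponding subset can be loaded with `np.load` and wrapped in a `torch.utils.data.Subset`. A minimal usage sketch (a toy list stands in for the real dataset; the filename `split_0.npy` is the one the template above would produce for step 0):

```python
import numpy as np
from torch.utils.data import Subset

# toy stand-in for your_dataset
data = list(range(10))

# what the template above would have saved for step 0
np.save("split_0.npy", np.array([1, 4, 7]))

# restrict the dataset to the images selected for this step
indices = np.load("split_0.npy")
step_data = Subset(data, indices.tolist())
print(len(step_data))  # 3
```

`Subset` only reindexes the underlying dataset, so the same pattern works for the training and test splits alike.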

Hey,

If I may hijack this issue:

> Practically, not seen classes are set as 255 in the test evaluation using my code. If you want to split the dataset, I think it is better to follow the overlapped scenario, so it's better to keep all the test images with at least one pixel of a seen class. Alas, the code for doing this in not provided in my implementation.

In my paper PLOP, I modified @fcdl94's code to run the evaluation after each step, on only the test images with at least one pixel of a seen class. It supports both disjoint and overlap.

https://github.com/arthurdouillard/CVPR2021_PLOP

commented

Hi, @fcdl94. Thank you for your reply! Your code is useful and I gained a lot.


Can you provide me with the script to divide my own dataset, please? Thank you!