The division of the test data set.
ljx97 opened this issue · comments
Hello,
Thanks for your interesting work. I am running your method on other datasets, but I ran into the problem of dataset partitioning. Should all test data be used for each task of incremental learning? How should the test set be divided? Is it possible to split the test set using ADE-Split.ipynb?
Hi @ljx97!
Well, it depends. In our case, we only evaluated the final step, so we were not interested in each intermediate result.
In other applications, you may divide it depending on the classes, masking all the classes that have not been seen.
In practice, unseen classes are set to 255 during the test evaluation in my code. If you want to split the dataset, I think it is better to follow the overlapped scenario, i.e. keep every test image that contains at least one pixel of a seen class. Alas, the code for doing this is not provided in my implementation.
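A minimal sketch of that masking step, assuming the label map is a NumPy array and `seen_classes` lists the classes seen so far (`mask_unseen` and the toy values are hypothetical, not from the repository):

```python
import numpy as np

def mask_unseen(label_map, seen_classes, ignore_index=255):
    """Set every pixel whose class has not been seen yet to the ignore index."""
    masked = label_map.copy()
    unseen = ~np.isin(masked, seen_classes)  # True where the class is not yet seen
    masked[unseen] = ignore_index
    return masked

label_map = np.array([[0, 1, 2],
                      [3, 1, 0]])
masked = mask_unseen(label_map, seen_classes=[0, 1])
# pixels of classes 2 and 3 are now 255; classes 0 and 1 are untouched
```

The same idea carries over to torch tensors with `torch.isin`.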
Sorry for the late reply.
I want to run your code on a new dataset. Would you please tell me how to split it? Did you try splitting a new dataset in the disjoint or overlapped way? I saw in your answers in the closed issues that ADE-Split.ipynb does not match the disjoint and overlapped protocols. Do you have any method or code to split the dataset?
Thanks a lot.
I think the best option is to adapt the ADE-Split.ipynb if you have a large enough dataset.
Otherwise, go for the overlap option. A template for splitting the dataset is the following:
```python
import numpy as np
from tqdm import tqdm

if __name__ == "__main__":
    CLASSES = number_of_classes
    data = your_dataset  # should return (image, label) pairs as tensors
    overlap = True  # False for the disjoint setting
    n_steps = number_of_steps
    task_dict = {
        0: [list of classes in step 0],
        1: [list of classes in step 1],
        ...
        n_steps - 1: [list of classes in step n_steps - 1],
    }

    labels_old = []
    for step in range(n_steps):
        labels = task_dict[step]
        if step > 0:
            labels_old = labels_old + task_dict[step - 1]
        labels_cum = labels_old + labels

        indices = np.arange(len(data))
        for i in tqdm(range(len(data))):
            # classes actually present in this label map
            present = data[i][1].unique()
            # remove the ignore label
            present = present[present != 255].tolist()
            if not overlap:
                # Disjoint: exclude the image if it contains any future class
                # or none of the current step's classes
                if (not all(x in labels_cum for x in present)
                        or not any(x in labels for x in present)):
                    indices[i] = -1
            else:
                # Overlap: exclude only if it contains none of the current step's classes
                if not any(x in labels for x in present):
                    indices[i] = -1
        indices = indices[indices != -1]
        print(len(indices))
        np.save(f"split_{step}.npy", indices)  # one index file per step
```
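To sanity-check the filtering logic, here is a self-contained toy version with made-up labels (each "image" is reduced to the set of classes it contains; the class lists and two-step setup are hypothetical):

```python
import numpy as np

# Toy stand-in for the dataset: the set of classes present in each image.
data = [
    {1, 2},  # image 0: step-0 classes only
    {3},     # image 1: step-1 class only
    {1, 3},  # image 2: mixes step-0 and step-1 classes
    {255},   # image 3: only the ignore label
]
task_dict = {0: [1, 2], 1: [3]}
overlap = True
splits = {}

labels_old = []
for step in range(2):
    labels = task_dict[step]
    if step > 0:
        labels_old = labels_old + task_dict[step - 1]
    labels_cum = labels_old + labels

    indices = np.arange(len(data))
    for i in range(len(data)):
        present = [c for c in data[i] if c != 255]
        if not overlap:
            if (not all(x in labels_cum for x in present)
                    or not any(x in labels for x in present)):
                indices[i] = -1
        else:
            if not any(x in labels for x in present):
                indices[i] = -1
    splits[step] = indices[indices != -1]

print(splits[0].tolist())  # images kept for step 0: [0, 2]
print(splits[1].tolist())  # images kept for step 1: [1, 2]
```

In the overlap setting an image is kept whenever it contains at least one of the current step's classes, so image 2 legitimately appears in both steps.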
Hey,
If I may hijack this issue:
> In practice, unseen classes are set to 255 during the test evaluation in my code. If you want to split the dataset, I think it is better to follow the overlapped scenario, i.e. keep every test image that contains at least one pixel of a seen class. Alas, the code for doing this is not provided in my implementation.
In my paper PLOP, I modified @fcdl94's code to evaluate after each step, using only the test images with at least one pixel of seen classes. It supports both disjoint and overlap.