The division of the test data set.
ljx97 opened this issue · comments
Hello,
Thanks for your interesting work. I am running your method on other datasets, but I ran into the problem of dataset partitioning. Should all test data be used for each task of incremental learning? How should the test set be divided? Is it possible to split the test set using ADE-Split.ipynb?
Hi @ljx97!
Well, it depends. In our case, we only evaluated the final step, so we were not interested in each intermediate result.
In other applications, you may divide it depending on the classes, masking all the classes that have not been seen.
In practice, unseen classes are set to 255 during the test evaluation in my code. If you want to split the dataset, I think it is better to follow the overlapped scenario, i.e. keep every test image that contains at least one pixel of a seen class. Alas, the code for doing this is not provided in my implementation.
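A minimal sketch of that masking step, assuming the label map is a NumPy array and `seen_classes` lists the classes seen so far (`mask_unseen` and the toy values are hypothetical, not from the repository):

```python
import numpy as np

def mask_unseen(label_map, seen_classes, ignore_index=255):
    """Set every pixel whose class has not been seen yet to the ignore index."""
    masked = label_map.copy()
    unseen = ~np.isin(masked, seen_classes)  # True where the class is not yet seen
    masked[unseen] = ignore_index
    return masked

label_map = np.array([[0, 1, 2],
                      [3, 1, 0]])
masked = mask_unseen(label_map, seen_classes=[0, 1])
# pixels of classes 2 and 3 are now 255; classes 0 and 1 are untouched
```

The same idea carries over to torch tensors with `torch.isin`.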
Sorry for the late reply.
I want to run your code on a new dataset. Would you please tell me how to split it? Did you try splitting a new dataset in the disjoint or overlapped way? I saw in your answers in the closed issues that ADE-Split.ipynb does not match the disjoint and overlapped protocols. Do you have any method or code to split the dataset?
Thanks a lot.
I think the best option is to adapt the ADE-Split.ipynb if you have a large enough dataset.
Otherwise, go for the overlap option. A template for splitting the dataset is the following:
```python
import numpy as np
from tqdm import tqdm

if __name__ == "__main__":
    CLASSES = number_of_classes
    data = your_dataset  # should return (image, label) pairs as tensors
    overlap = True  # False for the disjoint setting
    n_steps = number_of_steps
    task_dict = {
        0: [list of classes in step 0],
        1: [list of classes in step 1],
        ...
        n_steps - 1: [list of classes in step n_steps - 1],
    }

    labels_old = []
    for step in range(n_steps):
        labels = task_dict[step]
        if step > 0:
            labels_old = labels_old + task_dict[step - 1]
        labels_cum = labels_old + labels

        indices = np.arange(len(data))
        for i in tqdm(range(len(data))):
            # classes actually present in this label map
            present = data[i][1].unique()
            # remove the ignore label
            present = present[present != 255].tolist()
            if not overlap:
                # Disjoint: exclude the image if it contains any future class
                # or none of the current step's classes
                if (not all(x in labels_cum for x in present)
                        or not any(x in labels for x in present)):
                    indices[i] = -1
            else:
                # Overlap: exclude only if it contains none of the current step's classes
                if not any(x in labels for x in present):
                    indices[i] = -1
        indices = indices[indices != -1]
        print(len(indices))
        np.save(f"split_{step}.npy", indices)  # one index file per step
```
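To sanity-check the filtering logic, here is a self-contained toy version with made-up labels (each "image" is reduced to the set of classes it contains; the class lists and two-step setup are hypothetical):

```python
import numpy as np

# Toy stand-in for the dataset: the set of classes present in each image.
data = [
    {1, 2},  # image 0: step-0 classes only
    {3},     # image 1: step-1 class only
    {1, 3},  # image 2: mixes step-0 and step-1 classes
    {255},   # image 3: only the ignore label
]
task_dict = {0: [1, 2], 1: [3]}
overlap = True
splits = {}

labels_old = []
for step in range(2):
    labels = task_dict[step]
    if step > 0:
        labels_old = labels_old + task_dict[step - 1]
    labels_cum = labels_old + labels

    indices = np.arange(len(data))
    for i in range(len(data)):
        present = [c for c in data[i] if c != 255]
        if not overlap:
            if (not all(x in labels_cum for x in present)
                    or not any(x in labels for x in present)):
                indices[i] = -1
        else:
            if not any(x in labels for x in present):
                indices[i] = -1
    splits[step] = indices[indices != -1]

print(splits[0].tolist())  # images kept for step 0: [0, 2]
print(splits[1].tolist())  # images kept for step 1: [1, 2]
```

In the overlap setting an image is kept whenever it contains at least one of the current step's classes, so image 2 legitimately appears in both steps.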
Hey,
If I may hijack this issue:
> In practice, unseen classes are set to 255 during the test evaluation in my code. If you want to split the dataset, I think it is better to follow the overlapped scenario, i.e. keep every test image that contains at least one pixel of a seen class. Alas, the code for doing this is not provided in my implementation.
In my paper PLOP, I modified @fcdl94's code to evaluate after each step, using only the test images with at least one pixel of seen classes. It supports both disjoint and overlap.