SOURCE_DIR clarification

Question

justcho5 opened this issue 5 years ago

SOURCE_DIR: Directory where the discovery images (refer to the paper) are saved.
It should contain (at least) num_random_exp + 2 folders:
1-"target_class" which contains images of the class to be explained.
2-"random_discovery" which contains randomly selected images of the same dataset (at least $max_imgs images).
3-"random500_0", ..., "random500_${num_random_exp}", each of which contains 500 randomly selected images from the dataset.
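As a rough way to check a directory against the layout quoted above, here is a minimal Python sketch. The helper names `expected_folders` and `missing_folders` are hypothetical and not part of the repository. It assumes the random folders are indexed from 0 to num_random_exp - 1, which is what the "num_random_exp + 2 folders" count suggests; adjust the range if the code expects otherwise.

```python
import os


def expected_folders(target_class, num_random_exp):
    """Folder names implied by the quoted description.

    Assumption: random folders are numbered 0 .. num_random_exp - 1.
    """
    randoms = [f"random500_{i}" for i in range(num_random_exp)]
    return [target_class, "random_discovery"] + randoms


def missing_folders(source_dir, target_class, num_random_exp):
    """Return the expected folder names that do not exist under source_dir."""
    return [
        name
        for name in expected_folders(target_class, num_random_exp)
        if not os.path.isdir(os.path.join(source_dir, name))
    ]
```

An empty list from `missing_folders` means every expected folder is present; it does not check how many images each folder holds.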

So I have a dataset with images belonging to either class A or B. I want to explain class A. The target_class directory should contain class A images. random_discovery should contain random images from the dataset, which can be either class A or B, and the random500_x directories should also contain images from the dataset, which can be either class A or B. All the images for each of these folders come from the same dataset. Is that correct?

internetboy · Answer 1 · Mon Nov 04 2019 09:52:58 GMT+0800 (China Standard Time)

That is correct. The only caveat is that the folder shown as "target_class" must have the same name as the class passed to the --target_class argument. The random500_x folders and the random_discovery folder should contain images randomly sampled from the dataset, independent of the image labels.
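One way to populate the random folders with label-independent samples is sketched below. This is only an illustration: `sample_folder` and `build_random_folders` are hypothetical helpers, not functions from the repository, and the 0-based folder indexing is an assumption. Images are drawn uniformly from a flat list of paths, so class labels play no role in the selection.

```python
import random
import shutil
from pathlib import Path


def sample_folder(image_paths, dest, n, rng):
    """Copy up to n images, drawn uniformly without replacement, into dest."""
    paths = list(image_paths)
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    chosen = rng.sample(paths, min(n, len(paths)))
    for j, src in enumerate(chosen):
        # Prefix with an index so same-named files from different
        # class folders do not overwrite each other.
        shutil.copy(src, dest / f"{j:05d}_{Path(src).name}")
    return chosen


def build_random_folders(image_paths, source_dir, num_random_exp, max_imgs, seed=0):
    """Create random_discovery and random500_0 .. random500_{num_random_exp - 1}."""
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    source_dir = Path(source_dir)
    sample_folder(image_paths, source_dir / "random_discovery", max_imgs, rng)
    for i in range(num_random_exp):
        sample_folder(image_paths, source_dir / f"random500_{i}", 500, rng)
```

Here `image_paths` would be every image in the dataset regardless of class, for example gathered with `Path(dataset_root).rglob("*.jpg")`. Each folder is sampled independently, so the same image can appear in more than one folder.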

Justina Cho · Answer 2 · Fri Nov 08 2019 00:53:54 GMT+0800 (China Standard Time)

If the random folders contain images that are randomly sampled from the dataset, would unbalanced classes create any problems? My model learns two classes, and the training data for one is significantly larger than for the other. Should I randomly sample from a subset of the dataset in which the two classes are equally represented?