QuickDraw dataset not present

sawant-nidhish opened this issue 3 years ago

Dear Sir.

Firstly, thank you for your amazing research. I am facing a problem. After going through your code, I found that there are only 1035 files (sketches) in `picture_files`, which is downloaded by running `dataloader/download.sh`. The paper states that the models are trained on the QuickDraw 3.8M dataset. Because most of the dataset is missing, the train/test split performed in `dataloader_image.py` does not work properly:
```python
import random

import numpy as np
import torch.utils.data as data
from PIL import Image
from torchvision import transforms


class SketchDataset(data.Dataset):

    def __init__(self, root, split, resize):
        self.split = split.lower()
        assert self.split in ("train", "test")
        if resize:
            # Resize the shorter side to 299 px, then convert the PIL image to a NumPy array.
            transforms_list = [
                transforms.Resize(299),
                lambda x: np.asarray(x),
            ]
        else:
            transforms_list = [
                lambda x: np.asarray(x),
            ]

        _NUM_VALIDATION = 345000  # the first 345000 shuffled files form the test split
        _RANDOM_SEED = 0
        photo_filenames, _ = _get_filenames_and_classes(root)
        random.seed(_RANDOM_SEED)
        random.shuffle(photo_filenames)
        if self.split == "train":
            # Empty whenever fewer than _NUM_VALIDATION files are found.
            self.image_list = photo_filenames[_NUM_VALIDATION:]
        elif self.split == "test":
            self.image_list = photo_filenames[:_NUM_VALIDATION]
        self.transform = transforms.Compose(transforms_list)

    def __getitem__(self, index):
        image = Image.open(self.image_list[index])
        image = self.transform(image)
        return image

    def __len__(self):
        return len(self.image_list)
```

Here, `_NUM_VALIDATION` is set to 345000, but the dataset contains only 1035 images (sketches). Because the shuffled list is sliced at index 345000, the train set receives 0 images, whereas the test set receives all 1035. I confirmed the number of images by adding `print("DATASET SIZE:", i)` in dataloader.py:

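```python
import os


def _get_filenames_and_classes(dataset_dir):
    """Return every file path inside the class subdirectories of dataset_dir,
    together with the sorted list of class (subdirectory) names."""
    quickdraw_root = dataset_dir
    directories = []
    class_names = []
    # Each subdirectory of the dataset root is treated as one class.
    for filename in os.listdir(quickdraw_root):
        path = os.path.join(quickdraw_root, filename)
        if os.path.isdir(path):
            directories.append(path)
            class_names.append(filename)
    i = 0  # debug counter added to report the dataset size
    photo_filenames = []
    for directory in directories:
        for filename in os.listdir(directory):
            i = i + 1
            path = os.path.join(directory, filename)
            photo_filenames.append(path)
    print("DATASET SIZE:", i)
    return photo_filenames, sorted(class_names)
```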
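
The effect of the shuffle-then-slice split can be checked without the dataset. This is a minimal illustration, not code from the repository: `split_filenames` is a hypothetical helper that mirrors the slicing in `SketchDataset.__init__`, and `fake_paths` is a made-up list of 1035 paths standing in for the downloaded sketches.

```python
import random


def split_filenames(filenames, num_validation, seed=0):
    """Hypothetical helper mirroring SketchDataset's split:
    the first `num_validation` shuffled paths are the test split,
    everything after them is the train split."""
    shuffled = list(filenames)  # copy so the caller's list is untouched
    random.seed(seed)
    random.shuffle(shuffled)
    return shuffled[num_validation:], shuffled[:num_validation]


# Made-up stand-in for the 1035 sketch paths found after download.sh.
fake_paths = [f"picture_files/sketch_{i}.png" for i in range(1035)]
train, test = split_filenames(fake_paths, num_validation=345000)
print(len(train), len(test))
```

Slicing past the end of a Python list returns an empty list rather than raising, so the train split silently ends up empty. With a complete download, the same slicing would keep the first 345000 shuffled paths for testing and everything after them for training.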

Because of this discrepancy, the following error is raised: `ValueError: num_samples should be a positive integer value, but got num_samples=0`.
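
For context, this message appears to come from PyTorch's `RandomSampler`, which a `DataLoader` uses when `shuffle=True` and which rejects a dataset of length 0. Below is a minimal sketch of an early check that would surface the underlying cause before the `DataLoader` is built. `check_split_sizes` is a hypothetical helper, not part of the repository, and it assumes the same split rule as `SketchDataset` (test = first `_NUM_VALIDATION` files, train = the rest).

```python
def check_split_sizes(num_files, num_validation):
    """Hypothetical pre-flight check: raise a descriptive error if either
    split would be empty under SketchDataset's slicing rule."""
    num_test = min(num_validation, num_files)
    num_train = max(num_files - num_validation, 0)
    if num_train == 0 or num_test == 0:
        raise RuntimeError(
            f"Found {num_files} sketches but _NUM_VALIDATION={num_validation}: "
            f"train split has {num_train} images, test split has {num_test}. "
            "The downloaded dataset appears incomplete."
        )
    return num_train, num_test


# With the 1035 sketches from download.sh, the check fails before training starts.
try:
    check_split_sizes(num_files=1035, num_validation=345000)
except RuntimeError as err:
    print(err)
```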

Please look into this matter. I hope to hear from you soon.

Once again, thank you for your work. This research will help me solve a lot of problems.