Downloaded the CUB dataset, but there is no train.json in it

Question

michaelwithu opened this issue 4 months ago · comments

I downloaded the CUB dataset from this link: https://data.caltech.edu/records/65de6-vp158
but the download does not contain train.json.

Also, the links to the splits (https://cornell.box.com/v/vptfgvcsplits; https://drive.google.com/drive/folders/1mnvxTkYxmOr2W9QjcgS64UBpoJ4UmKaM?usp=sharing) do not work.

WuLin Xie · Answer 1 · Mon Jul 29 2024 03:56:09 GMT+0800 (China Standard Time)

I have the same issue.

Arpita Chowdhury · Answer 2 · Wed Aug 21 2024 14:29:24 GMT+0800 (China Standard Time)

It should look like this:

{
   image_name : class_index
}
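
For concreteness, here is a minimal sketch of what such a mapping might contain and how to sanity-check it. The image paths and class indices below are made up for illustration, and `check_split_mapping` is a hypothetical helper, not part of any library or of the repository in question.

```python
import json

# Hypothetical entries: keys are image paths, values are integer class indices.
example = {
    "train/1_className/image_1.jpg": 1,
    "train/2_className/image_1.jpg": 2,
}

def check_split_mapping(mapping):
    """Return True if every key is a string and every value is an int (not a bool)."""
    return all(
        isinstance(name, str)
        and isinstance(label, int)
        and not isinstance(label, bool)
        for name, label in mapping.items()
    )

# Round-trip through JSON text, as the mapping would be stored in train.json.
text = json.dumps(example, indent=4)
loaded = json.loads(text)
print(check_split_mapping(loaded))
```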

You can use the code below to generate the JSON files if you take the whole train set as training data and use val as both the validation and test data. If needed, feel free to randomly move part of the train data (for example 20%) out of train.json and into val.json. I didn't, because for CUB I don't need to search for hyperparameters; they have already done that.
Assuming your CUB dataset looks like this:

cub
- train
  - 1_className
    - image_1
    - image_2
  - 2_className
    - image_1
    - ...
- val
  - 1_className
    - image_1
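
If your download is still in the raw layout, the folders above have to be created first. The sketch below is one possible way to do that, assuming the extracted archive contains the standard CUB_200_2011 files `images.txt` (image id and relative path), `train_test_split.txt` (image id and a 1/0 training flag) and an `images/` folder. The function name `build_split_dirs` is hypothetical, and using the official test images as the val split is my assumption, in line with the answer's choice to reuse val as test.

```python
import os
import shutil

def build_split_dirs(cub_root, out_dir):
    """Copy images from a raw CUB_200_2011 folder into out_dir/train and out_dir/val."""
    # images.txt: "<image_id> <path relative to images/>"
    with open(os.path.join(cub_root, "images.txt")) as f:
        paths = dict(line.split() for line in f if line.strip())
    # train_test_split.txt: "<image_id> <1 if training image, else 0>"
    with open(os.path.join(cub_root, "train_test_split.txt")) as f:
        is_train = dict(line.split() for line in f if line.strip())

    counts = {"train": 0, "val": 0}
    for image_id, rel_path in paths.items():
        # Assumption: the official test images are used as the val split.
        split = "train" if is_train[image_id] == "1" else "val"
        src = os.path.join(cub_root, "images", rel_path)
        dst = os.path.join(out_dir, split, rel_path)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
        counts[split] += 1
    return counts

# Example call (paths are placeholders):
# build_split_dirs("<path_to_CUB_200_2011>", "<path_to_your_dataset>")
```

In the raw download the class folders are named with a numeric prefix followed by a dot (for example `001.Black_footed_Albatross`), which is the form the `split(".")` call in the script below expects.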

import json
import os
import re

def create_json_files(data_dir):
    """Write train.json, val.json and test.json mapping image path -> class id."""
    json_data = {'train': {}, 'val': {}}

    for split in ['train', 'val']:
        split_dir = os.path.join(data_dir, split)
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if not os.path.isdir(class_dir):
                continue
            # The class id is the leading number of the folder name,
            # e.g. "1_className" or "001.className" -> 1
            match = re.match(r'\d+', class_name)
            if match is None:
                continue  # skip folders without a numeric prefix
            class_id = int(match.group())
            for img_name in sorted(os.listdir(class_dir)):
                # Keys are image paths relative to data_dir
                img_path = os.path.join(split, class_name, img_name)
                json_data[split][img_path] = class_id

    # For the test set, we'll assume it uses the same data as val
    json_data['test'] = json_data['val']

    # Create the JSON files
    for split in ['train', 'val', 'test']:
        json_file_path = os.path.join(data_dir, f'{split}.json')
        with open(json_file_path, 'w') as f:
            json.dump(json_data[split], f, indent=4)

    return json_data

dataset_path = "<path_to_your_dataset>"
create_json_files(dataset_path)
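
The optional random hold-out mentioned in the answer could be sketched roughly as follows. This is only an illustration: `hold_out_val` is a hypothetical helper, the 20% fraction and the seed are arbitrary defaults, and the split is purely random rather than stratified per class.

```python
import random

def hold_out_val(train_map, val_fraction=0.2, seed=0):
    """Split an {image_path: class_id} dict into (train, val) dicts."""
    paths = sorted(train_map)              # fixed order so the seed is reproducible
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    val_paths = set(paths[:n_val])
    train = {p: c for p, c in train_map.items() if p not in val_paths}
    val = {p: train_map[p] for p in paths[:n_val]}
    return train, val

# Made-up mapping with 10 images, standing in for the contents of train.json.
demo = {f"train/1_className/image_{i}.jpg": 1 for i in range(10)}
new_train, new_val = hold_out_val(demo)
print(len(new_train), len(new_val))
```

The two returned dicts can then be written with `json.dump` to train.json and val.json, with the original val folder kept for test.json only.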