dataset script missing error
segaranp opened this issue
Hi,
I'm using this project with my own custom dataset. Following the README.md, I created sample data in the dataset folder, with one folder per split (test/validation/train), each containing its metadata.
Then I ran this command:
```
python train.py --config config/train_cord.yaml --pretrained_model_name_or_path "naver-clova-ix/donut-base" --dataset_name_or_paths 'C:\ocr\2\donut\dataset' --exp_version "test_experiment"
```
But I'm getting this error:
```
  File "C:\ocr\2\donut\train.py", line 176, in <module>
    train(config)
  File "C:\ocr\2\donut\train.py", line 104, in train
    DonutDataset(
  File "C:\ocr\2\donut\donut\util.py", line 64, in __init__
    self.dataset = load_dataset(dataset_name_or_path, split=self.split)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaba\miniconda3\Lib\site-packages\datasets\load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaba\miniconda3\Lib\site-packages\datasets\load.py", line 1815, in load_dataset_builder
    dataset_module = dataset_module_factory(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaba\miniconda3\Lib\site-packages\datasets\load.py", line 1508, in dataset_module_factory
    raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at C:\ocr\2\donut\C\C.py or any data file in the same directory. Couldn't find 'C' on the Hugging Face Hub either: FileNotFoundError: Dataset 'C' doesn't exist on the Hub. If the repo is private or gated, make sure to log in with `huggingface-cli login`.
```
Does anyone know how to resolve this?
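A note on the traceback: the loader looked for `C:\ocr\2\donut\C\C.py`, meaning `load_dataset` was called with the dataset name `'C'`, which is just the first character of the path given on the command line. A likely cause (an assumption, not confirmed in this thread) is that `--dataset_name_or_paths` expects a list, as in the donut README example `'["naver-clova-ix/cord-v2"]'`, and that `train.py` loops over it, so a plain string gets walked one character at a time. The minimal Python sketch below reproduces that effect; `iter_dataset_paths` is a hypothetical stand-in for that loop, not a function from donut:

```python
from typing import Iterable, List


def iter_dataset_paths(dataset_name_or_paths: Iterable[str]) -> List[str]:
    """Hypothetical stand-in for a `for ... in dataset_name_or_paths` loop:
    returns every value that would be handed on to load_dataset()."""
    return list(dataset_name_or_paths)


# A bare string is iterable too, so each character becomes a "dataset name".
from_string = iter_dataset_paths(r"C:\ocr\2\donut\dataset")
print(from_string[0])  # C

# A one-element list keeps the whole path as a single entry.
from_list = iter_dataset_paths([r"C:\ocr\2\donut\dataset"])
print(from_list[0])  # C:\ocr\2\donut\dataset
```

If that is the cause, giving the path as a one-element list (for example by editing the `dataset_name_or_paths` entry in `config/train_cord.yaml`, assuming it is defined there) should get past this particular error; the folder still needs a layout that `load_dataset` can read, as described in the reply below.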
This will help: https://huggingface.co/docs/datasets/image_load
The easiest way is to ensure you have the following directory/file structure in your dataset folder:
```yaml
configs:
- config_name: default
  data_files:
  - split: train
    path: "train"
  - split: test
    path: "test"
  - split: validation
    path: "validation"
```
- a README.md that includes the above YAML as its metadata header, enclosed between two `---` lines.
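To set this up without hand-editing, here is a minimal stdlib-only sketch. `write_dataset_readme` is a hypothetical helper (not part of donut or `datasets`). It assumes each split folder holds its images plus a `metadata.jsonl` whose rows carry a `file_name` key (the convention the Hugging Face image folder loader uses), checks that every listed file exists, and then writes README.md with the `configs` header from the reply. Note that it overwrites any existing README.md in that folder.

```python
import json
from pathlib import Path
from typing import Sequence

# Opening front-matter lines, matching the configs YAML from the reply above.
HEADER_LINES = ["---", "configs:", "- config_name: default", "  data_files:"]


def write_dataset_readme(dataset_dir: str,
                         splits: Sequence[str] = ("train", "test", "validation")) -> Path:
    """Hypothetical helper: validate each split's metadata.jsonl, then write
    README.md whose YAML header maps every split to its folder."""
    root = Path(dataset_dir)
    lines = list(HEADER_LINES)
    for split in splits:
        split_dir = root / split
        metadata = split_dir / "metadata.jsonl"
        if not metadata.is_file():
            raise FileNotFoundError(f"missing {metadata}")
        with metadata.open(encoding="utf-8") as fh:
            for line_no, raw in enumerate(fh, start=1):
                if not raw.strip():
                    continue  # tolerate blank lines
                record = json.loads(raw)
                if "file_name" not in record:
                    raise KeyError(f"{metadata}:{line_no}: no file_name key")
                image = split_dir / record["file_name"]
                if not image.is_file():
                    raise FileNotFoundError(f"{metadata}:{line_no}: no file {image}")
        # One data_files entry per split, pointing at the split's folder.
        lines += [f"  - split: {split}", f'    path: "{split}"']
    lines.append("---")
    readme = root / "README.md"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

The sketch only prepares the folder; it does not call `datasets` itself, so the image_load documentation linked above remains the reference for how the loader reads the result.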