clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Home Page: https://arxiv.org/abs/2111.15664



Dataset Loader didn't work properly on Kaggle

wdprsto opened this issue

Good afternoon,

This morning I tried to run Donut on Kaggle. The dataset structure is similar to the one described in the documentation. However, when I try to train the model, an error occurs saying that the "ground truth" does not exist. Inspecting a sample shows that load_dataset treats the folder name as the label and ignores the metadata.jsonl file inside the folder.

I can read the jsonl file directly with a command, though.
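For anyone who wants to rule out a problem in the file itself, here is a minimal sketch of that kind of check. It assumes the layout from the Donut README: a metadata.jsonl in each split folder, one JSON object per line, with a "file_name" and a "ground_truth" key. The function name check_metadata is made up for illustration and is not part of Donut or datasets.

```python
import json
from pathlib import Path


def check_metadata(split_dir):
    """Return a list of problems found in <split_dir>/metadata.jsonl.

    Assumed layout (from the Donut README): one JSON object per line,
    each with a "file_name" and a "ground_truth" key.
    """
    split_dir = Path(split_dir)
    metadata_path = split_dir / "metadata.jsonl"
    if not metadata_path.is_file():
        return [f"missing {metadata_path}"]

    problems = []
    with metadata_path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                problems.append(f"line {line_number}: invalid JSON ({error.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_number}: not a JSON object")
                continue
            # Both keys are expected on every line.
            for key in ("file_name", "ground_truth"):
                if key not in record:
                    problems.append(f"line {line_number}: missing '{key}'")
            # The referenced image should sit next to metadata.jsonl.
            file_name = record.get("file_name")
            if file_name and not (split_dir / file_name).is_file():
                problems.append(f"line {line_number}: image not found: {file_name}")
    return problems
```

Calling check_metadata on a split folder (for example a hypothetical dataset/train) returns an empty list when every line looks fine. If it does, the file is probably not the cause, and the loader is the next thing to look at.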

I set up Donut with this code:

!git clone https://github.com/clovaai/donut.git

!cd donut && pip install .

Thank you for your help

Solved by installing datasets version 2.4.
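To confirm which datasets version a notebook session actually picked up, something like the sketch below may help. It only reads the installed package metadata and compares the major and minor numbers. The thread does not say which 2.4 patch release was used, so the check accepts any 2.4.x. The helper names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def is_release_series(version_string, major, minor):
    """True if version_string starts with '<major>.<minor>' (e.g. 2.4.x)."""
    parts = version_string.split(".")
    try:
        return int(parts[0]) == major and int(parts[1]) == minor
    except (IndexError, ValueError):
        return False  # not a dotted numeric version


def installed_datasets_version():
    """Return the installed datasets version string, or None if absent."""
    try:
        return version("datasets")
    except PackageNotFoundError:
        return None


found = installed_datasets_version()
if found is None:
    print("datasets is not installed")
elif is_release_series(found, 2, 4):
    print(f"datasets {found} is in the 2.4 series")
else:
    print(f"datasets {found} is installed, not the 2.4 series")
```

If the package was reinstalled after datasets had already been imported, the running session may still be using the old version until the kernel is restarted.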