alibaba / AliceMind

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

Few shot learning for Document AI

SimJeg opened this issue · comments

Hello,

I am working on a practical use case of document understanding and wondering if I could leverage models such as StructuralLM. The goal is to extract key information from the document (in fields or tables). The catch is that I only have a few training samples (<50), and I don't think VQA would apply, as this information is very specific and not always associated with a clear question.

Here are the two options I have in mind:

  • Fine-tuning a model. But would 50 samples be enough? And how should I deal with tables (which don't really look like tables, but rather like lists without printed rows and columns, as on many receipts)?
  • Leveraging a foundation model to perform few-shot learning (as in GPT-3). Are there text + layout foundation models out there that would work for this? Or should I do prompt engineering with GPT-3, Flan-T5, OPT, or equivalent models?

I am interested in your insights for both English data and non-English (but Latin-script) data.
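To make the second option concrete, here is a minimal sketch of what the prompt-engineering route could look like with a text-only model such as GPT-3 or Flan-T5, assuming the documents have already been OCRed to plain text. The field names and sample receipts below are hypothetical placeholders, not from any real dataset:

```python
def build_few_shot_prompt(examples, query_text, fields):
    """Build a few-shot extraction prompt for a text-only LLM.

    examples: list of (ocr_text, {field_name: value}) labeled samples
    query_text: OCR text of the document to extract from
    fields: ordered list of field names to extract
    """
    parts = ["Extract the following fields from the receipt: "
             + ", ".join(fields) + "."]
    for ocr_text, labels in examples:
        parts.append("Document:\n" + ocr_text)
        parts.append("\n".join(f"{f}: {labels.get(f, 'N/A')}" for f in fields))
    parts.append("Document:\n" + query_text)
    parts.append(fields[0] + ":")  # cue the model to start its answer
    return "\n\n".join(parts)

# Hypothetical labeled samples (a handful, as in the <50 setting)
examples = [
    ("ACME STORE\nTOTAL 12.50 EUR\n2023-01-05",
     {"total": "12.50 EUR", "date": "2023-01-05"}),
]
prompt = build_few_shot_prompt(
    examples,
    "SHOP24\nTOTAL 3.99 EUR\n2023-02-11",
    ["total", "date"],
)
print(prompt)
```

The resulting string would then be sent to the model's text-completion API; the trade-off versus fine-tuning is that layout information is lost unless it is serialized into the prompt somehow.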

Many thanks for your input,
Simon

Thank you for your attention. I suggest that you first take a model trained on public data, such as DocVQA, and then fine-tune it on your few training samples.
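One way to apply this suggestion is to convert the few labeled fields into DocVQA-style question/answer pairs, so that a model already fine-tuned on DocVQA can be further fine-tuned on them in the same format. The question templates and field names below are illustrative assumptions, not part of AliceMind's API:

```python
# Turn field annotations into extractive QA pairs in a DocVQA-like format.
# Field names and question templates are hypothetical examples.
QUESTION_TEMPLATES = {
    "total": "What is the total amount?",
    "date": "What is the document date?",
}

def to_qa_pairs(sample):
    """sample: {'ocr_text': str, 'fields': {field_name: value}}

    Returns one (context, question, answer) record per labeled field,
    falling back to a generic question for unknown field names.
    """
    pairs = []
    for field, value in sample["fields"].items():
        question = QUESTION_TEMPLATES.get(field, f"What is the {field}?")
        pairs.append({"context": sample["ocr_text"],
                      "question": question,
                      "answer": value})
    return pairs

# Hypothetical annotated receipt
sample = {"ocr_text": "ACME STORE\nTOTAL 12.50 EUR\n2023-01-05",
          "fields": {"total": "12.50 EUR", "vendor": "ACME STORE"}}
qa = to_qa_pairs(sample)
print(qa)
```

With <50 documents this yields a correspondingly small QA set, so it is worth keeping a few documents aside for validation rather than training on all of them.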