Few shot learning for Document AI
SimJeg opened this issue
Hello,
I am working on a practical document-understanding use case and wondering whether I could leverage models such as StructuralLM. The goal is to extract key information from documents (in fields or tables). The catch is that I only have a few training samples (<50), and I don't think VQA would apply, because the information is very specific and not always associated with a clear question.
Here are the two options I have in mind:
- Fine-tune a model. But would 50 samples be enough? And how should I deal with tables? (They don't really look like tables, but rather like lists without printed rows and columns, as on many receipts.)
- Leverage a foundation model for few-shot learning (as with GPT-3). Are there text + layout foundation models out there that would work for this? Or should I do prompt engineering with GPT-3, Flan-T5, OPT or equivalent models?
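To make the second option more concrete, here is a minimal sketch of what few-shot prompting for field extraction could look like. It only builds the prompt text and parses the answer; it does not call any model. The function names (`build_few_shot_prompt`, `parse_fields`), the prompt wording, and the JSON answer format are all assumptions made for illustration, not something taken from StructuralLM or any specific library, and the document text is assumed to come from an OCR step that is not shown.

```python
import json


def build_few_shot_prompt(examples, new_document_text, field_names):
    """Assemble a few-shot prompt from labeled (ocr_text, fields) pairs.

    `examples` is a list of (text, dict) tuples taken from the small
    labeled set; the prompt ends with an open "Fields:" slot that the
    language model is expected to complete. The layout of the prompt is
    a hypothetical choice, not a required format.
    """
    header = (
        "Extract the following fields from each document and answer "
        "with JSON: " + ", ".join(field_names)
    )
    parts = [header, ""]
    for text, fields in examples:
        parts.append("Document:")
        parts.append(text)
        parts.append("Fields:")
        parts.append(json.dumps(fields, ensure_ascii=False, sort_keys=True))
        parts.append("")
    # The unlabeled document goes last, with the answer left open.
    parts.append("Document:")
    parts.append(new_document_text)
    parts.append("Fields:")
    return "\n".join(parts)


def parse_fields(completion, field_names):
    """Parse the model's JSON answer, keeping only the expected fields.

    Missing fields, or an answer that is not a JSON object, yield None
    values instead of raising.
    """
    try:
        data = json.loads(completion)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {name: data.get(name) for name in field_names}
```

The resulting string would be sent to whichever model is chosen (GPT-3, Flan-T5, OPT, ...), and the completion passed to `parse_fields`. With fewer than 50 labeled documents, only a handful fit in one prompt, so which examples to include is a design choice of its own.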
I am interested in your insights for both English data and non-English (but Latin-script) data.
Many thanks for your input,
Simon
Thank you for your interest. I suggest that you start from a model already trained on public data, such as DocVQA, and then fine-tune it on your few training samples.