alibaba / AliceMind

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

Few shot learning for Document AI

SimJeg opened this issue · comments

Hello,

I am working on a practical use case of document understanding and wondering if I could leverage models such as StructuralLM. The goal is to extract key information from the document (in fields or tables). The catch is that I only have a few training samples (<50), and I don't think VQA would apply, as this information is very specific and not always associated with a clear question.

Here are the two options I have in mind:

  • Fine-tuning a model. But would 50 samples be enough? And how should I deal with tables (which don't really look like tables, but rather like lists without printed rows and columns, as on many receipts)?
  • Leveraging a foundation model to perform few-shot learning (as in GPT-3). Are there text + layout foundation models out there that would work for this? Or should I do prompt engineering with GPT-3, Flan-T5, OPT, or equivalent models?

I am interested in your insights for both English data and non-English (but Latin-script) data.
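To make the second option concrete, here is a minimal sketch of what the prompt-engineering route could look like with a text-only model such as GPT-3 or Flan-T5, assuming the documents have already been OCRed to plain text. The field names and sample receipts below are hypothetical placeholders, not from any real dataset:

```python
def build_few_shot_prompt(examples, query_text, fields):
    """Build a few-shot extraction prompt for a text-only LLM.

    examples: list of (ocr_text, {field_name: value}) labeled samples
    query_text: OCR text of the document to extract from
    fields: ordered list of field names to extract
    """
    parts = ["Extract the following fields from the receipt: "
             + ", ".join(fields) + "."]
    for ocr_text, labels in examples:
        parts.append("Document:\n" + ocr_text)
        parts.append("\n".join(f"{f}: {labels.get(f, 'N/A')}" for f in fields))
    parts.append("Document:\n" + query_text)
    parts.append(fields[0] + ":")  # cue the model to start its answer
    return "\n\n".join(parts)

# Hypothetical labeled samples (a handful, as in the <50 setting)
examples = [
    ("ACME STORE\nTOTAL 12.50 EUR\n2023-01-05",
     {"total": "12.50 EUR", "date": "2023-01-05"}),
]
prompt = build_few_shot_prompt(
    examples,
    "SHOP24\nTOTAL 3.99 EUR\n2023-02-11",
    ["total", "date"],
)
print(prompt)
```

The resulting string would then be sent to the model's text-completion API; the trade-off versus fine-tuning is that layout information is lost unless it is serialized into the prompt somehow.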

Many thanks for your input,
Simon

Thank you for your attention. I suggest that you first take a model trained on public data, such as DocVQA, and then fine-tune it on your few training samples.
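One way to apply this suggestion is to convert the few labeled fields into DocVQA-style question/answer pairs, so that a model already fine-tuned on DocVQA can be further fine-tuned on them in the same format. The question templates and field names below are illustrative assumptions, not part of AliceMind's API:

```python
# Turn field annotations into extractive QA pairs in a DocVQA-like format.
# Field names and question templates are hypothetical examples.
QUESTION_TEMPLATES = {
    "total": "What is the total amount?",
    "date": "What is the document date?",
}

def to_qa_pairs(sample):
    """sample: {'ocr_text': str, 'fields': {field_name: value}}

    Returns one (context, question, answer) record per labeled field,
    falling back to a generic question for unknown field names.
    """
    pairs = []
    for field, value in sample["fields"].items():
        question = QUESTION_TEMPLATES.get(field, f"What is the {field}?")
        pairs.append({"context": sample["ocr_text"],
                      "question": question,
                      "answer": value})
    return pairs

# Hypothetical annotated receipt
sample = {"ocr_text": "ACME STORE\nTOTAL 12.50 EUR\n2023-01-05",
          "fields": {"total": "12.50 EUR", "vendor": "ACME STORE"}}
qa = to_qa_pairs(sample)
print(qa)
```

With <50 documents this yields a correspondingly small QA set, so it is worth keeping a few documents aside for validation rather than training on all of them.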