Does LayoutLMv3 have to depend on Detectron2?
7fantasysz opened this issue · comments
I am using LayoutLMv3.
If I am not interested in the layout analysis/object detection task, but only in the form recognition and document classification tasks, could I skip the Detectron2 installation? It is hard to install on a VM without a direct public internet connection, and it would add unnecessary burden to our pipeline, which runs every day.
Hi, thanks for your question!
The current version of the unilm/layoutlmv3 implementation uses Detectron2 in the following two ways:
- To load images in datasets (e.g., FUNSD, CORD).
You can avoid installing Detectron2 (reference) by changing the following code in `unilm/layoutlmv3/layoutlmft/data/image_utils.py` (lines 9 to 10 and lines 21 to 27 in ca82fd4) to:

```python
from PIL import Image

def load_image(image_path):
    image = Image.open(image_path).convert("RGB")
    w, h = image.size
    return image, (w, h)
```
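As a quick sanity check, the PIL-only `load_image` above can be exercised without Detectron2 installed at all. The snippet below is a self-contained demo; the sample image is generated in-memory purely for illustration:

```python
import os
import tempfile

from PIL import Image

def load_image(image_path):
    """PIL-only image loading, matching the replacement snippet above."""
    image = Image.open(image_path).convert("RGB")
    w, h = image.size
    return image, (w, h)

# Create a throwaway test image to demonstrate the function.
sample_path = os.path.join(tempfile.gettempdir(), "sample.png")
Image.new("RGB", (640, 480), color="white").save(sample_path)

image, (w, h) = load_image(sample_path)
print((w, h))       # (640, 480)
print(image.mode)   # RGB
```

Note that PIL reports size as `(width, height)`, which is why the function unpacks `image.size` into `w, h` before returning.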
- To support detection tasks.
The current version of the unilm/layoutlmv3 implementation sets `detection=False`, so the detection components are not used. Removing all code related to `detection` in `modeling_layoutlmv3.py` will also work. For example, @NielsRogge removed the `is_detection` logic in this PR.
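For intuition, the `detection` flag simply gates a branch that never runs when it is `False`. The class below is a simplified, hypothetical sketch (not the actual `modeling_layoutlmv3.py` code) of why deleting that branch is safe for non-detection tasks:

```python
class ToyLayoutLMv3:
    """Hypothetical, simplified stand-in for the real model class."""

    def __init__(self, detection=False):
        # The repo's non-detection tasks run with detection=False.
        self.detection = detection

    def forward(self, inputs):
        # Stand-in for the transformer backbone shared by all tasks.
        features = [x * 2 for x in inputs]
        if self.detection:
            # Detection-only branch; with detection=False it is dead code,
            # so removing it entirely (as in the linked PR) changes nothing
            # for form understanding or document classification.
            raise RuntimeError("would require Detectron2 components")
        return features

model = ToyLayoutLMv3()  # detection defaults to False
print(model.forward([1, 2, 3]))  # [2, 4, 6]
```

Since the branch is unreachable when `detection=False`, stripping it removes the Detectron2 import without affecting the remaining code paths.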
Your answer is really helpful! That's what I was looking for. Two follow-up questions:
- After removing this dependency, would it affect the accuracy on the different tasks (form/receipt understanding, image classification, DocVQA)? If so, do you have metrics on how much the accuracy would differ from the numbers in the paper?
- From reading the paper, my understanding is that Detectron2 is only needed to fine-tune and compare with other models on the PubLayNet dataset, and is not a fundamental part of LayoutLMv3 for other tasks. Is that right?
Thanks for your help!
I'm glad it helped.
- The two code snippets should be equivalent, so switching from one to the other will not affect accuracy. I haven't verified this experimentally, but @NielsRogge's experimental results (e.g., on FUNSD) support this conclusion.
- You are right.
That's great to know. Also appreciate your insightful research work, which is the key enabler of our project. Thank you!
My pleasure : ) Good luck with your project!