googlesamples / mlkit

A collection of sample apps to demonstrate how to use Google's ML Kit APIs on Android and iOS

document scanner rotates the image when it recognizes the document

Tavorc opened this issue · comments

commented

When I use the ML Kit document scanner, most of the time (around 95%) the scanner rotates the image of the document it recognizes.
It doesn't matter whether I use automatic or manual mode.

Any idea how to solve it?
Has anyone else run into this?
[Attached screenshot: Screenshot_20240326_095202_Google Play services]

Does the unwanted rotation also happen with documents written in LTR scripts? Are all of your docs receipts?

I suspect that ML Kit's document scanner "UI flow" may be loosely tied to the Text Recognition API, which in turn does not support Hebrew or any other RTL script. That may be irrelevant, though.

As you may have noticed, in real life people who don't know Hebrew sometimes try to read Hebrew documents upside down. I don't know about other RTL scripts, but I guess the fact that almost all Hebrew letters have the same height doesn't help at all.

These are just thoughts; I am not affiliated with Google in any way. I believe it is indeed a bug, since the API seems to be designed to be text-agnostic, especially in manual mode.

If all of your data are receipts of a similar format, you can probably post-process them at a low level or with Tesseract.

Inspired by:

#784 (comment)

It seems that your Android language is English. Can you try switching it to Hebrew?

commented

First of all, thank you.
Yes, all of the docs are receipts; it's a fintech app.
I tried changing the language to Hebrew, but it doesn't help.

There is the OpenCV library, which I could use to crop the image, but I didn't want to because ML Kit is more innovative.

Thanks for the feedback.

There is an auto-rotation step in the scanning flow. It is there because holding the phone parallel to the table may trigger the phone's and camera's auto-rotation logic and produce images with the wrong orientation, which this step is meant to correct. However, the text-based model apparently doesn't work very well in this case.

What do you think would be better behavior for you? Ideally, the model would just handle everything. If that's not the case, would you prefer an option to turn auto-rotation on/off, or something else you have in mind?

commented

I think you can detect the orientation of the device. For example, the camera shows a "1x" label that represents the zoom; when I rotate the device, the "1x" label rotates too, so you could probably use that.
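
To make the device-orientation idea concrete, here is a minimal sketch in plain Java. It is not how the ML Kit scanner works internally, and `OrientationTracker` is a hypothetical helper name. It assumes the app receives raw angles the way Android's `OrientationEventListener.onOrientationChanged(int)` reports them: 0 to 359 degrees, or `ORIENTATION_UNKNOWN` (-1) when the orientation cannot be determined, which typically happens when the device is close to flat. Because a phone held parallel to the table is exactly the flat case, the sketch keeps the last reliable reading instead of guessing.

```java
/**
 * Hypothetical helper: remembers the last reliable device orientation,
 * snapped to 0, 90, 180 or 270 degrees.
 */
final class OrientationTracker {
    /** Same value as android.view.OrientationEventListener.ORIENTATION_UNKNOWN. */
    static final int ORIENTATION_UNKNOWN = -1;

    // Assumption: treat the device as upright (0 degrees) until a reading arrives.
    private int lastQuadrant = 0;

    /** Snaps an angle in [0, 359] to the nearest multiple of 90, wrapping 360 to 0. */
    static int snapToQuadrant(int degrees) {
        return ((degrees + 45) / 90 * 90) % 360;
    }

    /**
     * Feeds one raw reading and returns the current snapped orientation.
     * Unknown readings (device roughly flat) leave the last value unchanged.
     */
    int update(int rawDegrees) {
        if (rawDegrees != ORIENTATION_UNKNOWN) {
            lastQuadrant = snapToQuadrant(rawDegrees);
        }
        return lastQuadrant;
    }
}
```

In an app, `update` would be called from the listener callback, and the value it returns at capture time could be compared with the orientation of the scanned result to decide whether a correction is needed. Whether that comparison is reliable in practice is an open question here, not something this sketch proves.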