OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Issue with reading documents with double columns

Hastyrush opened this issue

Hi, thanks for the amazing work done on MiniCPM!

I would like to ask whether the model can extract text (via OCR or otherwise) from documents with a double-column layout, such as research papers, i.e. where each column is meant to be read top to bottom before moving on to the next one, rather than straight across the page. I experimented with different prompts, but the model does not seem to interpret double-column documents correctly. The output either omits one of the columns or merges a line from each column (reading across the page instead of down each column). I'm not sure whether this can be mitigated, so any advice would be appreciated. Thanks!
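To make the two reading orders concrete, here is a minimal toy sketch in plain Python. It does not call the model; the `page` data and both function names are hypothetical and only illustrate the difference between reading across the page and reading down each column.

```python
# Hypothetical toy page: each entry is (column, row, text).
# This is illustrative data, not real model output.
page = [
    (0, 0, "Left line 1"),
    (1, 0, "Right line 1"),
    (0, 1, "Left line 2"),
    (1, 1, "Right line 2"),
]


def read_horizontally(lines):
    """Read row by row across both columns (the unwanted order)."""
    ordered = sorted(lines, key=lambda line: (line[1], line[0]))
    return [text for _, _, text in ordered]


def read_by_column(lines):
    """Read the left column top to bottom, then the right column (the intended order)."""
    ordered = sorted(lines, key=lambda line: (line[0], line[1]))
    return [text for _, _, text in ordered]


print(read_horizontally(page))
print(read_by_column(page))
```

The first call interleaves lines from the two columns, which matches the merged output I am seeing; the second is the order I would expect for a research paper.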

Could you share an example or two so that we can get a clearer picture? Our model has some table-extraction capability, but getting it to perform very well in specific scenarios may require fine-tuning on a small amount of data.