CYF2000127 / ReactionImgMLLM-beifen

ReactionImgMLLM

This is the official code for the paper "ReactionImgMLLM: A Multimodal Large Language Model for Reaction Image Data Extraction".

Highlights

In this paper, we present ReactionImgMLLM, a multimodal large language model for reaction image data extraction tasks, including reaction extraction, condition OCR, and role identification. We first formulate these tasks as distinct task instructions. The model then aligns the task instructions with features extracted from reaction images, and an LLM-based decoder makes predictions conditioned on these instructions. On the reaction extraction task, our model achieves soft match F1 scores in the 84%-92% range across multiple test sets, significantly outperforming previous work. The experiments also demonstrate strong condition OCR and role identification abilities.
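To illustrate what "formulating tasks into task instructions" could look like in practice, here is a minimal sketch in Python. It is not taken from the repository: the template wording, the `TASK_TEMPLATES` mapping, the `TaskRequest` class, and the `build_request` helper are all hypothetical names chosen for illustration, and the actual prompts used by ReactionImgMLLM may differ.

```python
from dataclasses import dataclass

# Hypothetical instruction templates, one per task named in the paper summary.
# The real prompt wording used by ReactionImgMLLM is an assumption here.
TASK_TEMPLATES = {
    "reaction_extraction": (
        "Extract every reaction in the image and list its reactants, "
        "conditions, and products."
    ),
    "condition_ocr": "Read the text inside the condition region {region}.",
    "role_identification": "Identify the role of the text inside region {region}.",
}


@dataclass
class TaskRequest:
    """Pairs a reaction image with the instruction the model should follow."""
    task: str
    image_path: str
    instruction: str


def build_request(task: str, image_path: str, **fields: str) -> TaskRequest:
    """Fill the template for `task` and attach it to an image (illustrative only)."""
    if task not in TASK_TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    instruction = TASK_TEMPLATES[task].format(**fields)
    return TaskRequest(task=task, image_path=image_path, instruction=instruction)


if __name__ == "__main__":
    # Example: ask for OCR on a hypothetical bounding box.
    req = build_request("condition_ocr", "reaction.png", region="[10, 20, 110, 80]")
    print(req.instruction)
```

Keeping one template per task means a single model can be steered to any of the three tasks by changing only the instruction text, which is the design idea the paragraph above describes.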

Figure: Overall architecture of ReactionImgMLLM.

Languages

Python 92.6%, Shell 7.4%