lzw-lzw / GroundingGPT

[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model

GroundingGPT: Language-Enhanced Multi-modal Grounding Model

Introduction

GroundingGPT is an end-to-end multimodal grounding model that accurately comprehends inputs and has robust grounding capabilities across multiple modalities, including images, audio, and video. To address the issue of limited data, we construct a diverse, high-quality multimodal training dataset. The dataset contains a rich collection of multimodal data annotated with spatial and temporal information, making it a valuable resource for further work in this field. Extensive experimental evaluations validate the effectiveness of GroundingGPT on understanding and grounding tasks across modalities.

More details are available on our project page.


The overall structure of GroundingGPT. Blue boxes represent video as input, while yellow boxes represent image as input.

News

  • [2024.3.5] Our training dataset is available now!
  • [2024.3.1] Our code is available now!

Statement of Clarification

We hereby clarify that the Language-Enhanced Multi-modal Grounding Model (formerly referred to as a LEGO Language Model), which has been renamed GroundingGPT, is in no way associated with or endorsed by the LEGO Group. There is no investment, collaboration, or any other form of relationship between the LEGO Group and our model, which previously used the LEGO name. We kindly request that any media or third-party entities that have published or disseminated inaccurate or misleading reports about this model promptly correct or remove the misinformation. Your immediate attention to this matter would be greatly appreciated. We deeply apologize to the LEGO Group for any confusion, inconvenience, or harm this has caused.

Acknowledgement

About

License: Apache License 2.0


Languages

Python 99.6%, Shell 0.4%