Lavender105 / RSGPT

RSGPT: A Remote Sensing Vision Language Model and Benchmark

Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Xiang Li☨

☨corresponding author

This is an ongoing project. We are working on increasing the dataset size.

🔥 Updates

  • [2024.05.23] We release the RSICap dataset. Please fill out this form to get both the RSICap and RSIEval datasets.
  • [2023.11.10] A survey paper about vision-language models in remote sensing is available: RSVLM.
  • [2023.10.22] The RSICap dataset and code will be released upon paper acceptance.
  • [2023.10.22] We release the evaluation dataset RSIEval. Please fill out this form to get the RSIEval dataset.

Dataset

  • RSICap: 2,585 image-text pairs with high-quality human-annotated captions.
  • RSIEval: 100 high-quality human-annotated captions with 936 open-ended visual question-answer pairs.
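The on-disk format of the datasets is not described here, so the following is only a minimal sketch of how an RSIEval-style record (one caption plus several open-ended question-answer pairs) could be represented and counted after download. The class names, field names, and file names are hypothetical and chosen for illustration; they are not the released schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QAPair:
    # Hypothetical field names; the released files may use different keys.
    question: str
    answer: str


@dataclass
class EvalRecord:
    image_path: str  # hypothetical: path to one remote sensing image
    caption: str     # human-annotated caption for that image
    qa_pairs: List[QAPair] = field(default_factory=list)


def summarize(records: List[EvalRecord]) -> Dict[str, float]:
    """Count images and QA pairs, and compute the mean QA pairs per image."""
    n_images = len(records)
    n_qa = sum(len(r.qa_pairs) for r in records)
    per_image = n_qa / n_images if n_images else 0.0
    return {"images": n_images, "qa_pairs": n_qa, "qa_per_image": per_image}


# Toy records, invented purely to exercise the helper.
records = [
    EvalRecord(
        "img_0001.png",
        "An airport with two runways.",
        [QAPair("How many runways are there?", "Two")],
    ),
    EvalRecord(
        "img_0002.png",
        "A harbor with several ships.",
        [
            QAPair("What is the scene type?", "Harbor"),
            QAPair("Are there ships in the image?", "Yes"),
        ],
    ),
]
stats = summarize(records)
```

Assuming one caption per image, the published RSIEval counts work out to 936 / 100 = 9.36 question-answer pairs per image on average, which is a quick sanity check to run on a downloaded copy.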

Code

The fine-tuning approach is borrowed from MiniGPT-4. RSGPT is obtained by fine-tuning InstructBLIP on our RSICap dataset.

Acknowledgement

  • MiniGPT-4. A popular open-source vision-language model.
  • InstructBLIP. The model architecture of RSGPT follows InstructBLIP. Don't forget to check out this great open-source work if you are not already familiar with it!
  • Lavis. This repository is built upon Lavis!
  • Vicuna. The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!

If you're using RSGPT in your research or applications, please cite using this BibTeX:

@article{hu2023rsgpt,
  title={RSGPT: A Remote Sensing Vision Language Model and Benchmark},
  author={Hu, Yuan and Yuan, Jianlong and Wen, Congcong and Lu, Xiaonan and Li, Xiang},
  journal={arXiv preprint arXiv:2307.15266},
  year={2023}
}
