Lavender105 / RSGPT

RSGPT: A Remote Sensing Vision Language Model and Benchmark

Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Xiang Li☨

☨corresponding author

This is an ongoing project. We are working on increasing the dataset size.

🔥 Updates

[2024.05.23] We release the RSICap dataset. Please fill out this form to get both the RSICap and RSIEval datasets.
[2023.11.10] A survey paper on vision-language models in remote sensing is available: RSVLM.
[2023.10.22] The RSICap dataset and code will be released upon paper acceptance.
[2023.10.22] We release the evaluation dataset RSIEval. Please fill out this form to get the RSIEval dataset.

Dataset

RSICap: 2,585 image-text pairs with high-quality human-annotated captions.
RSIEval: 100 high-quality human-annotated captions with 936 open-ended visual question-answer pairs.
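
To make the shape of the two datasets concrete, here is a minimal Python sketch of how one record might be represented in memory. It is not the released file format: the class names `CaptionedImage` and `QAPair`, the field names, and the toy records are all hypothetical and chosen only for illustration. An RSICap-style record would carry an image and a caption, while an RSIEval-style record would additionally carry question-answer pairs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QAPair:
    """One open-ended visual question-answer pair (hypothetical field names)."""
    question: str
    answer: str


@dataclass
class CaptionedImage:
    """One image with a human-annotated caption (hypothetical layout).

    qa_pairs stays empty for caption-only records and is filled for
    records that also come with visual question-answer pairs.
    """
    image_path: str
    caption: str
    qa_pairs: List[QAPair] = field(default_factory=list)


def summarize(records: List[CaptionedImage]) -> Dict[str, int]:
    """Count captions and question-answer pairs across a list of records."""
    return {
        "captions": len(records),
        "qa_pairs": sum(len(r.qa_pairs) for r in records),
    }


# Toy records for illustration only; they are not taken from the real datasets.
toy = [
    CaptionedImage(
        "img_0001.png",
        "An airport with two runways.",
        [QAPair("How many runways are visible?", "Two")],
    ),
    CaptionedImage("img_0002.png", "A harbor with several docked ships."),
]

print(summarize(toy))
```

Under this assumed layout, running `summarize` over the full RSIEval records would be a quick way to check the counts stated above once the data has been loaded.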

Code

The idea of finetuning our vision-language model is borrowed from MiniGPT-4. Our model is obtained by finetuning InstructBLIP on our RSICap dataset.

Acknowledgement

MiniGPT-4. A popular open-source vision-language model.
InstructBLIP. The model architecture of RSGPT follows InstructBLIP. Don't forget to check out this great open-source work if you haven't seen it before!
Lavis. This repository is built upon Lavis!
Vicuna. The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!

If you're using RSGPT in your research or applications, please cite using this BibTeX:

@article{hu2023rsgpt,
  title={RSGPT: A Remote Sensing Vision Language Model and Benchmark},
  author={Hu, Yuan and Yuan, Jianlong and Wen, Congcong and Lu, Xiaonan and Li, Xiang},
  journal={arXiv preprint arXiv:2307.15266},
  year={2023}
}
