🔥 The dataset is available on 🤗 Hugging Face and contains 219,437 image descriptions. Link to our paper: arXiv.
See detailed instructions in install.md.
- COCO: Download the train2017 images.
- SAM: Download the SAM archives (sa_000000.tar through sa_000024.tar).
- VG: Download the VG images.
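The SAM entry above names a range of 25 archives. As a minimal sketch, the file names in that range can be enumerated as follows. The helper name `sam_shard_names` is hypothetical and not part of this repository, and no download URL is assumed.

```python
def sam_shard_names(first: int = 0, last: int = 24) -> list[str]:
    """Return SAM archive names from sa_000000.tar through sa_000024.tar (inclusive)."""
    # The numeric index is zero-padded to six digits, matching the names in the list above.
    return [f"sa_{i:06d}.tar" for i in range(first, last + 1)]


shards = sam_shard_names()
print(len(shards), shards[0], shards[-1])
```

Comparing this list against the files on disk is one way to confirm that no archive was skipped during download.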
After downloading, organize the image datasets under ./dataset/ as follows:
├── coco
│   └── train2017
├── sam
│   └── images
└── vg
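Before running anything on the data, it can help to confirm that the folders are in place. The following is a minimal sketch, not a script shipped with this repository. The function name `missing_dirs` is hypothetical, and placing `images` under `sam` is an assumption read from the layout above.

```python
from pathlib import Path

# Expected sub-directories, taken from the layout above.
# Assumption: "images" sits inside "sam".
EXPECTED_DIRS = ("coco/train2017", "sam/images", "vg")


def missing_dirs(root: str = "./dataset") -> list[str]:
    """Return the expected sub-directories that do not exist under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


missing = missing_dirs()
if missing:
    print("Missing directories:", ", ".join(missing))
else:
    print("Dataset layout looks complete.")
```

Pass a different `root` if the datasets live somewhere other than `./dataset`.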
After installing all the requirements, follow use.md to generate descriptions for your own datasets.
![image](https://private-user-images.githubusercontent.com/119802220/338402302-9562860a-96b6-4253-9305-d133161eea70.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwOTM3MDQsIm5iZiI6MTcyMDA5MzQwNCwicGF0aCI6Ii8xMTk4MDIyMjAvMzM4NDAyMzAyLTk1NjI4NjBhLTk2YjYtNDI1My05MzA1LWQxMzMxNjFlZWE3MC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzA0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwNFQxMTQzMjRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yYTA5NjVhOTlmOTRhZmNlYWQzODMwNGJjOTE5YjIwOTkxNTAyMDc4ZjU3ODE2NjRkNzI3ZmJhMDQyYzhhZWE1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.aaxruo65VQi4c0q35_NPH7-RTqErHHaANgUGwL8cupQ)
If you find our work useful for your research or applications, please cite using this BibTeX:
@misc{pi2024image,
  title={Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions},
  author={Renjie Pi and Jianshu Zhang and Jipeng Zhang and Rui Pan and Zhekai Chen and Tong Zhang},
  year={2024},
  eprint={2406.07502},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}