- Main code for IT framework.
- Data cleaning is ongoing. We expect to open-source the 170K dataset before 6/17.
- Code for evaluation.
- Release usage instructions for our IT framework.
🔥 The IT-170K dataset is now available on 🤗 Hugging Face. Link to our paper: arXiv.
See detailed instructions in install.md.
- COCO: Download train2017.
- SAM: Download SAM (sa_000000.tar to sa_000024.tar).
- VG: Download VG.
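The SAM entry covers 25 archives, sa_000000.tar through sa_000024.tar. As a small convenience, the sketch below enumerates those file names so they can be checked off or fed to a download tool of your choice. The helper name `sam_archive_names` is hypothetical and not part of this repository, and no download URL is assumed.

```python
def sam_archive_names(first: int = 0, last: int = 24) -> list[str]:
    """Return SAM shard names sa_<first>.tar ... sa_<last>.tar, inclusive.

    The defaults mirror the range listed above (sa_000000.tar to sa_000024.tar).
    Indices are zero-padded to six digits.
    """
    if first < 0 or last < first:
        raise ValueError("expected 0 <= first <= last")
    return [f"sa_{i:06d}.tar" for i in range(first, last + 1)]


if __name__ == "__main__":
    # Print one archive name per line.
    for name in sam_archive_names():
        print(name)
```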
After downloading, organize the image datasets under ./dataset/ as follows:
├── coco
│   └── train2017
├── sam
│   └── images
└── vg
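Before running the framework, it may help to confirm that the folders are in place. The following is a minimal sketch, not part of the repository: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and placing `images` inside `sam` is an assumption about the layout shown above.

```python
from pathlib import Path

# Expected sub-directories, relative to the dataset root.
# Assumption: "images" lives inside "sam", as read from the tree above.
EXPECTED_DIRS = ("coco/train2017", "sam/images", "vg")


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("./dataset")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

The check only looks at directory names; it does not verify that the images inside them are complete.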
After installing all the requirements, follow use.md to generate descriptions for your own datasets.
If you find our work useful for your research or applications, please cite using this BibTeX:
@misc{pi2024image,
title={Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions},
author={Renjie Pi and Jianshu Zhang and Jipeng Zhang and Rui Pan and Zhekai Chen and Tong Zhang},
year={2024},
eprint={2406.07502},
archivePrefix={arXiv},
primaryClass={cs.CV}
}