AGI-Edgerunners / IIL

Code for our Paper "All in an Aggregated Image for In-Image Learning"

Home Page: https://arxiv.org/abs/2402.17971

In-Image Learning

Code for the paper "All in an Aggregated Image for In-Image Learning".

[Figure: IIL case]

Requirements

pip install -r requirements.txt

Download Dataset

The processed dataset and demonstration examples are available from this link. After downloading, unzip the file and place the dataset directory in the project root, so that the layout looks like this:

----IIL
    |----dataset
    |----src
    ...
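
Before running any experiment, the layout above can be sanity-checked with a few lines of Python. This is a minimal sketch and not part of the repository: the helper name `missing_dirs` is hypothetical, and it only checks for the two directories shown in the tree (`dataset` and `src`), since the rest of the layout is elided.

```python
from pathlib import Path

# Top-level directories shown in the README's layout; anything else is unknown here.
EXPECTED_DIRS = ("dataset", "src")


def missing_dirs(root="."):
    """Return the expected top-level directories that are absent under `root`."""
    root_path = Path(root)
    return [name for name in EXPECTED_DIRS if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Layout looks as expected.")
```

Run it from the project root; an empty result means both directories were found.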

Run In-Image Learning and Baselines

In-Image Learning

python run_iil.py --exp_name exp_on_mv --dataset mathvista --lt few_shot

Visual-text interleaved in-context learning

python run_vticl.py --exp_name exp_on_mv --dataset mathvista --lt few_shot

Text-only in-context learning

python run_ticl.py --exp_name exp_on_mv --dataset mathvista --lt few_shot
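
The three commands above differ only in the script name, so they can be driven from one small wrapper when comparing methods on the same dataset. The following is a sketch under stated assumptions rather than something shipped with the repository: the function names `build_command` and `run_all` and the short method labels are hypothetical, and it assumes the scripts are invoked from the directory that contains them. Only the script names and the `--exp_name`, `--dataset`, and `--lt` flags come from the commands above.

```python
import subprocess
import sys

# Script names come from the README commands; the dictionary keys are illustrative labels.
SCRIPTS = {
    "iil": "run_iil.py",      # In-Image Learning
    "vticl": "run_vticl.py",  # Visual-text interleaved in-context learning
    "ticl": "run_ticl.py",    # Text-only in-context learning
}


def build_command(method, exp_name="exp_on_mv", dataset="mathvista", lt="few_shot"):
    """Build the argument list for one method, mirroring the README commands."""
    return [
        sys.executable, SCRIPTS[method],
        "--exp_name", exp_name,
        "--dataset", dataset,
        "--lt", lt,
    ]


def run_all(dry_run=True, **kwargs):
    """Print (dry run) or execute the command for every method, in order."""
    commands = [build_command(method, **kwargs) for method in SCRIPTS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the wrapper only prints the commands, which makes it easy to confirm the arguments before launching real runs with `run_all(dry_run=False)`.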

Cite

If you find In-Image Learning useful for your research or applications, please cite it using this BibTeX entry:

@misc{wang2024single,
      title={All in a Single Image: Large Multimodal Models are In-Image Learners}, 
      author={Lei Wang and Wanyu Xu and Zhiqiang Hu and Yihuai Lan and Shan Dong and Hao Wang and Roy Ka-Wei Lee and Ee-Peng Lim},
      year={2024},
      eprint={2402.17971},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

