bingwork / ChartLlama-code

ChartLlama: A Multimodal LLM for Chart Understanding and Generation

                 


Yucheng Han*, Chi Zhang* (Corresponding Author), Xin Chen, Xu Yang, Zhibin Wang,
Gang Yu, Bin Fu, Hanwang Zhang

(* equal contributions)

From Tencent and Nanyang Technological University.

🔆 Introduction

🤗🤗🤗 We first create an instruction-tuning dataset with our proposed data generation pipeline. We then train ChartLlama on this dataset, which gives it the abilities shown in the figure.

Examples of ChartLlama's abilities.

Redrawing a given chart and editing it according to instructions.

Drawing a new chart from given raw data and instructions.

πŸ“ Changelog

  • [2023.11.27]: πŸ”₯πŸ”₯ Update the inference code and model weights.
  • [2023.11.27]: Create the git repository.

βš™οΈ Setup

Refer to the LLaVA-1.5. Since I have uploaded the code, you can just install by

pip install -e .
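
After installing, it can help to confirm that the package is visible to the Python interpreter you plan to use. The snippet below is a minimal sketch, not part of the repository: the helper name `is_importable` is hypothetical, and the package name `llava` is assumed from the `python -m llava.eval.model_vqa_lora` command used for inference.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located by the import system.

    Hypothetical helper for illustration. For a top-level name,
    find_spec only locates the module and does not execute it.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or modules without a spec.
        return False


if __name__ == "__main__":
    # "llava" is assumed to be the package installed by `pip install -e .`.
    status = "found" if is_importable("llava") else "NOT found"
    print(f"llava: {status}")
```

If the package is reported as not found, the editable install most likely went into a different environment from the one running the check.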

💫 Inference

First install LLaVA-1.5, then run inference with the llava.eval.model_vqa_lora module. It takes the following arguments: --model-path is the path to our LoRA checkpoints, --question-file is the JSON file containing all questions, --image-folder is the folder containing your images, and --answers-file is the name of the output file.

Here is an example:

CUDA_VISIBLE_DEVICES=1 python -m llava.eval.model_vqa_lora --model-path /your_path_to/LLaVA/checkpoints/${output_name} \
    --question-file /your_path_to/question.json \
    --image-folder ./playground/data/ \
    --answers-file ./playground/data/ans.jsonl \
    --num-chunks $CHUNKS \
    --chunk-idx $IDX \
    --temperature 0 \
    --conv-mode vicuna_v1 &
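
The `--num-chunks` and `--chunk-idx` flags in the command above are not explained in this README. The sketch below shows how such flags typically partition the question list, assuming `model_vqa_lora` uses the same ceiling-based chunking as LLaVA's `model_vqa` script. That is an assumption and has not been checked against this repository. The placeholder question strings are made up for illustration.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_list(items: Sequence[T], n: int) -> List[Sequence[T]]:
    """Split `items` into consecutive chunks of size ceil(len(items) / n).

    Assumed to mirror the chunking used by LLaVA's evaluation scripts.
    The result can contain fewer than n chunks when len(items) is small.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not items:
        return []
    chunk_size = math.ceil(len(items) / n)
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def get_chunk(items: Sequence[T], n: int, k: int) -> Sequence[T]:
    """Return the k-th chunk (0-based), i.e. what --chunk-idx k would select."""
    return split_list(items, n)[k]


if __name__ == "__main__":
    # Placeholder questions; real entries come from the question file.
    questions = [f"q{i}" for i in range(10)]
    for idx in range(3):
        print(idx, get_chunk(questions, 3, idx))
```

Under this assumption, setting `CHUNKS=1` and `IDX=0` processes the whole question file in a single run. Larger values of `CHUNKS` let several processes, each with its own `IDX` and GPU, work on disjoint slices.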

📖 TO-DO LIST

  • Create and open source a new chart dataset in Chinese.
  • Open source the training scripts and the dataset.
  • Open source the evaluation scripts.
  • Open source the evaluation dataset.
  • Open source the inference script.
  • Open source the model.
  • Create the git repository.

😉 Citation

@misc{han2023chartllama,
      title={ChartLlama: A Multimodal LLM for Chart Understanding and Generation}, 
      author={Yucheng Han and Chi Zhang and Xin Chen and Xu Yang and Zhibin Wang and Gang Yu and Bin Fu and Hanwang Zhang},
      year={2023},
      eprint={2311.16483},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📢 Disclaimer

We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.


License: MIT License

