potamides / DeTikZify

Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy. Furthermore, recreating existing figures that are not stored in formats preserving semantic information is equally complex. To tackle this problem, we introduce DeTikZify, a novel multimodal language model that automatically synthesizes scientific figures as semantics-preserving TikZ graphics programs based on sketches and existing figures. We also introduce an MCTS-based inference algorithm that enables DeTikZify to iteratively refine its outputs without the need for additional training.

Installation

Tip

If you encounter difficulties with installation and inference on your own hardware, consider visiting our Hugging Face Space (if the Space is restarting, the first run might take 10-15 minutes to download and load the model). Should you experience long queues, you can duplicate it with a paid private GPU runtime for a more seamless experience. Additionally, you can try our demo on Google Colab. However, setting up the environment there might take some time, and the free tier only supports inference for the 1b models. Do not forget to read our usage tips!

The Python package of DeTikZify can be easily installed using pip:

pip install 'detikzify @ git+https://github.com/potamides/DeTikZify'

Or, if your goal is to run the included examples, clone the repository and install it in editable mode like this:

git clone https://github.com/potamides/DeTikZify
pip install -e DeTikZify

In addition, DeTikZify requires a full TeX Live 2023 installation, Ghostscript, and Poppler, which you have to install through your package manager or by other means.
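Before running inference, it can help to confirm that these external tools are reachable. The following is a minimal sketch, not part of DeTikZify itself: the executable names (`pdflatex`, `gs`, `pdftoppm`) are assumptions about how TeX Live, Ghostscript, and Poppler are typically exposed on PATH, and `find_missing_tools` is a hypothetical helper. It only checks that the executables exist; it does not verify the TeX Live version or that the installation is complete.

```python
import shutil

# Assumed executable names; adjust them to match your distribution.
REQUIRED_TOOLS = {
    "TeX Live": "pdflatex",
    "Ghostscript": "gs",
    "Poppler": "pdftoppm",
}


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```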

Usage

If all required dependencies are installed, the full range of DeTikZify features such as compiling, rendering, and saving TikZ graphics, and MCTS-based inference can be accessed through its programming interface:

from operator import itemgetter

from detikzify.model import load
from detikzify.infer import DetikzifyPipeline
import torch

image = "https://w.wiki/A7Cc"
pipeline = DetikzifyPipeline(*load(
    base_model="nllg/detikzify-ds-7b",
    device_map="auto",
    torch_dtype=torch.bfloat16,
))

# generate a single TikZ program
fig = pipeline.sample(image=image)

# if it compiles, rasterize it and show it
if fig.is_rasterizable:
    fig.rasterize().show()

# run MCTS for 10 minutes and generate multiple TikZ programs
figs = set()
for score, fig in pipeline.simulate(image=image, timeout=600):
    figs.add((score, fig))

# save the best TikZ program
best = sorted(figs, key=itemgetter(0))[-1][1]
best.save("fig.tex")
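If you want to keep several candidates rather than only the best one, the scored pairs can be ranked with the standard library. This is a minimal sketch under the assumption that results arrive as `(score, figure)` pairs with higher scores being better, as in the example above; `top_k` is a hypothetical helper and the demo data are stand-ins, not real model output.

```python
import heapq
from operator import itemgetter


def top_k(results, k=3):
    """Return the k highest-scoring (score, figure) pairs, best first."""
    # key=itemgetter(0) compares scores only, so figures need not be orderable.
    return heapq.nlargest(k, results, key=itemgetter(0))


# Stand-in data; in practice the pairs would come from pipeline.simulate(...).
demo = [(0.2, "fig_a"), (0.9, "fig_b"), (0.5, "fig_c"), (0.7, "fig_d")]
for rank, (score, fig) in enumerate(top_k(demo, k=2), start=1):
    print(rank, score, fig)
```

Each returned figure can then be written to disk with its `save` method, one file per rank.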

For interactive use and additional usage tips, we recommend checking out our web UI, which can be started from the command line (use --help for a list of all options):

python -m detikzify.webui --light

More involved examples, such as those for evaluation and training, can be found in the examples folder.

Model Weights & Datasets

We upload all our models and datasets to the Hugging Face Hub. However, please note that for the public release of the DaTikZv2 dataset, we had to remove a considerable portion of TikZ drawings originating from arXiv, as the arXiv non-exclusive license does not permit redistribution. We do, however, release our dataset creation scripts and encourage anyone to recreate the full version of DaTikZv2 themselves.

Citation

If DeTikZify has been beneficial for your research or applications, we kindly ask you to acknowledge its use by citing it as follows:

@misc{belouadi2024detikzify,
      title={DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ}, 
      author={Jonas Belouadi and Simone Paolo Ponzetto and Steffen Eger},
      year={2024},
      eprint={2405.15306},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgments

The implementation of the model architecture is largely based on LLaVA. Our MCTS implementation takes heavy inspiration from VerMCTS.

License: Apache License 2.0

