DS4SD / docling

Get your docs ready for gen AI

Home Page: https://ds4sd.github.io/docling

Add figures in markdown output

dolfim-ibm opened this issue

The current markdown output skips figure objects.

We should allow users to include the figures in the output as well.

Proposed format

...

<image>
Figure 2: Distribution of DocLayNet pages across document categories.

...

where <image> is a placeholder and the text (if present) is the respective caption.

The placeholder should be customizable, e.g. using <!-- image --> (an HTML comment, which markdown renderers do not display). The initial choice of <image> is motivated by the requirements of the LLaVA input format.
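As a rough illustration of the proposed format, the sketch below builds the markdown snippet for a single figure. The function name `render_figure` and the parameter `image_placeholder` are hypothetical and are not part of the docling API; they only show how a customizable placeholder could behave.

```python
from typing import Optional


def render_figure(caption: Optional[str] = None,
                  image_placeholder: str = "<image>") -> str:
    """Return the markdown snippet for one figure.

    Hypothetical helper, not a docling function: the placeholder goes on
    its own line, followed by the caption when one is present.
    """
    lines = [image_placeholder]
    if caption:
        lines.append(caption)
    return "\n".join(lines)


# Default placeholder, with a caption.
default_snippet = render_figure(
    "Figure 2: Distribution of DocLayNet pages across document categories."
)

# Custom placeholder (an HTML comment), no caption.
comment_snippet = render_figure(image_placeholder="<!-- image -->")
```

Keeping the placeholder as a plain string argument means callers can choose a token that suits their downstream consumer without any change to the export logic.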

The export_figures.py example will then be updated to show how to replace the <image> placeholder with an actual markdown image pointing to the exported files.
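A minimal sketch of that replacement step, assuming the figures have already been exported and their file paths are known in document order. The helper `link_figures` and the file names are hypothetical; this is not the actual content of export_figures.py.

```python
from typing import Iterable


def link_figures(markdown: str,
                 image_paths: Iterable[str],
                 placeholder: str = "<image>") -> str:
    """Replace each placeholder, in order, with a markdown image link.

    Hypothetical helper: assumes one exported file per placeholder,
    listed in the same order as the figures appear in the document.
    """
    paths = list(image_paths)
    parts = markdown.split(placeholder)
    if len(paths) != len(parts) - 1:
        raise ValueError(
            f"expected {len(parts) - 1} image paths, got {len(paths)}"
        )
    out = [parts[0]]
    for path, rest in zip(paths, parts[1:]):
        out.append(f"![image]({path})")  # markdown image syntax
        out.append(rest)
    return "".join(out)


md = "Intro text.\n\n<image>\nFigure 2: A caption.\n\nMore text.\n"
linked = link_figures(md, ["figure-1.png"])
```

The length check is a deliberate choice: a mismatch between placeholders and exported files raises an error instead of silently linking figures to the wrong images.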