pathintegral-institute / markitup

Python tool for converting files and office documents to LLM messages

Repository from Github https://github.compathintegral-institute/markitupRepository from Github https://github.compathintegral-institute/markitup

MarkItUp

This is a fork of MarkItDown.

While markitdown is a useful tool, its returned content is too text-focused, which is not updated to the current rise of multi-modal LLMs.

Features

  • Converts various file formats to markdown-oriented OpenAI compatible responses
  • Supports multiple file types including:
    • Documents: DOCX (not DOC)
    • Presentations: PPTX (not PPT)
    • Spreadsheets: XLSX, XLS, CSV
    • Media: Audio files (MP3, M4A)
    • Web content: HTML
    • PDF files
    • Plain text files
  • Returns OpenAI compatible response, which can be used by most LLM clients
  • Supports command line usage

Installation

Install directly from GitHub:

pip install git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup
uv add git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup

Optional Dependencies

To use audio transcription using pydub, install markitup[audio]:

uv add "git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup[audio]"

To use enhanced file type detection with python-magic, install markitup[magic]:

uv add "git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup[magic]"

To install all optional dependencies, use markitup[all]:

uv add "git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup[all]"

Usage

from markitup.converter_utils.utils import read_files_to_bytestreams
from markitup import MarkItUp, Config

fs = read_files_to_bytestreams('packages/markitup/tests/test_files')

miu = MarkItUp(
    config=Config(
        modalities=['image', 'audio'],
        image_use_webp=True
        )
    )

result, stream_info = miu.convert(stream=fs[file_name], file_name=file_name)

Development

Running Tests

To run the test suite, first install Hatch (which provides better test isolation):

uv tool install hatch

Then navigate to the package directory and run the tests:

cd packages/markitup
hatch test

Or for verbose output:

cd packages/markitup
hatch test -- -v

The test suite includes tests for all supported file formats and converter functionality. Hatch provides better isolation from conflicting globally installed packages than other tools.

About

Python tool for converting files and office documents to LLM messages

License:MIT License


Languages

Language:HTML 89.3%Language:Python 10.6%Language:Jupyter Notebook 0.1%