MegaParse - Your Mega Parser for every type of document

MegaParse is a powerful and versatile parser that handles many types of documents with ease. Whether you're dealing with text, PDFs, PowerPoint presentations, or Word documents, MegaParse has got you covered. Its focus is on losing no information during parsing.

Key Features 🎯

Versatile Parser: MegaParse is a powerful and versatile parser that can handle various types of documents with ease.
No Information Loss: Focuses on losing no information during parsing.
Fast and Efficient: Designed with speed and efficiency at its core.
Wide File Compatibility: Supports text, PDF, PowerPoint presentations, Excel, CSV, and Word documents.
Open Source: Freedom is beautiful, and so is MegaParse. Open source and free to use.

Support

Files: ✅ PDF ✅ PowerPoint ✅ Word
Content: ✅ Tables ✅ TOC ✅ Headers ✅ Footers ✅ Images

Example

Installation

pip install megaparse

Usage

Create an account on Llama Cloud and get your API key.
Create a new file in the root directory of the project and name it .env.
Add the following line to the .env file and replace llx-your_api_key with your actual API key.

LLAMA_CLOUD_API_KEY=llx-your_api_key
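
Before running a conversion, it can help to confirm that the key from the .env file is actually visible to the Python process. The sketch below is a minimal, standard-library-only way to do that. It assumes the key needs to be present in the process environment; this README does not say whether MegaParse reads the .env file on its own. The helpers load_env_file and has_llama_cloud_key are hypothetical names for illustration and are not part of MegaParse.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env-style file into os.environ.

    Hypothetical helper, not part of MegaParse. Variables that are
    already set in the environment are left untouched.
    Returns a dict of the pairs found in the file.
    """
    env_path = Path(path)
    loaded = {}
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and simple quotes around the value.
        value = value.strip().strip('"').strip("'")
        if key:
            loaded[key] = value
            os.environ.setdefault(key, value)
    return loaded


def has_llama_cloud_key():
    """Return True if LLAMA_CLOUD_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("LLAMA_CLOUD_API_KEY"))


if __name__ == "__main__":
    load_env_file(".env")
    if not has_llama_cloud_key():
        raise SystemExit("LLAMA_CLOUD_API_KEY is not set; check your .env file.")
```

A dedicated package such as python-dotenv covers more of the .env syntax; the hand-rolled reader here only keeps the example free of extra dependencies.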

Now you can use the following code to convert a PDF to Markdown and save it to a file.

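from megaparse import MegaParse

# Point the parser at the PDF to convert.
megaparse = MegaParse(file_path="./test.pdf")
# Convert the document to Markdown text.
content = megaparse.convert()
print(content)
# Write the Markdown to a file.
megaparse.save_md(content, "./test.md")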
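
If you convert more than one file, the same calls can be wrapped in a small function. This is a sketch only: it uses just the MegaParse calls shown above (the constructor with file_path, convert, and save_md), while convert_pdf and markdown_path_for are hypothetical helper names introduced here. Writing the Markdown next to the source PDF is likewise an assumption about the layout you want.

```python
from pathlib import Path


def markdown_path_for(pdf_path):
    """Return the path of the Markdown file that sits next to the PDF."""
    return str(Path(pdf_path).with_suffix(".md"))


def convert_pdf(pdf_path):
    """Convert one PDF to Markdown and save it beside the source file.

    Hypothetical wrapper around the MegaParse calls from the usage example.
    """
    # Imported here so the path helper works even without megaparse installed.
    from megaparse import MegaParse

    megaparse = MegaParse(file_path=pdf_path)
    content = megaparse.convert()
    output_path = markdown_path_for(pdf_path)
    megaparse.save_md(content, output_path)
    return output_path


if __name__ == "__main__":
    print(convert_pdf("./test.pdf"))
```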

Next Steps

About

File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.

https://pypi.org/project/megaparse/

Apache License 2.0
