LlamaParse is an API created by LlamaIndex to efficiently parse and represent files for efficient retrieval and context augmentation using LlamaIndex frameworks.
LlamaParse directly integrates with LlamaIndex.
It is currently available in preview mode for free. Try it out today!
NOTE: Currently, only PDF files are supported.
First, log in and get an API key from https://cloud.llamaindex.ai.
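If you would rather not hard-code the key, one option is to read it from the `LLAMA_CLOUD_API_KEY` environment variable mentioned in the examples. The helper below is only a sketch: `resolve_api_key` is a hypothetical name, not part of the llama-parse package, and it simply fails early with a readable message when no key is available.

```python
import os


def resolve_api_key(explicit_key=None, env_var="LLAMA_CLOUD_API_KEY"):
    """Return explicit_key if given, otherwise the value of env_var.

    Hypothetical helper for illustration; not part of llama-parse.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found; pass one explicitly or set {env_var}."
        )
    return key
```

The returned value can then be passed as `api_key=` when constructing the parser.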
Install the package:
pip install llama-parse
Then, you can run the following to parse your first PDF file:
import nest_asyncio
nest_asyncio.apply()
from llama_parse import LlamaParse
parser = LlamaParse(
    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    verbose=True,
)
# sync
documents = parser.load_data("./my_file.pdf")
# async (top-level await works in a notebook; in a script, call this from inside an async function)
documents = await parser.aload_data("./my_file.pdf")
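The async variant is most useful when several files need parsing at once. The following is a minimal sketch, assuming only that the parser object exposes the `aload_data` coroutine shown above; `parse_many` and `parse_many_sync` are hypothetical helper names, and whether the service handles concurrent requests well is not stated in this README.

```python
import asyncio


async def parse_many(parser, paths):
    """Parse several files concurrently and return {path: documents}.

    Assumes parser.aload_data(path) is a coroutine, as in the example above.
    """
    results = await asyncio.gather(*(parser.aload_data(p) for p in paths))
    return dict(zip(paths, results))


def parse_many_sync(parser, paths):
    """Convenience wrapper for plain scripts that have no running event loop."""
    return asyncio.run(parse_many(parser, paths))
```

In a plain script, `parse_many_sync(parser, ["./a.pdf", "./b.pdf"])` would run both requests and return a dictionary keyed by path.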
You can also integrate the parser as the default PDF loader in SimpleDirectoryReader:
import nest_asyncio
nest_asyncio.apply()
from llama_parse import LlamaParse
# on llama-index >= 0.10, import from llama_index.core instead
from llama_index import SimpleDirectoryReader
parser = LlamaParse(
    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    verbose=True,
)
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader("./data", file_extractor=file_extractor).load_data()
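Once you have `documents`, it can help to inspect the parsed output on disk before indexing it. The sketch below assumes each returned document exposes its content through a `.text` attribute, as LlamaIndex `Document` objects do; `save_documents` and the `document_<n>` file naming are hypothetical choices made for illustration.

```python
from pathlib import Path


def save_documents(documents, out_dir, suffix=".md"):
    """Write each document's text to out_dir and return the written paths.

    Assumes every item in documents has a .text string attribute.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, doc in enumerate(documents):
        target = out / f"document_{i}{suffix}"
        target.write_text(doc.text, encoding="utf-8")
        written.append(target)
    return written
```

With `result_type="markdown"`, the default `.md` suffix fits the output; pass `suffix=".txt"` when using `result_type="text"`.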
Full documentation for SimpleDirectoryReader can be found on the LlamaIndex Documentation.
Several end-to-end indexing examples can be found in the examples folder.
See the Terms of Service here.