zhuzilin/faster-nougat

faster-nougat

Implementation of nougat that focuses on processing pdf locally.

I hope this could be a helpful component for a good open source RAG system.

Installation

git clone https://github.com/zhuzilin/faster-nougat
cd faster-nougat
pip install .

You can then try the example with simple_arxiv_reader.py (using deepseek api by default). For example, we could let the llm list the contribution of Attention Is All You Need with its 1,2,10 page of the origin paper.

python simple_arxiv_reader.py \
    --arxiv_url https://arxiv.org/pdf/1706.03762 \
    --pages 1 2 10 \
    --llm_key $YOUR_LLM_KEY \
    --question "please list the main contribution of the paper."

benchmark

The current benchmark is parsing the second page of the great Attention Is All You Need with nougat-small.

On M1 pro, the result is:

	huggingface	faster nougat
time/sec	21.7	4.5

To reproduce, run:

# download test pdf
wget https://arxiv.org/pdf/1706.03762 -O 1706.03762v7.pdf

# huggingface impl from:
# https://huggingface.co/docs/transformers/main/en/model_doc/nougat
python benchmark/benchmark_hf.py

# faster nougat impl
python benchmark/benchmark_faster_nougat.py

Rationale

There is no magic here :p, I reimplement the decoder part of nougat in MLX, which is much faster than pytorch on apple silicons.

TODOs

Implement encoder in MLX (may not be necessary, as encoder takes little time).
Explore the possibility of implement this in llama.cpp or other backends.

zhuzilin / faster-nougat

faster-nougat

Installation

benchmark

Rationale

TODOs

About

Languages