NormXU / nougat-latex-ocr

Codebase for fine-tuning / evaluating nougat-based image2latex generation models

Home Page: https://arxiv.org/abs/2308.13418


Adding quantized models

ProfFan opened this issue · comments

Hi,

Thank you for this amazing model! I made a small tray utility that uses your model to convert screenshots to LaTeX: https://github.com/ProfFan/Snap2LaTeX

However, running it locally is not fast. It would be great if we could make quantized versions suitable for on-device inference :)
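A minimal sketch of one possible starting point, not a recommendation for this model specifically: PyTorch's dynamic quantization stores `nn.Linear` weights as int8 and runs on CPU, which matches the on-device use case. The hub id `Norm/nougat-latex-base` is assumed from this repo's README; the accuracy impact on LaTeX output would need to be measured.

```python
import torch
from transformers import VisionEncoderDecoderModel

# Assumed hub id (from this repo's README); adjust if it differs.
model = VisionEncoderDecoderModel.from_pretrained("Norm/nougat-latex-base")
model.eval()

# Dynamic quantization: nn.Linear weights become int8, activations are
# quantized on the fly. CPU-only, so it fits on-device inference; the
# effect on output quality should be evaluated before shipping.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```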

@ProfFan I'm glad this model is useful to you. Snap2LaTeX is indeed impressive; thank you for your efforts in making such a cool tool. While I'm not very familiar with quantization, I believe I could develop a smaller Nougat-LaTeX model based on nougat-small, which has only 4 decoder layers. According to my evaluation, it can achieve ~40 tokens/s on an A100 with flash-attn2 in fp16.
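For reference, a sketch of how the fp16 setting above could be reproduced with plain `transformers` (enabling flash-attn2 itself is a separate question, touched on further down). `Norm/nougat-latex-base` is the assumed hub id, and the random tensor only stands in for a real preprocessed image when timing throughput:

```python
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained(
    "Norm/nougat-latex-base",  # assumed hub id
    torch_dtype=torch.float16,
).to("cuda").eval()

# The donut-swin encoder config stores its input size; nougat-style
# checkpoints use a [height, width] list. A random tensor is enough
# for a throughput test.
height, width = model.config.encoder.image_size
pixel_values = torch.randn(
    1, 3, height, width, dtype=torch.float16, device="cuda"
)

with torch.inference_mode():
    output_ids = model.generate(pixel_values, max_new_tokens=512)
```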

For simple (non-multiline/array) equations (example):

[example image: a simple single-line equation]

even the larger model is pretty fast (using the MPS backend), averaging about 4 seconds after the first run (shader compilation, etc.). So the current model is pretty usable already :)

For bigger matrices and multi-line equations, the decoding time grows sharply, as expected: the output sequence is much longer and tokens are generated one at a time. Interestingly, converting the model to half precision does not help that much.

[example image: a multi-line matrix equation]
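A sketch of the MPS setup being described, again assuming the hub id `Norm/nougat-latex-base`. That fp16 helps so little here is consistent with autoregressive decoding being dominated by per-step overhead rather than raw matmul throughput, which a half-precision cast does not reduce:

```python
import torch
from transformers import VisionEncoderDecoderModel

# Prefer Apple's MPS backend when available, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

model = VisionEncoderDecoderModel.from_pretrained(
    "Norm/nougat-latex-base"  # assumed hub id
).to(device).eval()

# Optional half-precision cast; as noted above, the observed speedup on
# MPS is small, consistent with decoding being overhead-bound.
if device == "mps":
    model = model.half()
```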

How do you add flash-attn2 to Nougat? donut-swin doesn't seem to support flash-attn2.
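Right, the donut-swin implementation in `transformers` has no flash-attn2 code path, so requesting it for the whole `VisionEncoderDecoderModel` raises at load time. A hedged sketch of one workaround, with the caveat that which implementations each submodule accepts depends on the installed `transformers` version: try flash-attn2 first, then fall back to PyTorch's SDPA, which dispatches to fused/flash kernels on supported GPUs without needing per-model flash-attn code.

```python
import torch
from transformers import VisionEncoderDecoderModel

model_id = "Norm/nougat-latex-base"  # assumed hub id

try:
    # Raises if any submodule (here, the donut-swin encoder) lacks a
    # flash-attn2 implementation in the installed transformers version.
    model = VisionEncoderDecoderModel.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        attn_implementation="flash_attention_2",
    )
except (ValueError, ImportError):
    # Fallback: PyTorch's scaled-dot-product attention, which routes to
    # flash kernels where the hardware and shapes allow it.
    model = VisionEncoderDecoderModel.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        attn_implementation="sdpa",
    )

model = model.to("cuda").eval()
```

If neither implementation is accepted for donut-swin, the remaining option would be to patch only the decoder's attention, since the decoder dominates end-to-end latency during generation anyway.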