PDF Summarizer

This project is a simple, fully local tool for summarizing academic PDFs with a learning-based toolbox.

Main Components and Functions

This project contains the following parts:

PyMuPDF PDF parser to handle PDF files.
EfficientDet layout detection model from layoutparser. Install with pip install "layoutparser[effdet]".
Open-source EN/CN LLM ChatGLM. Install PyTorch with CUDA and transformers (version < 4.37.0).
Streamlit library for web page creation.

After installing libraries for layoutparser and ChatGLM, run pip3 install -r requirements.txt to install other dependencies.
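
Once the dependencies are installed, the layout detection step can be pictured roughly as follows. This is only a minimal sketch, not the project's actual code: the helper names `load_model` and `select_regions`, the 0.5 score threshold, and the PubLayNet checkpoint path are all assumptions made for illustration. The filtering helper only relies on the `type`, `score`, and `coordinates` attributes that layoutparser exposes on detected blocks, so it can be checked with stand-in objects without downloading a model.

```python
from types import SimpleNamespace

WANTED_TYPES = ("Figure", "Table")


def load_model():
    """Load an EfficientDet layout model (weights are downloaded on first use)."""
    import layoutparser as lp  # lazy import: only needed when a real model is used

    # Assumption: a PubLayNet-trained checkpoint from the layoutparser model zoo.
    return lp.EfficientDetLayoutModel("lp://PubLayNet/tf_efficientdet_d0/config")


def select_regions(blocks, wanted=WANTED_TYPES, min_score=0.5):
    """Return (type, (x1, y1, x2, y2)) for each confident figure or table block."""
    regions = []
    for block in blocks:
        score = block.score if block.score is not None else 0.0
        if block.type in wanted and score >= min_score:
            x1, y1, x2, y2 = block.coordinates
            regions.append((block.type, (int(x1), int(y1), int(x2), int(y2))))
    return regions


# Stand-in blocks (hypothetical values) to exercise the filter without a model.
fake_blocks = [
    SimpleNamespace(type="Figure", score=0.9, coordinates=(10.2, 20.7, 110.0, 220.0)),
    SimpleNamespace(type="Text", score=0.95, coordinates=(0.0, 0.0, 50.0, 50.0)),
    SimpleNamespace(type="Table", score=0.3, coordinates=(5.0, 5.0, 60.0, 60.0)),
]
regions = select_regions(fake_blocks)
```

With a real page image (for example a NumPy array rendered by PyMuPDF), the same filter would be applied as `select_regions(load_model().detect(page_image))`, and each returned box could then be used to crop the figure or table from the image.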

Start the local server with:

python3 -m streamlit run web_ui.py --server.fileWatcherType none

Accept a PDF by file upload or by link.
Extract all figures and tables in the PDF file.
ChatGLM tries to summarize the paper's main idea from the first few thousand characters of the text (the exact limit depends on parameters and GPU memory). It responds in English first, then in Chinese.
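
The summarization step above can be sketched as text extraction, truncation, and prompt building. This is a hedged illustration under stated assumptions rather than the project's implementation: the function names, the 4000-character default, and the prompt wording are hypothetical, and the real limit would depend on the model parameters and available GPU memory.

```python
def extract_text(pdf_path: str) -> str:
    """Concatenate the plain text of every page using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helpers below work without it

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def truncate_text(text: str, max_chars: int = 4000) -> str:
    """Keep at most max_chars characters, cutting at the last space if there is one."""
    if len(text) <= max_chars:
        return text
    head = text[:max_chars]
    cut = head.rfind(" ")
    return head[:cut] if cut > 0 else head


def build_prompts(text: str, max_chars: int = 4000) -> list:
    """Build one English and one Chinese summary request over the same excerpt."""
    excerpt = truncate_text(text, max_chars)
    # Assumption: the prompt wording below is illustrative only.
    return [
        f"Summarize the main idea of this paper in English:\n\n{excerpt}",
        f"Summarize the main idea of this paper in Chinese:\n\n{excerpt}",
    ]
```

The two prompts would be sent to ChatGLM in order, so that the English summary is produced before the Chinese one. Lowering `max_chars` is the simplest way to stay within GPU memory on smaller cards.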

Take the ChatGLM paper as an example.
