mrpositron / paper2tex

Extracting LaTeX equations from PDF


Paper2Tex (discontinued)


This project is a tool that extracts equations from research papers (images, PDFs, etc.) and converts them into LaTeX code.

This project relies heavily on the following projects:

Credit goes to the authors of the above projects, @MaliParag, @lukas-blecher, @jjdredd.

How to use?

paper2tex.ipynb is the main notebook. It contains the code to extract equations from a paper. The notebook is self-explanatory.
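
For orientation, the detect-then-recognize flow that a tool like this follows can be sketched roughly as below. This is not the notebook's code: `extract_equations`, `Equation`, `detect`, and `recognize` are hypothetical names, the two callables stand in for whatever detector and OCR model are used, and the top-to-bottom id ordering is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Equation:
    id: int
    box: Box
    latex: str


def extract_equations(
    page: object,
    detect: Callable[[object], Sequence[Box]],
    recognize: Callable[[object, Box], str],
) -> List[Equation]:
    """Detect equation boxes on a page, then convert each one to LaTeX."""
    # Assumed ordering: top-to-bottom, then left-to-right.
    boxes = sorted(detect(page), key=lambda b: (b[1], b[0]))
    return [Equation(i, box, recognize(page, box)) for i, box in enumerate(boxes)]


if __name__ == "__main__":
    # Stub detector and recognizer so the sketch runs without any model.
    def fake_detect(page: object) -> Sequence[Box]:
        return [(10, 200, 300, 240), (10, 50, 300, 90)]

    def fake_recognize(page: object, box: Box) -> str:
        return f"x_{{{box[1]}}}"

    for eq in extract_equations("page-1", fake_detect, fake_recognize):
        print(eq.id, eq.box, eq.latex)
```

Passing the detector and recognizer in as callables keeps the sketch independent of any particular model, so either stage could be swapped without touching the loop.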

Example


Extracted equations are drawn in boxes with a yellow border. The number in the top-left corner of each box is the id of the equation, and the top-right corner holds a second label.
The extracted equations are:

  • $$\text{id:}0 \Rightarrow {\frac{1}{N}}\sum_{i=1}^{N}\ell(\mathbf{x}_{i},\Theta)$$
  • $$\text{id:}1 \Rightarrow \Theta_{2}\leftarrow\Theta_{2}-\frac{\alpha}{m}\sum_{i=1}^{m}\frac{\partial F_{2}({\bf x}_{i},\Theta_{2})}{\partial\Theta_{2}}$$
  • $$\text{id:}2 \Rightarrow \ell=F_{2}(F_{1}(\mathbf{u},\Theta_{1}),\Theta_{2})$$
  • $$\text{id:}3 \Rightarrow {\frac{1}{m}}{\frac{\partial\ell(\mathbf{x}_{i},\Phi)}{\partial\Theta}}$$
  • $$\text{id:}4 \Rightarrow \ell=F_{2}(\cdot)$$
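
A listing in the format above could be produced from a list of recognized LaTeX strings with something like the following. It is a minimal sketch; `format_equations` is a hypothetical helper, not a function from the notebook, and it assumes ids are simply the positions in the input list.

```python
from typing import Iterable, List


def format_equations(latex_strings: Iterable[str]) -> List[str]:
    """Render each LaTeX string as an 'id => equation' display-math line."""
    return [
        rf"$$\text{{id:}}{i} \Rightarrow {latex}$$"
        for i, latex in enumerate(latex_strings)
    ]


if __name__ == "__main__":
    for line in format_equations([r"\ell=F_{2}(\cdot)", r"\frac{1}{N}"]):
        print(line)
```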

Things to do

  • Add a notebook to extract equations from the paper.
  • Implement a GPU version of the code.
  • Upload it to Colab.
  • Find a way to run LaTeX-OCR inference in batch mode.
  • Detect paper borders
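
For the batch-mode item, one possible starting point is to group the equation crops into fixed-size chunks and hand each chunk to a batched recognizer. The sketch below only shows the chunking; `recognize_batch` is a hypothetical callable that would wrap a model's batched forward pass, and nothing here is taken from LaTeX-OCR itself.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def recognize_in_batches(
    crops: Sequence[T],
    recognize_batch: Callable[[Sequence[T]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Run a batched recognizer over all crops, preserving input order."""
    results: List[str] = []
    for batch in chunked(crops, batch_size):
        results.extend(recognize_batch(batch))
    return results


if __name__ == "__main__":
    # Stand-in recognizer: upper-cases strings instead of running a model.
    print(recognize_in_batches(["a", "b", "c"], lambda b: [s.upper() for s in b], 2))
```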


License: MIT License


Languages

Python 96.5%, Jupyter Notebook 3.5%