ktaaaki / paper2html

Converts a single- or double-column PDF paper into an HTML page that shows the original view alongside a paragraph view extracted from the paper, so the text can be translated in the browser.

import in script not working

alexmaehon opened this issue · comments

Ubuntu 18 LTS
Python 3.7
After installing as instructed:

apt install poppler-utils poppler-data
git clone https://github.com/ktaaaki/paper2html.git
pip install -e paper2html

Converting a local file from the command line works fine, but importing the package in a script fails:

import paper2html
paper2html.paper2html("xxx")

error info:
AttributeError: module 'paper2html' has no attribute 'paper2html'

when I call help(paper2html):

Help on package paper2html:

NAME
paper2html

PACKAGE CONTENTS

FILE
(built-in)

What should I do?

My guess is that you installed the paper2html module into a different Python environment, and that your script is importing the cloned project directory directly instead.

Please check pip's list of installed packages

pip list | grep paper2html

to confirm that paper2html is installed and where it is installed. If it is installed, you should see something like

paper2html               0.4.1                    /home/your_name/some_dir/... /paper2html

If it is installed, please check the module path

import paper2html
print(paper2html.__file__)

If you see the path to __init__.py in the cloned project directory, then there is a problem with the contents of __init__.py. If you do not see it, then there is another problem with the installation.
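The check above can be extended a little. This is only a sketch, not part of paper2html: `describe_import` is a hypothetical helper that uses the standard `importlib.util.find_spec` to report how `import paper2html` would resolve without actually importing it. It rests on the assumption that an empty PACKAGE CONTENTS and a FILE of (built-in) in the help() output point to a namespace package, which is what Python creates when it finds a directory named paper2html with no `__init__.py` directly inside it (for example, the cloned repository folder itself).

```python
import importlib.util


def describe_import(name):
    """Report how `import name` would resolve, without importing it.

    Hypothetical helper for illustration; not part of paper2html.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"kind": "missing", "origin": None, "locations": []}
    locations = list(spec.submodule_search_locations or [])
    # A namespace package has search locations but no __init__.py to load,
    # so its origin is None (or the string "namespace" on older Pythons).
    if locations and spec.origin in (None, "namespace"):
        kind = "namespace"
    else:
        kind = "regular"
    return {"kind": kind, "origin": spec.origin, "locations": locations}


if __name__ == "__main__":
    print(describe_import("paper2html"))
```

If the result is "namespace" and the listed location is the cloned repository folder, running the script from a different working directory may be enough for the installed package to be picked up instead.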

After posting this, I installed and tried AllenAI's s2orc-doc2json. Then I came back, saw your reply, and retried; it works now. I have no idea why.

Aside from that, I see that this function converts the PDF to HTML, but the PDF content inside the page is still in PNG format. How do I get the effect shown in your introduction, where the PDF content is also turned into text?

The converted HTML consists of two panes: PNG captures of the PDF pages, and the aligned text that was originally embedded in the PDF. As you can see in our introductory GIF, only one side (the text pane) is translated by the browser's translation feature.

I see that now, thank you!