izderadicka / pdfparser

Python binding to libpoppler with focus on text extraction

pkg-config is a dependency?

kuraga opened this issue

$ ~/miniconda3/bin/pip install git+https://github.com/izderadicka/pdfparser
Collecting git+https://github.com/izderadicka/pdfparser
  Cloning https://github.com/izderadicka/pdfparser to /tmp/pip-req-build-6ryx319w
  Running command git clone -q https://github.com/izderadicka/pdfparser /tmp/pip-req-build-6ryx319w
    ERROR: Command errored out with exit status 1:
     command: /home/a.kurakin/miniconda3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-6ryx319w/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-6ryx319w/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
         cwd: /tmp/pip-req-build-6ryx319w/
    Complete output (15 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-6ryx319w/setup.py", line 88, in <module>
        poppler_config = pkgconfig("poppler", "poppler-cpp")
      File "/tmp/pip-req-build-6ryx319w/setup.py", line 62, in pkgconfig
        items = subprocess.check_output(['pkg-config', optional_args, pkg_option, package]).decode('utf8').split()
      File "/home/a.kurakin/miniconda3/lib/python3.7/subprocess.py", line 395, in check_output
        **kwargs).stdout
      File "/home/a.kurakin/miniconda3/lib/python3.7/subprocess.py", line 472, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/home/a.kurakin/miniconda3/lib/python3.7/subprocess.py", line 775, in __init__
        restore_signals, start_new_session)
      File "/home/a.kurakin/miniconda3/lib/python3.7/subprocess.py", line 1522, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'pkg-config': 'pkg-config'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
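The traceback shows where the build fails: `setup.py` calls `subprocess.check_output(['pkg-config', ...])`, and because there is no `pkg-config` executable on `PATH`, Python raises `FileNotFoundError` before pkg-config ever runs. A minimal sketch of a guard that turns this into an actionable message; `pkgconfig_output` is a hypothetical helper, not part of pdfparser's actual `setup.py`:

```python
import shutil
import subprocess


def pkgconfig_output(*args, executable="pkg-config"):
    """Run pkg-config with the given arguments and return its whitespace-split output.

    Hypothetical helper: raises RuntimeError with an install hint instead of the
    bare FileNotFoundError seen in the traceback when the binary is missing.
    """
    # shutil.which returns None when the executable cannot be found on PATH
    if shutil.which(executable) is None:
        raise RuntimeError(
            f"'{executable}' was not found on PATH; it is needed to locate a "
            "system-wide libpoppler (on Debian/Ubuntu: sudo apt install pkg-config)"
        )
    # check_output raises CalledProcessError if pkg-config exits non-zero,
    # e.g. when the requested package is unknown
    out = subprocess.check_output([executable, *args])
    return out.decode("utf8").split()
```

With this guard, a missing `pkg-config` fails with a message that names the missing tool and how to install it.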

This solved the problem (though I didn't find it mentioned in the documentation):

sudo apt install -y pkg-config
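To confirm the fix, you can check that pkg-config is now on `PATH` and can resolve the two packages `setup.py` asks for (`poppler` and `poppler-cpp`, per the traceback). A small sketch; `pkg_available` is a hypothetical helper built on `pkg-config --exists`, which exits with status 0 only when every listed package is found. If it still reports `False`, the poppler development files are probably missing as well (on Debian/Ubuntu these typically come from a `-dev` package such as `libpoppler-cpp-dev`; treat the exact name as an assumption to check for your distribution).

```python
import shutil
import subprocess


def pkg_available(*packages, executable="pkg-config"):
    """Return True if pkg-config is installed and knows every given package (hypothetical helper)."""
    if shutil.which(executable) is None:
        return False  # pkg-config itself is missing, as in the traceback
    # `pkg-config --exists` is silent and exits 0 only if all packages are found
    result = subprocess.run([executable, "--exists", *packages])
    return result.returncode == 0


if __name__ == "__main__":
    # The same package names setup.py passes to pkg-config
    print("poppler + poppler-cpp found:", pkg_available("poppler", "poppler-cpp"))
```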

It is, but it's optional: it depends on how you link libpoppler. If you link against the system-wide library, then pkg-config is used. I'm not exactly sure when it gets installed, but it's usually included when you install the development dependencies.
I guess we could mention it in the README? Can you submit a PR?
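When linking against a system-wide libpoppler, pkg-config's job is to report the compiler and linker flags for the installed library (for example, the output of `pkg-config --cflags --libs poppler-cpp`). A sketch of how such output is commonly sorted into `setuptools.Extension` keyword arguments; `flags_to_extension_kwargs` is hypothetical, and pdfparser's own `pkgconfig()` helper (only partly visible in the traceback) may do this differently:

```python
def flags_to_extension_kwargs(tokens):
    """Sort pkg-config --cflags/--libs tokens into setuptools.Extension kwargs (hypothetical helper)."""
    kwargs = {
        "include_dirs": [],
        "library_dirs": [],
        "libraries": [],
        "extra_compile_args": [],
    }
    # Map each two-character flag prefix to the Extension argument it feeds
    prefixes = {"-I": "include_dirs", "-L": "library_dirs", "-l": "libraries"}
    for token in tokens:
        key = prefixes.get(token[:2])
        if key:
            kwargs[key].append(token[2:])  # keep only the path or library name
        else:
            kwargs["extra_compile_args"].append(token)  # anything else, e.g. -DFOO
    return kwargs


# Example input, as pkg-config might print it for poppler-cpp
print(flags_to_extension_kwargs("-I/usr/include/poppler/cpp -lpoppler-cpp".split()))
```

Linker-only flags (for example `-Wl,...`) would belong in `extra_link_args` instead; the sketch keeps the split minimal.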

No, sorry, I don't know the internals. Feel free to close.

Check out this fork, which works on both Linux and Mac: https://github.com/rossumai/pdfparser (it also includes installation instructions and brew formulas/deb packages for poppler).

Anyway, pdfparser uses a deprecated internal poppler API (xpdf & cairo). There's a better alternative that uses poppler's C++ API and is much faster for both image rendering and text extraction: https://pypi.org/project/python-poppler/