jalan / pdftotext

Simple PDF text extraction

Can't install using conda/mamba

mdaeron opened this issue

Based on https://github.com/conda-forge/pdftotext-feedstock I did:

conda config --add channels conda-forge
conda config --set channel_priority strict
mamba install pdftotext

This fails miserably because nothing provides requested pdftotext. I also tried:

  • conda install pdftotext (yields PackagesNotFoundError)
  • pip install pdftotext (yields Could not build wheels for pdftotext).

I'm using:

  • macOS 12.3.1
  • mamba 1.1.0
  • conda 23.3.1
  • Python 3.11.3

What am I doing wrong? I'd like to avoid mixing brew and mamba because that way madness lies.

After looking into it, I believe this is because I'm on Apple silicon, and https://anaconda.org/conda-forge/pdftotext lists osx-64 builds but no osx-arm64 builds. Is there another way forward apart from brew? And if I do use brew, what would be the easiest way to link the installed pdftotext to a conda environment?
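
For anyone wanting to confirm the same diagnosis, here is a minimal sketch of how one might check which platform builds exist. The helper names `conda_subdir` and `list_pdftotext_builds` are made up for illustration, and the mapping covers only a few common platforms. Note that a shell running under Rosetta may report x86_64 even on Apple silicon.

```shell
# Hypothetical helper: map OS and architecture names (as printed by
# `uname -s` and `uname -m`) to the matching conda platform subdir.
conda_subdir() {
    os="$1"; arch="$2"
    case "$os/$arch" in
        Darwin/arm64)  echo "osx-arm64" ;;
        Darwin/x86_64) echo "osx-64" ;;
        Linux/x86_64)  echo "linux-64" ;;
        Linux/aarch64) echo "linux-aarch64" ;;
        *)             echo "unknown" ;;
    esac
}

# Hypothetical helper: list the pdftotext builds conda-forge offers for
# this machine's subdir. conda reports an error if there are none.
list_pdftotext_builds() {
    subdir="$(conda_subdir "$(uname -s)" "$(uname -m)")"
    conda search -c conda-forge --subdir "$subdir" pdftotext
}

echo "conda subdir for this machine: $(conda_subdir "$(uname -s)" "$(uname -m)")"
```

Calling `list_pdftotext_builds` inside an activated environment would then show whether a package exists for the detected platform.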

Aha, that makes sense. I think you can still avoid brew, assuming that conda-forge has all the right dependencies available as osx-arm64. Try this:

conda install c-compiler cxx-compiler pkg-config poppler
pip install pdftotext

For reference, my CI jobs test that this approach works: https://github.com/jalan/pdftotext/actions/runs/5038405263/jobs/9035872397
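
Before running pip, it may help to confirm that the build prerequisites are actually visible inside the environment. The following is only a sketch: `missing_tools` and `preflight` are hypothetical helpers, the `clang` name is taken from the compiler path in the build log, and it assumes the build locates poppler through the `poppler-cpp` pkg-config module.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Hypothetical preflight: check for a compiler and pkg-config, then ask
# pkg-config whether it can see poppler's C++ library (poppler-cpp).
preflight() {
    missing="$(missing_tools clang pkg-config)"
    if [ -n "$missing" ]; then
        echo "missing tools: $missing"
        return 1
    fi
    if ! pkg-config --exists poppler-cpp; then
        echo "pkg-config cannot find poppler-cpp"
        return 1
    fi
    echo "poppler-cpp $(pkg-config --modversion poppler-cpp) found"
}
```

Running `preflight && pip install pdftotext` in the activated environment would attempt the build only when those checks pass.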

The approach suggested above (conda install followed by pip install) fails with the following message:

% pip install pdftotext

Collecting pdftotext
  Using cached pdftotext-2.2.2.tar.gz (113 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pdftotext
  Building wheel for pdftotext (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [36 lines of output]
      Error: No such keg: /opt/homebrew/Cellar/poppler
      running bdist_wheel
      running build
      running build_ext
      building 'pdftotext' extension
      creating build
      creating build/temp.macosx-11.0-arm64-cpython-311
      clang -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/daeron/mambaforge/envs/pdftotxt/include -arch arm64 -fPIC -O2 -isystem /Users/daeron/mambaforge/envs/pdftotxt/include -arch arm64 -DPOPPLER_CPP_AT_LEAST_0_30_0=1 -DPOPPLER_CPP_AT_LEAST_0_58_0=1 -DPOPPLER_CPP_AT_LEAST_0_88_0=1 -I/usr/local/include -I/Users/daeron/mambaforge/envs/pdftotxt/include/python3.11 -c pdftotext.cpp -o build/temp.macosx-11.0-arm64-cpython-311/pdftotext.o -Wall -mmacosx-version-min=10.9 -std=c++11
      In file included from pdftotext.cpp:1:
      In file included from /Users/daeron/mambaforge/envs/pdftotxt/include/python3.11/Python.h:23:
      /Users/daeron/mambaforge/envs/pdftotxt/bin/../include/c++/v1/stdlib.h:150:34: error: unknown type name 'ldiv_t'
      inline _LIBCPP_INLINE_VISIBILITY ldiv_t div(long __x, long __y) _NOEXCEPT {
                                       ^
      /Users/daeron/mambaforge/envs/pdftotxt/bin/../include/c++/v1/stdlib.h:151:12: error: no member named 'ldiv' in the global namespace
        return ::ldiv(__x, __y);
               ~~^
      /Users/daeron/mambaforge/envs/pdftotxt/bin/../include/c++/v1/stdlib.h:154:34: error: unknown type name 'lldiv_t'
      inline _LIBCPP_INLINE_VISIBILITY lldiv_t div(long long __x,
                                       ^
      /Users/daeron/mambaforge/envs/pdftotxt/bin/../include/c++/v1/stdlib.h:156:12: error: no member named 'lldiv' in the global namespace
        return ::lldiv(__x, __y);
               ~~^
      In file included from pdftotext.cpp:1:
      In file included from /Users/daeron/mambaforge/envs/pdftotxt/include/python3.11/Python.h:26:
      /Users/daeron/mambaforge/envs/pdftotxt/bin/../include/c++/v1/string.h:95:102: error: unknown type name 'size_t'
      inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_PREFERRED_OVERLOAD const void* memchr(const void* __s, int __c, size_t __n) {
                                                                                                           ^
      /Users/daeron/mambaforge/envs/pdftotxt/bin/../include/c++/v1/string.h:98:90: error: unknown type name 'size_t'
      inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_PREFERRED_OVERLOAD void* memchr(void* __s, int __c, size_t __n) {
                                                                                               ^
      In file included from pdftotext.cpp:1:
      /Users/daeron/mambaforge/envs/pdftotxt/include/python3.11/Python.h:29:12: fatal error: 'unistd.h' file not found
      #  include <unistd.h>
                 ^~~~~~~~~~
      7 errors generated.
      error: command '/Users/daeron/mambaforge/envs/pdftotxt/bin/clang' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pdftotext
  Running setup.py clean for pdftotext
Failed to build pdftotext
ERROR: Could not build wheels for pdftotext, which is required to install pyproject.toml-based projects
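
Errors such as a missing unistd.h and unknown size_t usually suggest that the compiler cannot find the system SDK headers, although the thread itself does not confirm that as the cause here. As a rough sketch under that assumption, one could check whether the active macOS SDK contains the header the log reports as missing. `sdk_has_header` and `check_macos_sdk` are hypothetical helpers; `xcrun --show-sdk-path` is the standard macOS command for printing the SDK location.

```shell
# Hypothetical helper: succeed if <sysroot>/usr/include/<header> exists.
sdk_has_header() {
    sysroot="$1"; header="$2"
    [ -f "$sysroot/usr/include/$header" ]
}

# Hypothetical check (macOS only): ask xcrun for the active SDK path and
# look for unistd.h, the header the build log reports as missing.
check_macos_sdk() {
    if ! command -v xcrun >/dev/null 2>&1; then
        echo "xcrun not found (not macOS, or no developer tools installed)"
        return 1
    fi
    sdk="$(xcrun --show-sdk-path 2>/dev/null)"
    if [ -n "$sdk" ] && sdk_has_header "$sdk" unistd.h; then
        echo "SDK headers found under $sdk"
    else
        echo "no usable SDK found; 'xcode-select --install' may help"
        return 1
    fi
}
```

If `check_macos_sdk` reports no usable SDK, installing the Xcode Command Line Tools is one thing to try before retrying the build.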

Never mind my last message.

It turns out the nice folks at conda-forge have already added osx-arm64 builds and closed conda-forge/pdftotext-feedstock#27, so conda install pdftotext (or mamba install pdftotext) now works out of the box!