pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Home Page: https://pymupdf.readthedocs.io

Struggling to compile PyMuPDF. Segmentation fault

C-monC opened this issue

Description of the bug

Building an Alpine Linux Docker image that installs PyMuPDF fails with a segmentation fault in the compiler.

How to reproduce the bug

FROM python:3.11-alpine

RUN mkdir /app-sources
WORKDIR /app-sources

RUN apk update && \
    apk add wget bash ghostscript tesseract-ocr && \
    apk add openjdk11-jre-headless libxrender-dev libpq-dev gcc poppler-utils alpine-sdk curl cmake make linux-headers krb5-pkinit krb5-dev krb5 freetds-dev libstdc++-dev build-base jbig2dec jpeg-dev  harfbuzz-dev libc-dev  mupdf-dev musl-dev  openjpeg-dev  swig && \
    python3.11 -m pip install --upgrade pip setuptools setuptools-rust wheel pyarmor PyMuPdf && \
    ln -s /usr/lib/libjbig2dec.so.0 /usr/lib/libjbig2dec.so && \
    curl https://sh.rustup.rs -sSf | sh -s -- -y && \
    python3.11 -m pip install opencv-python-headless

Running docker build on the Dockerfile above produces the following error:

cc -ffunction-sections -fdata-sections -pipe -O2 -DNDEBUG -DTOFU_CJK_EXT -Iinclude -MMD -MP -o build/PyMuPDF-x86_64-shared-tesseract-release/thirdparty/leptonica/src/pageseg.o -c thirdparty/leptonica/src/pageseg.c -fPIC -Ithirdparty/leptonica/src -Iscripts/tesseract -DLEPTONICA_INTERCEPT_ALLOC=1 -DHAVE_LIBPNG=0 -DHAVE_LIBTIFF=0 -DHAVE_LIBJPEG=0 -DHAVE_LIBZ=0 -DHAVE_LIBGIF=0 -DHAVE_LIBUNGIF=0 -DHAVE_LIBWEBP=0 -DHAVE_LIBWEBP_ANIM=0 -DHAVE_LIBJP2K=0 -Wno-address-of-packed-member
#-9 46.81 (+7.4s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: during GIMPLE pass: evrp
#-9 46.81 (+7.4s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: thirdparty/leptonica/src/numafunc1.c: In function 'numaUniformSampling':
#-9 46.81 (+7.4s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: thirdparty/leptonica/src/numafunc1.c:3633:1: internal compiler error: Segmentation fault
#-9 46.81 (+7.4s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: 3633 | }
#-9 46.81 (+7.4s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: | ^
#-9 46.81 (+7.5s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: mkdir -p build/PyMuPDF-x86_64-shared-tesseract-release/thirdparty/leptonica/src/ ; cc -ffunction-sections -fdata-sections -pipe -O2 -DNDEBUG -DTOFU_CJK_EXT -Iinclude -MMD -MP -o build/PyMuPDF-x86_64-shared-tesseract-release/thirdparty/leptonica/src/paintcmap.o -c thirdparty/leptonica/src/paintcmap.c -fPIC -Ithirdparty/leptonica/src -Iscripts/tesseract -DLEPTONICA_INTERCEPT_ALLOC=1 -DHAVE_LIBPNG=0 -DHAVE_LIBTIFF=0 -DHAVE_LIBJPEG=0 -DHAVE_LIBZ=0 -DHAVE_LIBGIF=0 -DHAVE_LIBUNGIF=0 -DHAVE_LIBWEBP=0 -DHAVE_LIBWEBP_ANIM=0 -DHAVE_LIBJP2K=0 -Wno-address-of-packed-member
#-9 46.81 (+7.5s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: 0x1aacfb7 internal_error(char const*, ...)
#-9 46.81 (+7.5s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: ???:0
#-9 46.81 (+7.5s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: Please submit a full bug report, with preprocessed source (by using -freport-bug).
#-9 46.81 (+7.5s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: Please include the complete backtrace with any bug report.
#-9 46.81 (+7.5s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: See https://gitlab.alpinelinux.org/alpine/aports/-/issues for instructions.
#-9 46.81 (+7.5s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: make: *** [Makethird:233: build/PyMuPDF-x86_64-shared-tesseract-release/thirdparty/leptonica/src/numafunc1.o] Error 1
#-9 46.81 (+7.5s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build: make: *** Waiting for unfinished jobs....
#-9 46.81 (+19.4s): -b: m: main.py:1605:build: (+0.0s): -b: m: main.py:1605:build:
#-9 46.81 (+19.4s): -b: m: main.py:1605:build: [returned e=2]
#-9 46.81 Traceback (most recent call last):
#-9 46.81 scripts/mupdfwrap.py:6:(): wrap.main.main()
#-9 46.81 scripts/wrap/main.py:3075:main(): jlib.exception_info()
#-9 46.81 ^except raise:
#-9 46.81 scripts/wrap/main.py:3073:main(): main2()
#-9 46.81 scripts/wrap/main.py:2467:main2(): build( build_dirs, swig_command, args, vs_upgrade, make_command)
#-9 46.81 scripts/wrap/main.py:1605:build(): jlib.system( command, prefix=jlib.log_text(), out='log', verbose=1)
#-9 46.81 scripts/jlib.py:1682:system(): raise Exception( message)
#-9 46.81 Exception: Command failed: cd /tmp/pip-install-e1nt35da/pymupdf_a946f9ce6e22418285bbdbefe88806e9/mupdf-1.24.1-source && make -j 32 HAVE_GLUT=no HAVE_PTHREAD=yes verbose=yes shared=yes HAVE_LEPTONICA=yes HAVE_TESSERACT=yes build=release build_prefix=PyMuPDF-x86_64-shared-tesseract-
#-9 46.81 Traceback (most recent call last):
#-9 46.81 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
#-9 46.81 main()
#-9 46.81 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
#-9 46.81 json_out['return_val'] = hook(**hook_input['kwargs'])
#-9 46.81 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#-9 46.81 File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 152, in prepare_metadata_for_build_wheel
#-9 46.81 whl_basename = backend.build_wheel(metadata_directory, config_settings)
#-9 46.81 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#-9 46.81 File "/tmp/pip-install-e1nt35da/pymupdf_a946f9ce6e22418285bbdbefe88806e9/pipcl.py", line 642, in build_wheel
#-9 46.81 items = self._call_fn_build(config_settings)
#-9 46.81 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#-9 46.81 File "/tmp/pip-install-e1nt35da/pymupdf_a946f9ce6e22418285bbdbefe88806e9/pipcl.py", line 804, in _call_fn_build
#-9 46.81 ret = self.fn_build()
#-9 46.81 ^^^^^^^^^^^^^^^
#-9 46.81 File "/tmp/pip-install-e1nt35da/pymupdf_a946f9ce6e22418285bbdbefe88806e9/setup.py", line 558, in build
#-9 46.81 mupdf_build_dir = build_mupdf_unix( mupdf_local, env_extra, build_type)
#-9 46.81 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#-9 46.81 File "/tmp/pip-install-e1nt35da/pymupdf_a946f9ce6e22418285bbdbefe88806e9/setup.py", line 807, in build_mupdf_unix
#-9 46.81 subprocess.run( command, shell=True, check=True)
#-9 46.81 File "/usr/local/lib/python3.11/subprocess.py", line 571, in run
#-9 46.81 raise CalledProcessError(retcode, process.args,
#-9 46.81 subprocess.CalledProcessError: Command 'cd /tmp/pip-install-e1nt35da/pymupdf_a946f9ce6e22418285bbdbefe88806e9/mupdf-1.24.1-source && XCFLAGS=-DTOFU_CJK_EXT XCXXFLAGS=-DTOFU_CJK_EXT /usr/local/bin/python3.11 ./scripts/mupdfwrap.py -d build/PyMuPDF-x86_64-shared-tesseract-release -b all && echo /tmp/pip-install-e1nt35da/pymupdf_a946f9ce6e22418285bbdbefe88806e9/mupdf-1.24.1-source/build/PyMuPDF-x86_64-shared-tesseract-release: && ls -l /tmp/pip-install-e1nt35da/pymupdf_a946f9ce6e22418285bbdbefe88806e9/mupdf-1.24.1-source/build/PyMuPDF-x86_64-shared-tesseract-release' returned non-zero exit status 1.
#-9 46.81 [end of output]
#-9 46.81
#-9 46.81 note: This error originates from a subprocess, and is likely not a problem with pip.
#-9 46.81 error: metadata-generation-failed
#-9 46.81
#-9 46.81 × Encountered error while generating package metadata.
#-9 46.81 ╰─> See above for output.
#-9 46.81
#-9 46.81 note: This is an issue with the package mentioned above, not pip.
#-9 46.81 hint: See above for details.

PyMuPDF version

1.24.1

Operating system

Other

Python version

3.11

This looks like a failure to build MuPDF, caused by a failure to compile MuPDF's thirdparty/leptonica/src/numafunc1.c.

So it's not a PyMuPDF problem, and I can't reproduce it here.

Some things you could look at or try:

  • What does cc --version show?
  • Are you running with a small data limit - what does ulimit -a say?
  • Follow the instructions in the error output to submit a compiler bug - run the original command along with an additional option -freport-bug.
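The first two checks above can also be gathered from inside the build environment itself, which matters when the build runs in a container whose limits differ from the host's. The following is a minimal sketch, not part of PyMuPDF or MuPDF: the function names and the choice of limits are assumptions made for illustration. Python's resource module reports memory limits in bytes, whereas ulimit -a shows most of them in kbytes.

```python
import resource
import shutil
import subprocess

# A subset of what `ulimit -a` prints; which limits to include is an assumption.
LIMITS = {
    "data seg size (-d)": resource.RLIMIT_DATA,
    "max locked memory (-l)": resource.RLIMIT_MEMLOCK,
    "stack size (-s)": resource.RLIMIT_STACK,
    "virtual memory (-v)": resource.RLIMIT_AS,
    "open files (-n)": resource.RLIMIT_NOFILE,
}


def format_limit(value):
    """Render a raw limit value; RLIM_INFINITY is shown as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def compiler_version(cc="cc"):
    """Return the first line of `<cc> --version`, or a note if it cannot be run."""
    path = shutil.which(cc)
    if path is None:
        return f"{cc}: not found on PATH"
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True,
            check=False, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{cc}: could not be run ({exc})"
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else f"{cc}: no version output"


def collect_diagnostics():
    """Collect the compiler version and the soft/hard limits of this process."""
    report = {"cc --version": compiler_version()}
    for label, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        report[label] = f"soft={format_limit(soft)} hard={format_limit(hard)}"
    return report


if __name__ == "__main__":
    for key, value in collect_diagnostics().items():
        print(f"{key}: {value}")
```

Running the script as a step of the failing build, rather than in a shell on the host, shows the limits the compiler actually sees.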

The build happens within a Jenkins container. The issue is intermittent, so it probably has to do with my build environment. I updated PyMuPDF to the latest version yesterday and thought that had caused the build failures.

Thanks for the quick assistance, I'll close this issue for now.

cc --version
cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 513967
max locked memory           (kbytes, -l) 16457064
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 513967
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

The issue arose from max locked memory being too low. The ulimits I sent above were from the host, but the Docker container running Jenkins capped memlock at roughly 3 MB. This caused segmentation faults on random lines, so searching for the error turned up no hits.

ulimit -l 4068916
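A build script could check the locked-memory limit up front and fail with a clear message instead of a compiler crash at an arbitrary source line. The sketch below is one way to do that under stated assumptions: the function names are hypothetical, and the threshold simply reuses the 4068916 kbytes value quoted above, which is what worked in this thread rather than a documented minimum for building MuPDF.

```python
import resource

# Assumption: reuse the value from `ulimit -l 4068916` (kbytes) as the threshold.
REQUIRED_MEMLOCK_KB = 4068916


def memlock_is_sufficient(soft_limit_bytes, required_kb=REQUIRED_MEMLOCK_KB):
    """Return True if a soft memlock limit (in bytes) meets the threshold."""
    # RLIM_INFINITY is a sentinel (not a large byte count), so test it first.
    if soft_limit_bytes == resource.RLIM_INFINITY:
        return True
    return soft_limit_bytes >= required_kb * 1024


def check_memlock(required_kb=REQUIRED_MEMLOCK_KB):
    """Inspect this process's memlock limit; return (ok, human-readable message)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    ok = memlock_is_sufficient(soft, required_kb)
    if soft == resource.RLIM_INFINITY:
        shown = "unlimited"
    else:
        shown = f"{soft // 1024} kbytes"
    status = "ok" if ok else "too low"
    return ok, f"memlock soft limit is {shown} (wanted >= {required_kb} kbytes): {status}"


if __name__ == "__main__":
    ok, message = check_memlock()
    print(message)
    raise SystemExit(0 if ok else 1)
```

An unprivileged process can raise its soft limit only as far as the hard limit, so when the check fails inside a container, the cap generally has to be raised where the container is started.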

Thanks for this extra information, it's good to know the problem is now understood.