pdfminer / pdfminer.six

Community maintained fork of pdfminer - we fathom PDF

Home Page:https://pdfminersix.readthedocs.io

pdfminer causes the python3 process to core dump

xsank opened this issue

Bug report
version
pdfminer.six==20220524

When multiple processes use pdfplumber to open a PDF, the problem below occurs:
pdfminer code: [image]
trace_back:

Traceback (most recent call first):
  File "/home/admin/test_parse/lib/python3.9/site-packages/pdfminer/psparser.py", line 257, in nextline
    m = EOL.search(self.buf, self.charpos)
  File "/home/admin/test_parse/lib/python3.9/site-packages/pdfminer/pdfdocument.py", line 170, in load
    (_, line) = parser.nextline()
  File "/home/admin/test_parse/lib/python3.9/site-packages/pdfminer/pdfdocument.py", line 1005, in read_xref_from
    xref.load(parser)
  File "/home/admin/test_parse/lib/python3.9/site-packages/pdfminer/pdfdocument.py", line 722, in __init__
    self.read_xref_from(parser, pos, self.xrefs)
  File "/home/admin/test_parse/lib/python3.9/site-packages/pdfplumber/pdf.py", line 44, in __init__
    self.doc = PDFDocument(PDFParser(stream), password=password or "")
  File "/home/admin/test_parse/lib/python3.9/site-packages/pdfplumber/pdf.py", line 94, in open
    return cls(
  File "/home/admin/test_parse/core/parsers/pdf/image_parser.py", line 38, in parse_image_by_pdfplumber
    doc = pdfplumber.open(pdf)

core dump:

(gdb) bt
#0  0x00007fb3a2374605 in raise () from /usr/lib64/libc.so.6
#1  0x00007fb3a235d8a2 in abort () from /usr/lib64/libc.so.6
#2  0x00007fb3a23b65c8 in __libc_message () from /usr/lib64/libc.so.6
#3  0x00007fb3a23be25a in malloc_printerr () from /usr/lib64/libc.so.6
#4  0x00007fb3a23c1874 in _int_malloc () from /usr/lib64/libc.so.6
#5  0x00007fb3a23c2e31 in malloc () from /usr/lib64/libc.so.6
#6  0x00000000004d303b in _PyObject_Malloc (ctx=<optimized out>, nbytes=<optimized out>)
    at /usr/local/src/conda/python-3.9.18/Objects/obmalloc.c:1645
#7  _PyObject_Malloc (ctx=<optimized out>, nbytes=nbytes@entry=1114)
    at /usr/local/src/conda/python-3.9.18/Objects/obmalloc.c:1638
#8  0x0000000000561ce5 in _PyObject_Realloc (nbytes=1114, ptr=0x0, ctx=<optimized out>)
    at /usr/local/src/conda/python-3.9.18/Objects/obmalloc.c:2004
#9  PyMem_Realloc (new_size=1114, ptr=0x0) at /usr/local/src/conda/python-3.9.18/Objects/obmalloc.c:623
#10 data_stack_grow (size=72, state=0x7ffdd46bbb80) at /usr/local/src/conda/python-3.9.18/Modules/_sre.c:215
#11 sre_ucs1_match (state=0x7ffdd46bbb80, pattern=0x7fb390fa7d10, toplevel=0)
    at /usr/local/src/conda/python-3.9.18/Modules/sre_lib.h:565
#12 0x000000000056fa72 in sre_ucs1_search (pattern=0x7fb390fa7d10, state=0x7ffdd46bbb80)
    at /usr/local/src/conda/python-3.9.18/Modules/sre_lib.h:1518
#13 sre_search (state=0x7ffdd46bbb80, pattern=0x7fb390fa7ce8)
    at /usr/local/src/conda/python-3.9.18/Modules/_sre.c:575
#14 0x00000000005d8a6f in _sre_SRE_Pattern_search_impl (endpos=<optimized out>, pos=<optimized out>,
    string=0x5d5b97f0, self=0x7fb390fa7c90) at /usr/local/src/conda/python-3.9.18/Modules/_sre.c:687
#15 _sre_SRE_Pattern_search (self=0x7fb390fa7c90, args=<optimized out>, args@entry=0x7fb1f1f66788,
    nargs=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.18/Modules/clinic/_sre.c.h:413
#16 0x000000000050e388 in method_vectorcall_FASTCALL_KEYWORDS (func=<optimized out>, args=0x7fb1f1f66780,
    nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.18/Objects/descrobject.c:409
#17 0x00000000004e80b5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
    args=0x7fb1f1f66780, callable=0x7fb39bb69360, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:118
#18 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fb1f1f66780, callable=0x7fb39bb69360)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:127
#19 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5077
#20 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fb1f1f665e0, throwflag=<optimized out>)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3506
#21 0x00000000004f8123 in _PyEval_EvalFrame (throwflag=0, f=0x7fb1f1f665e0, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40
#22 function_code_fastcall (tstate=0x771440, co=<optimized out>, args=<optimized out>,
    nargs=<optimized out>, globals=0x7fb390ff9a00) at /usr/local/src/conda/python-3.9.18/Objects/call.c:330
#23 0x00000000004e80b5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0xcadffa0,
    callable=0x7fb390faa820, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:118
#24 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0xcadffa0, callable=0x7fb390faa820)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:127
#25 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5077
#26 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xcadfdc0, throwflag=<optimized out>)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3506
#27 0x00000000004f8123 in _PyEval_EvalFrame (throwflag=0, f=0xcadfdc0, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40
#28 function_code_fastcall (tstate=0x771440, co=<optimized out>, args=<optimized out>,
    nargs=<optimized out>, globals=0x7fb390ec3e80) at /usr/local/src/conda/python-3.9.18/Objects/call.c:330
#29 0x00000000004e80b5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x674add80,
    callable=0x7fb390e5ec10, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:118
#30 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x674add80, callable=0x7fb390e5ec10)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:127
#31 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5077
#32 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x674adbd0, throwflag=<optimized out>)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3506
#33 0x00000000004f8123 in _PyEval_EvalFrame (throwflag=0, f=0x674adbd0, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40
#34 function_code_fastcall (tstate=0x771440, co=<optimized out>, args=<optimized out>,
    nargs=<optimized out>, globals=0x7fb390ec3e80) at /usr/local/src/conda/python-3.9.18/Objects/call.c:330
#35 0x00000000004e80b5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x4b2c0a50,
    callable=0x7fb390e6caf0, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:118
#36 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x4b2c0a50, callable=0x7fb390e6caf0)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:127
#37 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5077
#38 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x4b2c0890, throwflag=<optimized out>)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3506
#39 0x00000000004e6b2a in _PyEval_EvalFrame (throwflag=0, f=0x4b2c0890, tstate=0x771440)
    at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40
#40 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x7fb1d25ac5f8,
    kwargs=0x7fb1d25b3b88, kwcount=<optimized out>, kwstep=1, defs=0x7fb390e55f58,
    defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7fb39bd6f530, qualname=0x7fb390e53b70)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4329

pdfplumber caused this; I have solved it.
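
The issue does not show the calling code or the actual fix, so the following is only a sketch of one common way to use pdfplumber from several processes: each worker opens the file itself from a path, and the pool uses the `spawn` start method so workers start from a fresh interpreter rather than inheriting the parent's memory. The names `count_pages`, `count_pages_parallel`, and the `opener` parameter are hypothetical.

```python
import multiprocessing as mp


def count_pages(path, opener=None):
    """Open one PDF inside the calling process and return its page count.

    Only the path string crosses the process boundary; the PDF object is
    created and closed entirely within the worker.
    """
    if opener is None:
        # pdfplumber is a third-party dependency; imported lazily here.
        import pdfplumber
        opener = pdfplumber.open
    with opener(path) as pdf:
        return len(pdf.pages)


def count_pages_parallel(paths, processes=2):
    """Map count_pages over paths using freshly spawned worker processes."""
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(count_pages, paths)
```

Because `spawn` re-imports the main module in each worker, a script calling `count_pages_parallel` should do so under an `if __name__ == "__main__":` guard. Whether this pattern addresses the crash reported here is an assumption; the reporter did not say what the fix was.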