struct.error and OverflowError
minhlab opened this issue
These errors occur while I am running cort-train. My setup is Ubuntu 16.04.3, 64 GB RAM, 4 CPUs.
Process ForkPoolWorker-9:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/pool.py", line 125, in worker
put((job, i, result))
File "/usr/lib/python3.5/multiprocessing/queues.py", line 355, in put
self._writer.send_bytes(obj)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/usr/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/pool.py", line 130, in worker
put((job, i, (False, wrapped)))
File "/usr/lib/python3.5/multiprocessing/queues.py", line 349, in put
obj = ForkingPickler.dumps(obj)
File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a string larger than 4GiB
[The same traceback repeats for ForkPoolWorker-10 and ForkPoolWorker-11.]
This is a multiprocessing error that occurs when too much data is passed between processes. I have occasionally run into it myself when experimenting with larger feature sets. A quick workaround would be to disable multiprocessing; however, that would vastly increase the running time of feature extraction. Is this an option for you?
A more principled solution would be a rewrite of the feature extraction code that allows for more efficient feature extraction/combination, but that will take some time.
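For reference, here is a rough sketch of the mechanism with plain multiprocessing (this is generic illustration code, not anything from cort): each pipe message between a pool worker and the parent is prefixed with a signed 32-bit length header (the struct.pack("!i", n) in the traceback), so a pickled result larger than 2 GiB cannot be sent back, and Python 3.5's default pickle protocol 3 cannot serialize a single string/bytes object larger than 4 GiB at all.

```python
import multiprocessing


def big_result(_):
    # Return one large bytes object (~3 GiB, so actually running this needs
    # that much free memory per worker). Its pickled form is larger than the
    # 2 GiB that fits into the signed 32-bit length header written by
    # struct.pack("!i", n), so sending it back to the parent process fails
    # with the struct.error seen above.
    return b"x" * (3 * 1024 ** 3)


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        pool.map(big_result, range(2))  # fails when results are sent back
```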
Hi @smartschat, thanks for answering. For some reason, the error doesn't occur when I run on a different machine: CentOS 7.2.1511, 62 GB RAM, 32 CPUs. Do you know why there is a difference?
BTW, is it the size of the features of a single document that exceeds 4 GiB, or the aggregated features of multiple documents?
Unfortunately I don't know why the behavior differs between these setups. :/
It's the size of all the features for an aggregation of documents. As far as I understand, the following happens: Python divides the data into chunks for multiprocessing, and the chunks are processed by the worker processes. The results for each chunk are passed back to the parent process using Python's pickle mechanism. If the pickled results for a chunk are too big, the error occurs.
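A generic sketch of that behavior (not cort's actual code; whether the chunk size can be tuned in cort is a separate question): with Pool.map, a worker pickles the whole list of results for its chunk into one message, so forcing a chunk size of 1 keeps each message down to a single document's features, at the cost of more inter-process round trips.

```python
import multiprocessing
from collections import Counter


def extract_features(doc):
    # Toy stand-in for per-document feature extraction: just count tokens.
    # In the real setting, one document's features can already be large.
    return Counter(doc.split())


if __name__ == "__main__":
    docs = ["a b c", "a a d", "b c c"] * 100  # toy corpus
    with multiprocessing.Pool() as pool:
        # By default, map() groups the documents into chunks and each worker
        # pickles the entire list of results for its chunk into one pipe
        # message. With chunksize=1, every message holds the features of a
        # single document, so it can only overflow if one document's features
        # alone exceed the limits.
        features = pool.map(extract_features, docs, chunksize=1)
    print(len(features))
```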