Pax multiprocessing crash ProcessBatchQueue
lucrlom opened this issue
I hit this problem in at least two runs, 4696 and 4700, while processing with `massive-cax --once --run 4xxx`
in the pax_v6.1.0 environment:
```
cax_v4.10.1 - 2016-11-24 04:15:23,038 [CRITICAL] Exception caught from task ProcessBatchQueue
Traceback (most recent call last):
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/cax-4.10.1-py3.4.egg/cax/main.py", line 120, in main
    task.go(args.run)
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/cax-4.10.1-py3.4.egg/cax/task.py", line 65, in go
    self.each_run()
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/cax-4.10.1-py3.4.egg/cax/tasks/process.py", line 211, in each_run
    ncpus)
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/cax-4.10.1-py3.4.egg/cax/tasks/process.py", line 98, in _process
    parallel.multiprocess_locally(n_cpus=ncpus, **pax_kwargs)
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/pax-6.1.0-py3.4.egg/pax/parallel.py", line 205, in multiprocess_locally
    traceback)
pax.exceptions.EventBlockHeapSizeExceededException: Pax multiprocessing crashed due to exception in one of the workers. Dumping traceback:
Traceback (most recent call last):
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/pax-6.1.0-py3.4.egg/pax/parallel.py", line 356, in safe_processor
    Processor(**kwargs)
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/pax-6.1.0-py3.4.egg/pax/core.py", line 186, in __init__
    self.run()
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/pax-6.1.0-py3.4.egg/pax/core.py", line 310, in run
    total=self.number_of_events)):
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/pax-6.1.0-py3.4.egg/pax/plugins/io/Queues.py", line 97, in get_events
    self.max_blocks_on_heap, block_id + 1))
pax.exceptions.EventBlockHeapSizeExceededException: We have received over 250 blocks without receiving the next block id (1739) in order. Likely one of the block producers has died without telling anyone.
cax_v4.10.1 - 2016-11-24 04:15:23,149 [ERROR] Pax multiprocessing crashed due to exception in one of the workers. Dumping traceback:
Traceback (most recent call last):
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/pax-6.1.0-py3.4.egg/pax/parallel.py", line 356, in safe_processor
    Processor(**kwargs)
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/pax-6.1.0-py3.4.egg/pax/core.py", line 186, in __init__
    self.run()
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/pax-6.1.0-py3.4.egg/pax/core.py", line 310, in run
    total=self.number_of_events)):
  File "/project/lgrandi/anaconda3/envs/pax_v6.1.0/lib/python3.4/site-packages/pax-6.1.0-py3.4.egg/pax/plugins/io/Queues.py", line 97, in get_events
    self.max_blocks_on_heap, block_id + 1))
pax.exceptions.EventBlockHeapSizeExceededException: We have received over 250 blocks without receiving the next block id (1739) in order. Likely one of the block producers has died without telling anyone
```
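For anyone reading the error message for the first time, here is a rough sketch of the kind of check that seems to be firing. It is not the actual code in `pax/plugins/io/Queues.py`. The names `ordered_blocks` and `BlockHeapSizeExceeded` are made up for illustration, and I am assuming block ids are unique and start at 0. The consumer parks out-of-order blocks on a heap and gives up once more than `max_blocks_on_heap` of them pile up while the next expected id is still missing:

```python
import heapq


class BlockHeapSizeExceeded(Exception):
    """Hypothetical stand-in for pax's EventBlockHeapSizeExceededException."""


def ordered_blocks(incoming, max_blocks_on_heap=250):
    """Yield (block_id, payload) pairs in block_id order, starting at id 0.

    `incoming` is any iterable of (block_id, payload) pairs that may arrive
    out of order. Block ids are assumed to be unique.
    """
    heap = []      # blocks that arrived early and are waiting for their turn
    next_id = 0    # the block id we must release next
    for block_id, payload in incoming:
        heapq.heappush(heap, (block_id, payload))
        # Release every waiting block that is now next in line.
        while heap and heap[0][0] == next_id:
            yield heapq.heappop(heap)
            next_id += 1
        # Too many blocks waiting: the producer of next_id has probably died.
        if len(heap) > max_blocks_on_heap:
            raise BlockHeapSizeExceeded(
                "Received over %d blocks without receiving the next "
                "block id (%d) in order." % (max_blocks_on_heap, next_id))


# Out-of-order input is put back in order.
print(list(ordered_blocks([(1, "b"), (0, "a"), (2, "c")])))
# -> [(0, 'a'), (1, 'b'), (2, 'c')]
```

If block 0 never arrives, every later block stays on the heap and the exception is raised as soon as the limit is exceeded. That would match the message above, where block 1739 never showed up while over 250 later blocks did.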
This issue was moved to XENON1T/pax#463