binom_test MemoryError
pdeperio opened this issue
Discovered by @lucrlom processing 170316_0210 (event number?):
```
Traceback (most recent call last):
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/bin/cax", line 9, in <module>
    load_entry_point('cax==5.0.12', 'console_scripts', 'cax')()
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/cax-5.0.12-py3.4.egg/cax/main.py", line 142, in main
    task.go(args.run)
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/cax-5.0.12-py3.4.egg/cax/task.py", line 65, in go
    self.each_run()
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/cax-5.0.12-py3.4.egg/cax/tasks/process.py", line 217, in each_run
    ncpus)
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/cax-5.0.12-py3.4.egg/cax/tasks/process.py", line 104, in _process
    core.Processor(**pax_kwargs).run()
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/pax-6.6.2-py3.4.egg/pax/core.py", line 315, in run
    self.process_event(event)
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/pax-6.6.2-py3.4.egg/pax/core.py", line 276, in process_event
    event = plugin.process_event(event)
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/pax-6.6.2-py3.4.egg/pax/plugin.py", line 91, in process_event
    event = self._process_event(event)
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/pax-6.6.2-py3.4.egg/pax/plugin.py", line 108, in _process_event
    return self.transform_event(event)
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/pax-6.6.2-py3.4.egg/pax/plugins/interaction_processing/S1AreaFractionTopProbability.py", line 34, in transform_event
    ia.s1_area_fraction_top_probability = binom_test(size_top, size_tot, aft)
  File "/project/lgrandi/anaconda3/envs/pax_v6.6.2/lib/python3.4/site-packages/scipy/stats/morestats.py", line 2050, in binom_test
    i = np.arange(np.floor(p*n) + 1)
MemoryError
```
Probably a very large S1 is overloading the combinatorial calculation. See http://stackoverflow.com/questions/3056179/binomial-test-in-python-for-very-large-numbers and https://en.wikipedia.org/wiki/Binomial_distribution#Normal_approximation for possible solutions.
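A minimal sketch (not pax code) of the normal-approximation route from the links above: the exact test allocates O(n) arrays internally (that `np.arange` in the traceback), whereas for large n the two-sided p-value can be approximated with a Gaussian in constant memory. The function name and validity threshold here are illustrative.

```python
from scipy.stats import norm


def binom_test_normal_approx(k, n, p):
    """Two-sided binomial test p-value via the normal approximation.

    Illustrative sketch: valid when n*p and n*(1-p) are both large,
    which is exactly the regime where the exact test runs out of memory.
    """
    mu = n * p
    sigma = (n * p * (1 - p)) ** 0.5
    # Continuity-corrected z-score; two-sided p-value from the survival
    # function, clipped to 1 so the correction can't push it above 1.
    z = (abs(k - mu) - 0.5) / sigma
    return min(1.0, 2.0 * norm.sf(z))
```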
I suspect the p-value isn't very good at high energies: eventually the statistical error becomes so small that any remaining systematic (e.g. a small error in the map) will dominate. We could consider skipping the test for very high-energy peaks, or, probably better, clipping the n in this test to some large number.
I still need to look into this, but my guess is that once we start hitting massive saturation the calculation is inaccurate. I'm also not entirely sure above what energy the p-value stops being useful. We could artificially cap it at ~1e4 pe or something.
OK. Can you implement a fix soon, @darrylmasson? I think this may be killing all our MC jobs currently.
From what the code seems to say, it's doing some fairly memory-inefficient operations that a large S1 would kill. The p-value isn't particularly useful for alpha studies (or higher energies), so until I get around to implementing all the floating-point operation overloads for this, I'll cap it at 1e4 pe. There are ways to rewrite these so they don't chew up all the memory (such as using the normal approximation for high energies), which I'll work on implementing.
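A hypothetical sketch of the cap described above: skip the test entirely above the threshold, where the p-value is dominated by systematics anyway and the exact computation exhausts memory. The threshold constant and function name are illustrative, not from pax; note that modern SciPy spells the exact test `binomtest` (the pax-era `scipy.stats.binom_test` was later deprecated and removed).

```python
import math

from scipy.stats import binomtest

MAX_S1_PE = 1e4  # illustrative cap, from the value floated in this thread


def s1_aft_probability(size_top, size_tot, aft):
    """Area-fraction-top binomial test, skipped for very large S1s."""
    if size_tot > MAX_S1_PE:
        return float('nan')  # flag the p-value as "not computed"
    # binomtest requires integer counts; areas in pe are floats
    return binomtest(round(size_top), round(size_tot), aft).pvalue
```

Returning NaN (rather than a clipped value) keeps downstream cuts honest: a capped p-value would look like a real measurement.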
Closed in #558