`suffstats` integer overflow
scoopxyz opened this issue · comments
Hello,
When attempting to do some distribution fitting on integer data, I ran into overflow when using `suffstats` and `fit`. My use case made me notice it for a `Normal` distribution, but it's likely to show up elsewhere.
This can be easily reproduced via:
using Distributions
samples = UInt16.(round.(randn(1_000_000) .* 100 .+ 1000));
ss = suffstats(Normal, samples)
fit_mle(Normal, ss)
yielding...
Normal{Float64}(μ=0.031291, σ=1004.9262791...)
This is due to the type stability/enforcement used in the summation here:
`Distributions.jl/src/univariate/continuous/normal.jl`, lines 134 to 137 in c1705a3
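The linked snippet is not reproduced here, so the following is only a minimal sketch of the failure mode, assuming the summation accumulates into a variable of the input's element type. `naive_sum` is a hypothetical helper written for illustration, not code from Distributions.jl.

```julia
# Hypothetical helper (assumption): accumulate in the element type of the input.
function naive_sum(x::AbstractArray{T}) where {T<:Real}
    s = zero(T)
    for v in x
        s += v   # UInt16 + UInt16 stays UInt16 and wraps modulo 2^16
    end
    return s
end

x = UInt16[60_000, 60_000]
naive_sum(x)   # wraps around to 54464 (120_000 mod 65_536), still a UInt16
sum(x)         # 120000, because Base.sum widens small integer types before adding
```

Under this assumption the reported estimate is plausible: a `UInt16` accumulator can never exceed 65535, and μ=0.031291 over 1_000_000 samples corresponds to a wrapped sum of about 31291. With the mean near zero, the spread is measured around roughly 0 instead of 1000, which is consistent with σ ≈ sqrt(1000^2 + 100^2) ≈ 1005.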
The method accepts all subtypes of `Real`, so `Integer`s fall into the "expected" use case, but it would clearly be fragile with smaller integer types. I can imagine a few scenarios to deal with this:
- Leave as-is and document the potential issue, maybe suggest against Integer types
- Specialize on `Integer` eltype and accumulate into an `Int64` variable, possibly error on overflow
- Utilize `sum` from Base rather than a manual loop
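The second and third options could look roughly like the sketch below. `checked_int_sum` is a hypothetical name introduced here for illustration; it relies on `Base.checked_add`, which throws an `OverflowError` instead of wrapping.

```julia
# Hypothetical helper for the second option: widen every element to Int64 and
# throw an OverflowError instead of wrapping silently.
function checked_int_sum(x::AbstractArray{<:Integer})
    s = zero(Int64)
    for v in x
        # Int64(v) throws an InexactError if v does not fit into Int64
        s = Base.checked_add(s, Int64(v))
    end
    return s
end

x = UInt16[60_000, 60_000]
checked_int_sum(x)   # 120000, held in an Int64
sum(x)               # third option: Base.sum widens UInt16 to UInt (UInt64 on 64-bit systems)
```

The checked variant trades some speed for a loud failure, while `sum` needs no specialization but can still wrap silently for 64-bit integer inputs.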
All values are cast to Float64 in the returned struct, so I'm not seeing the necessity for enforcing type stability in this case.
It's very old code. I think we should just use `sum` here and when computing the second central moment below. It would be great if you could prepare a PR with your example as a test.
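As a rough sketch of that suggestion (not the actual Distributions.jl implementation), the sum and the second central moment could both go through `sum`. The function name `normal_stats_sketch` and the returned field names are assumptions made for this example, and the tolerances in the check are arbitrary.

```julia
# Hypothetical sketch: compute the sum, mean, and unnormalized second central
# moment with Base.sum instead of a manually typed accumulator.
function normal_stats_sketch(x::AbstractArray{<:Real})
    n = length(x)
    s = sum(x)                      # widened for small integer eltypes
    m = s / n                       # floating-point mean
    s2 = sum(v -> abs2(v - m), x)   # sum of squared deviations from the mean
    return (s = Float64(s), m = Float64(m), s2 = Float64(s2), n = n)
end

# Test-style check in the spirit of the example from the issue
samples = UInt16.(round.(randn(1_000_000) .* 100 .+ 1000))
stats = normal_stats_sketch(samples)
@assert isapprox(stats.m, 1000; atol = 1)
@assert isapprox(sqrt(stats.s2 / stats.n), 100; atol = 1)
```

Because `v - m` mixes an integer with a `Float64` mean, the deviations are computed in floating point, so only the plain sum needs integer widening.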