JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.

`suffstats` integer overflow

scoopxyz opened this issue

Hello,

When fitting distributions to integer data, I ran into integer overflow in `suffstats` and `fit`. My use case exposed it with a Normal distribution, but it is likely to show up for other distributions as well.

This can be easily reproduced via:

using Distributions

samples = UInt16.(round.(randn(1_000_000) .* 100 .+ 1000));
ss = suffstats(Normal, samples)
fit_mle(Normal, ss)

which yields:

Normal{Float64}(μ=0.031291, σ=1004.9262791...)
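Both numbers are consistent with the sum wrapping around in `UInt16`. Assuming the running sum of the n = 10^6 samples is reduced modulo 2^16 and the variance is then taken around that wrapped mean, the estimates come out roughly as follows (μ = 1000 and σ = 100 are the parameters used to generate the samples above):

$$
\hat{\mu} = \frac{1}{n}\left(\sum_{i=1}^{n} x_i \bmod 2^{16}\right) \le \frac{65535}{10^{6}} \approx 0.066
$$

$$
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})^2 \approx \frac{1}{n}\sum_{i=1}^{n} x_i^2 \approx \mu^2 + \sigma^2 = 1000^2 + 100^2 = 1\,010\,000,
\qquad \hat{\sigma} \approx 1005
$$

So the fitted σ is essentially the root-mean-square of the data rather than its spread, which matches the reported σ ≈ 1004.93.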

This is caused by the accumulator in the summation, whose type is fixed to the input element type `T`:

s = zero(T) + zero(T)
for i in eachindex(x)
    @inbounds s += x[i]
end
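To see the wraparound in isolation, here is a self-contained sketch that copies the loop into a helper and compares it with `Base.sum` on a small vector. `naive_sum` is a hypothetical name used only for illustration; it is not part of Distributions.jl.

```julia
# Hypothetical helper mirroring the quoted loop: the accumulator keeps the element type T.
function naive_sum(x::AbstractArray{T}) where {T<:Real}
    s = zero(T) + zero(T)          # for T == UInt16 this is still a UInt16
    for i in eachindex(x)
        @inbounds s += x[i]        # UInt16 + UInt16 wraps modulo 2^16 without raising an error
    end
    return s
end

x = fill(UInt16(1000), 100)        # true total is 100_000, above typemax(UInt16) == 65_535

s_naive = naive_sum(x)             # 100_000 mod 65_536 == 34_464
s_base = sum(x)                    # Base.sum widens small unsigned integers to UInt before adding
println(s_naive, " ", s_base)
```

The difference comes from the reduction operator Base uses inside `sum`, which promotes small integer types such as `UInt16` to `UInt` (or `Int` for signed types) before adding. That is why switching to `sum` avoids the problem for small element types; `Int64` input can still overflow either way.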

Because the method accepts any subtype of `Real`, integer input is an expected use case, but the fixed accumulator type makes it fragile for small integer types such as `UInt16`. I can see a few ways to handle this:

  1. Leave it as-is and document the potential issue, perhaps recommending against Integer element types.
  2. Specialize on Integer eltypes and accumulate into an Int64 variable, possibly throwing an error on overflow.
  3. Use `sum` from Base rather than a manual loop.
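Options 2 and 3 can be sketched side by side. `checked_int64_sum` is a hypothetical helper, not an existing Distributions.jl function: it widens each element to `Int64` and adds with `Base.Checked.checked_add`, so overflow raises an `OverflowError` instead of wrapping, while option 3 simply calls `sum`.

```julia
# Option 2 (sketch): widen each element to Int64 and add with overflow checking.
function checked_int64_sum(x::AbstractArray{<:Integer})
    s = zero(Int64)
    for v in x
        # Int64(v) throws InexactError for UInt64 values above typemax(Int64);
        # checked_add throws OverflowError if the running total leaves the Int64 range.
        s = Base.Checked.checked_add(s, Int64(v))
    end
    return s
end

x = fill(UInt16(1000), 100)

total_checked = checked_int64_sum(x)   # option 2: Int64 accumulator with overflow check
total_base = sum(x)                    # option 3: Base.sum widens UInt16 to UInt
println(total_checked, " ", total_base)
```

Option 2 turns silent wraparound into an error, at the cost of a per-element check; option 3 needs no new code but only protects the small integer types that Base widens.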

All values are converted to Float64 in the returned struct, so I don't see the need to keep the accumulator in the input type here.

It's very old code. I think we should just use `sum` here and when computing the second central moment below. It would be great if you could prepare a PR with your example as a test.
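A sketch of what the suggested change could look like, assuming it simply replaces both manual loops with `sum`. The function name `normal_suffstats_sketch` and its NamedTuple return value are hypothetical (the real method returns a `NormalStats`), and the issue's example is turned into a test with Julia's `Test` standard library:

```julia
using Test

# Hypothetical sum-based variant of the Normal suffstats computation (not the merged code).
# Returns a NamedTuple instead of Distributions' NormalStats to stay self-contained.
function normal_suffstats_sketch(x::AbstractArray{<:Real})
    n = length(x)
    s = sum(x)                          # Base.sum widens small integer eltypes, so no UInt16 wraparound
    m = s / n                           # `/` on integers returns a Float64 mean
    s2 = sum(xi -> abs2(xi - m), x)     # sum of squared deviations from the mean
    return (s = Float64(s), m = m, s2 = s2, n = n)
end

@testset "Normal suffstats with UInt16 samples" begin
    samples = UInt16.(round.(randn(1_000_000) .* 100 .+ 1000))  # the example from this issue
    ss = normal_suffstats_sketch(samples)
    ref = normal_suffstats_sketch(Float64.(samples))            # same data without small-integer arithmetic
    @test ss.m ≈ ref.m
    @test ss.s2 ≈ ref.s2
    @test 990 < ss.m < 1010                                      # samples were drawn around 1000
end
```

Comparing against a `Float64` copy of the same samples avoids hard-coding expected values for random data.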