JuliaStats / Distributions.jl

Hello,

When fitting distributions to integer data, I ran into overflow when using suffstats and fit. My use case surfaced it for a Normal distribution, but it's likely to show up elsewhere.

This can be easily reproduced via:

using Distributions

samples = UInt16.(round.(randn(1_000_000) .* 100 .+ 1000));
ss = suffstats(Normal, samples)
fit_mle(Normal, ss)

yielding...

Normal{Float64}(μ=0.031291, σ=1004.9262791...)

This is due to the type stability/enforcement used in the summation here:

Distributions.jl/src/univariate/continuous/normal.jl

Lines 134 to 137 in c1705a3

    
    s = zero(T) + zero(T)
    for i in eachindex(x)
        @inbounds s += x[i]
    end
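
To make the failure mode concrete, here is a minimal sketch that copies the quoted loop into a standalone function (`naive_sum` is a name made up for this illustration, not something in Distributions.jl) and compares it with `sum` from Base. With a `UInt16` accumulator the running total wraps modulo 2^16, so for a million samples the resulting "mean" can never exceed 65535 / 1_000_000 ≈ 0.066, which is in line with the μ reported above.

```julia
# Hypothetical helper for illustration: accumulates in the element type T,
# the same way the quoted loop does.
function naive_sum(x::AbstractArray{T}) where {T<:Real}
    s = zero(T) + zero(T)
    for i in eachindex(x)
        @inbounds s += x[i]
    end
    return s
end

# True total is 100_000, which is above typemax(UInt16) == 65535.
x = fill(UInt16(1000), 100)

wrapped = naive_sum(x)   # stays UInt16: 100_000 mod 65_536 = 34_464
widened = sum(x)         # Base.sum widens small integer types before adding

println(typeof(wrapped), " vs ", typeof(widened))
println(wrapped == 34_464, " ", widened == 100_000)
```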

The method accepts all subtypes of Real so Integers fall into the "expected" use-case, but it would clearly be fragile with smaller integer types. I can imagine a few scenarios to deal with this:

Leave as-is and document the potential issue, perhaps advising against Integer types
Specialize on Integer eltype and accumulate into an Int64 variable, possibly erroring on overflow
Use sum from Base rather than a manual loop

All values are cast to Float64 in the returned struct, so I'm not seeing the necessity for enforcing type stability in this case.
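
For reference, the second option could look roughly like the sketch below. `checked_int_sum` is a hypothetical name, not an existing function in Distributions.jl, and the choice of `Int64` as the accumulator is just the one suggested above.

```julia
# Hypothetical helper, not part of Distributions.jl: accumulate integer
# data in Int64 and raise an error instead of silently wrapping.
function checked_int_sum(x::AbstractArray{<:Integer})
    s = zero(Int64)
    for xi in x
        # Int64(xi) throws InexactError if xi does not fit in Int64;
        # checked_add throws OverflowError if the running total overflows.
        s = Base.Checked.checked_add(s, Int64(xi))
    end
    return s
end

samples = fill(UInt16(1000), 1_000_000)
total = checked_int_sum(samples)           # Int64, no wraparound
mean_estimate = total / length(samples)    # Float64
```

The trade-off is a checked addition per element, in exchange for a loud failure rather than a silently wrong estimate.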

It's very old code. I think we should just use sum here, and also when computing the second central moment below. It would be great if you could prepare a PR with your example as a test.
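
A rough sketch of what that could look like, under the assumption that the statistics needed are the total, the mean, and the sum of squared deviations. The function name and the named-tuple fields are invented for this illustration; this is not the actual Distributions.jl code or its `suffstats` return type.

```julia
# Hypothetical sketch, not the Distributions.jl implementation.
function normal_moments_sketch(x::AbstractArray{<:Real})
    n = length(x)
    total = sum(x)                        # Base.sum widens small integer eltypes
    m = total / n                         # mean, computed in floating point
    sqdev = sum(xi -> abs2(xi - m), x)    # sum of squared deviations from the mean
    return (total = total, mean = m, sqdev = sqdev, n = n)
end

data = UInt16[998, 1000, 1002]
stats = normal_moments_sketch(data)
sigma_mle = sqrt(stats.sqdev / stats.n)   # the MLE of σ divides by n, not n - 1
```

Note that `sum` only widens integer types narrower than the machine word, so `Int64` or `UInt64` data can in principle still overflow; that case would need a float or checked accumulator.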


`suffstats` integer overflow