`Int` conversion in `loglik_obs(::Binomial)`
palday opened this issue
Currently, `loglik_obs(::Binomial)` forces the second argument to `logpdf(::Binomial, x)` to be an integer:
Line 515 in 1459737
(Sidebar: the PDF function will actually accept non-integer values, returning `-Inf` if they're not within 1 eps of an integer, or the value at the corresponding integer if they are.)
As part of an analysis at the day job, I found a pathological case where we currently get an `InexactError`:
```julia
julia> using BenchmarkTools

julia> using Distributions

# values from a fitted model
julia> y, μ, wt, ϕ = 0.6376811594202898, 0.8492925285671102, 69.0, NaN # phi isn't used here
(0.6376811594202898, 0.8492925285671102, 69.0, NaN)

julia> # definition from GLM.jl
       loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), Int(y*wt))
loglik_obs (generic function with 1 method)

julia> loglik_obs(Binomial(), y, μ, wt, ϕ) # error
ERROR: InexactError: Int64(43.99999999999999)
Stacktrace:
 [1] Int64
   @ ./float.jl:788 [inlined]
 [2] loglik_obs(#unused#::Binomial{Float64}, y::Float64, μ::Float64, wt::Float64, ϕ::Float64)
   @ Main ./REPL[76]:2
 [3] top-level scope
   @ REPL[77]:1
```
If we look at the numbers creating the problem, we have
```julia
julia> y * wt
43.99999999999999

julia> nextfloat(y * wt)
44.0

julia> 44 / wt == y
true

julia> 44 / y == wt
true
```
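The round-trip failure can be reproduced without the fitted model at all (a minimal sketch; `44` and `69` are just the implied success count and weight from the example above):

```julia
# The proportion 44/69 is not exactly representable in Float64,
# so multiplying it back by the weight lands one ulp below 44.0
# rather than on it.
y = 44 / 69   # 0.6376811594202898, the value from the fitted model
x = y * 69
println(x)                     # 43.99999999999999
println(x == 44.0)             # false: Int(x) would throw InexactError
println(nextfloat(x) == 44.0)  # true: off by exactly one ulp
```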
So this really is just a floating-point issue. We can fix it by defining a `safer_int` that rounds when the value is "close enough" to an integer:
```julia
function safer_int(x::T) where {T<:Base.IEEEFloat}
    r = round(Int, x)
    abs(x - r) <= eps(x) && return r
    throw(InexactError(:safer_int, T, x))
end

loglik_obs_safer(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), safer_int(y*wt))
```
and then everything works:
```julia
julia> loglik_obs_safer(Binomial(), y, μ, wt, ϕ) # works
-11.628163156011077
```
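For completeness, a quick sanity check of `safer_int` in isolation (input values here are chosen purely for illustration): it accepts a value one ulp away from an integer, but, unlike a bare `round(Int, x)`, it still throws for genuinely non-integral input:

```julia
# Same definition as proposed above.
function safer_int(x::T) where {T<:Base.IEEEFloat}
    r = round(Int, x)
    abs(x - r) <= eps(x) && return r
    throw(InexactError(:safer_int, T, x))
end

println(safer_int(prevfloat(44.0)))  # 44: one ulp below 44 is "close enough"
println(safer_int(3.0))              # 3: exact integers pass through unchanged

# round(Int, 0.5) would silently return 0; safer_int keeps the error:
try
    safer_int(0.5)
catch e
    println(e isa InexactError)      # true
end
```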
What happens to performance, though, in a non-pathological case? Let's take a look:
```julia
julia> y, μ, wt, ϕ = 1/3, 0.5, 3, NaN
(0.3333333333333333, 0.5, 3, NaN)

julia> @benchmark loglik_obs(Binomial(), $y, $μ, $wt, $ϕ)
BenchmarkTools.Trial: 10000 samples with 987 evaluations.
 Range (min … max): 50.616 ns … 86.458 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     50.785 ns              ┊ GC (median):     0.00%
 Time (mean ± σ):   51.102 ns ± 1.389 ns   ┊ GC (mean ± σ):   0.00% ± 0.00%

  ▆█▄ ▁ ▁                                                     ▁
  ▆████▆▃▄▃▃▃▂▅▅▅▃▅▄▄▄▄▅▄▅██▆▇▇▇▆▅▄▃▅▅▆█▇▅▆▆▅▅▅▅▅▅▆▇▆▆▅▅▆▅▆▅▅ █
  50.6 ns       Histogram: log(frequency) by time      55.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark loglik_obs_safer(Binomial(), $y, $μ, $wt, $ϕ)
BenchmarkTools.Trial: 10000 samples with 986 evaluations.
 Range (min … max): 53.414 ns … 88.446 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time (median):     53.542 ns              ┊ GC (median):     0.00%
 Time (mean ± σ):   53.847 ns ± 1.313 ns   ┊ GC (mean ± σ):   0.00% ± 0.00%

  ▂█▇▃  ▁▂▁                       ▂▂                          ▂
  ████▆▇▅▄▄▃▄▅▄▅▄▅▄▄▄▅▃▆▅▃▃▄███████▇▆▆▇██▅▄▄▅▅▃▅▅▅▃▁▄▄▅▅▅▄▅▆▆ █
  53.4 ns       Histogram: log(frequency) by time      57.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
So the safer version is only about 2-3 ns slower.
Tested on Apple silicon:
```julia
julia> versioninfo()
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 4 on 4 virtual cores
```
@nalimilan if you have no strong objections, I would just go ahead and open a PR to add this.
Hat-tip to @ararslan, who rubber-ducked this with me, and to @haberdashPI, who found the pathological case.