About `dof_residual`
gragusa opened this issue Β· comments
I noticed something odd about dof_residual
. In few words, when the family
does not have a dispersion_parameter
, dof_residual
is bumped by 1which is inconsistent with other software (R, Stata, etc.) and with statistics more generally π.
using DataFrames
using StableRNGs
using RCall
rng = StableRNG(123);
x1 = rand(rng, 25);
x2 = ifelse.(randn(rng, 25) .> 0, 1, 0);
y = ifelse.(0.004 .- 0.01 .* x1 .+ 1.5 .* x2 .+ randn(rng, 25) .> 0, 1, 0);
df = DataFrame(y=y, x1=x1, x2=x2);
## Julia
l = glm(@formula(y~x1+x2), df, Binomial());
## R
@rput df;
R"rl <- glm(y~x1+x2, df, family='binomial')";
@rget rl;
println("Julia dof_residual:", dof_residual(l))
println("R dof_residual: ", rl[:df_residual])
which gives
Julia dof_residual: 23
R dof_residual: 22
The families without a dispersion_parameter
are Bernoulli, Binomial, Poisson
:
dispersion_parameter(D) = true
dispersion_parameter(::Union{Bernoulli, Binomial, Poisson}) = false
Fortunately, dof_residuals
is not used to scale the vcov
and so this bug does not have important ramifications. I also noted that we are not testing dof_residual
for these families.
The fix is relatively simple: change
dof_residual(obj::LinPredModel) = nobs(obj) - dof(obj) + 1
to
dof_residual(obj::LinPredModel) = nobs(obj) - dof(obj) + dispersion_parameter(obj.rr.d)
Good catch. Actually this definition seems to have been there forever, though the implementation has been changed recently by #265. We should probably add a docstring to explain more precisely what this method returns as it's not necessarily obvious for users how the dispersion parameter must be handled (R doesn't say anything).