JuliaStats / GLM.jl

Generalized linear models in Julia

About `dof_residual`

gragusa opened this issue

I noticed something odd about dof_residual. In short, when the family does not have a dispersion parameter, dof_residual is too large by 1, which is inconsistent with other software (R, Stata, etc.) and with statistics more generally 😄.

using GLM
using DataFrames
using StableRNGs
using RCall

rng = StableRNG(123);

x1 = rand(rng, 25);
x2 = ifelse.(randn(rng, 25) .> 0, 1, 0);
y = ifelse.(0.004 .- 0.01 .* x1 .+ 1.5 .* x2 .+ randn(rng, 25) .> 0, 1, 0);
df = DataFrame(y=y, x1=x1, x2=x2);

## Julia
l = glm(@formula(y~x1+x2), df, Binomial());

## R
@rput df;
R"rl <- glm(y~x1+x2, df, family='binomial')";
@rget rl;

println("Julia dof_residual: ", dof_residual(l))
println("R dof_residual: ", rl[:df_residual])

which gives

Julia dof_residual: 23
R dof_residual: 22

The families without a dispersion parameter are Bernoulli, Binomial, and Poisson:

dispersion_parameter(D) = true
dispersion_parameter(::Union{Bernoulli, Binomial, Poisson}) = false

Fortunately, dof_residual is not used to scale the vcov, so this bug does not have important ramifications. I also noticed that we are not testing dof_residual for these families.
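A regression test for one of these families could look roughly like the sketch below. The data values are made up for illustration (chosen so that the two classes are not linearly separable), and the test assumes GLM, DataFrames, and the Test standard library are available. It is expected to fail on the version discussed here and to pass once the definition is corrected.

```julia
using GLM, DataFrames, Test

# Small hand-made data set (hypothetical values, not from the issue)
df = DataFrame(
    y  = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1],
    x1 = collect(1.0:10.0),
    x2 = repeat([0, 1], 5),
)

# Binomial family: no dispersion parameter is estimated
m = glm(@formula(y ~ x1 + x2), df, Binomial())

@testset "dof_residual without a dispersion parameter" begin
    # Residual degrees of freedom = observations minus estimated coefficients
    @test dof_residual(m) == nobs(m) - length(coef(m))
    # 10 observations, 3 coefficients (intercept, x1, x2)
    @test dof_residual(m) == 7
end
```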

The fix is relatively simple: change

dof_residual(obj::LinPredModel) = nobs(obj) - dof(obj) + 1

to

dof_residual(obj::LinPredModel) = nobs(obj) - dof(obj) + dispersion_parameter(obj.rr.d)
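To see why the two definitions differ only for families without a dispersion parameter, here is a minimal package-free sketch of the counting. The type and function names (BinomialFamily, NormalFamily, has_dispersion, model_dof, and so on) are hypothetical stand-ins, not GLM.jl API, and the sketch assumes that dof counts the coefficients plus the dispersion parameter when one is estimated.

```julia
# Hypothetical stand-ins for the real family types; illustration only.
abstract type Family end
struct BinomialFamily <: Family end   # no dispersion parameter
struct NormalFamily   <: Family end   # dispersion parameter is estimated

# Mirrors the dispersion_parameter trait quoted above
has_dispersion(::Family) = true
has_dispersion(::BinomialFamily) = false

# Assumption: dof = number of coefficients, plus 1 if a dispersion parameter is estimated
model_dof(ncoef, fam) = ncoef + has_dispersion(fam)

# Current definition: always adds 1 back
old_dof_residual(n, ncoef, fam) = n - model_dof(ncoef, fam) + 1

# Proposed definition: adds 1 back only when dof included a dispersion parameter
new_dof_residual(n, ncoef, fam) = n - model_dof(ncoef, fam) + has_dispersion(fam)

n, ncoef = 25, 3   # 25 observations; intercept, x1, x2

@assert old_dof_residual(n, ncoef, BinomialFamily()) == 23   # the value reported by Julia
@assert new_dof_residual(n, ncoef, BinomialFamily()) == 22   # the value reported by R
@assert old_dof_residual(n, ncoef, NormalFamily())   == 22   # unchanged for families
@assert new_dof_residual(n, ncoef, NormalFamily())   == 22   # with a dispersion parameter
```

In both cases the corrected value is simply the number of observations minus the number of estimated coefficients.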

Good catch. This definition seems to have been there forever, though the implementation was changed recently by #265. We should probably add a docstring explaining precisely what this method returns, since it is not necessarily obvious to users how the dispersion parameter is handled (R doesn't say anything about it).

Cc: @andreasnoack @Nosferican