JuliaStats / GLM.jl

Generalized linear models in Julia

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

OLS with collinearity and weights

dwinkler1 opened this issue · comments

I am having a problem fitting an OLS model with weights and collinearity using GLM.jl.
Here is a simple example

y = rand(10)
x = rand(10, 2)
x = hcat(x,x)
lm(x,y) # correctly removes last two columns
lm(x, y, wts = ones(10)) # fails at cholesky! in GLM.delbeta!
lm(x, y, wts = ones(10), dropcollinear = true) # # fails at cholesky! in GLM.delbeta!

I further narrowed this down to the following function call:

linmod = LinearModel(GLM.LmResp(y,ones(10)), GLM.cholpred(x, true))
GLM.delbeta!(linmod.pp, linmod.rr.y, linmod.rr.wts) # fails
GLM.delbeta!(linmod.pp, linmod.rr.y) # works

using

@lessGLM.delbeta!(linmod.pp, linmod.rr.y, linmod.rr.wts)

I get

function delbeta!(p::DensePredChol{T,<:CholeskyPivoted}, r::Vector{T}, wt::Vector{T}) where T<:BlasReal
    cf = cholfactors(p.chol)
    piv = p.chol.p
    cf .= mul!(p.scratchm2, adjoint(LinearAlgebra.mul!(p.scratchm1, Diagonal(wt), p.X)), p.X)[piv, piv]
    cholesky!(Hermitian(cf, Symbol(p.chol.uplo)))
    ldiv!(p.chol, mul!(p.delbeta, transpose(p.scratchm1), r))
    p
end

and

@less GLM.delbeta!(linmod.pp, linmod.rr.y)
function delbeta!(p::DensePredChol{T,<:CholeskyPivoted}, r::Vector{T}) where T<:BlasReal
    ch = p.chol
    delbeta = mul!(p.delbeta, adjoint(p.X), r)
    rnk = rank(ch)
    if rnk == length(delbeta)
        ldiv!(ch, delbeta)
    else
        permute!(delbeta, ch.p)
        for k=(rnk+1):length(delbeta)
            delbeta[k] = -zero(T)
        end
        LAPACK.potrs!(ch.uplo, view(ch.factors, 1:rnk, 1:rnk), view(delbeta, 1:rnk))
        invpermute!(delbeta, ch.p)
    end
    p
end

Here is a naive implementation that seems to work:

using LinearAlgebra, GLM
y = rand(10)
x = rand(10, 2)
xc = hcat(x,x)
wts = [ones(5)..., 0.5 .* (ones(5))...];
T = eltype(xc)
ch = cholesky!((xc'diagm(wts)xc), Val(true), tol = -one(T), check = false) 
delbeta = zeros(T, size(xc,2))
delbeta = mul!(delbeta, adjoint(xc), diagm(wts)*y)
rnk = rank(ch)
permute!(delbeta, ch.p)
for k=(rnk+1):length(delbeta)
    delbeta[k] = -zero(Float64)
end
LAPACK.potrs!(ch.uplo, view(ch.factors, 1:rnk, 1:rnk), view(delbeta, 1:rnk))
invpermute!(delbeta, ch.p)
julia> delbeta[1:rnk]  coef(lm(x, y, wts = wts))
true

Thanks for the detailed report. Would you make a (draft) PR to show what you suggest to change exactly?

Sure. It'll take me a while unfortunately. Swamped at work ATM.

Sorry it took so long. My PR seems to work correctly against a naive implementation:

using LinearAlgebra, GLM, Random
Random.seed!(1)
n = 100
x = rand(n, 2)
x = hcat(x,x)
y = x * [1;2;0;0] + randn(n)
wts = 10 .*rand(n)
xi = x[:,1:2];
dwt = Diagonal(wts);
coef_naive = (xi'dwt*xi)\xi'dwt*y

coef(lm(x, y, wts = wts))
coef(lm(x, y, wts = wts))[1:2] ≈ coef_naive # true

I hit this too and came to the same diagnosis; I'll give your fix a try and see if it works in my case as well.