OLS with collinearity and weights
dwinkler1 opened this issue · comments
Daniel commented
I am having a problem fitting an OLS model with weights and collinearity using GLM.jl.
Here is a simple example
y = rand(10)
x = rand(10, 2)
x = hcat(x,x)
lm(x,y) # correctly removes last two columns
lm(x, y, wts = ones(10)) # fails at cholesky! in GLM.delbeta!
lm(x, y, wts = ones(10), dropcollinear = true) # # fails at cholesky! in GLM.delbeta!
I further narrowed this down to the following function call:
linmod = LinearModel(GLM.LmResp(y,ones(10)), GLM.cholpred(x, true))
GLM.delbeta!(linmod.pp, linmod.rr.y, linmod.rr.wts) # fails
GLM.delbeta!(linmod.pp, linmod.rr.y) # works
using
@lessGLM.delbeta!(linmod.pp, linmod.rr.y, linmod.rr.wts)
I get
function delbeta!(p::DensePredChol{T,<:CholeskyPivoted}, r::Vector{T}, wt::Vector{T}) where T<:BlasReal
cf = cholfactors(p.chol)
piv = p.chol.p
cf .= mul!(p.scratchm2, adjoint(LinearAlgebra.mul!(p.scratchm1, Diagonal(wt), p.X)), p.X)[piv, piv]
cholesky!(Hermitian(cf, Symbol(p.chol.uplo)))
ldiv!(p.chol, mul!(p.delbeta, transpose(p.scratchm1), r))
p
end
and
@less GLM.delbeta!(linmod.pp, linmod.rr.y)
function delbeta!(p::DensePredChol{T,<:CholeskyPivoted}, r::Vector{T}) where T<:BlasReal
ch = p.chol
delbeta = mul!(p.delbeta, adjoint(p.X), r)
rnk = rank(ch)
if rnk == length(delbeta)
ldiv!(ch, delbeta)
else
permute!(delbeta, ch.p)
for k=(rnk+1):length(delbeta)
delbeta[k] = -zero(T)
end
LAPACK.potrs!(ch.uplo, view(ch.factors, 1:rnk, 1:rnk), view(delbeta, 1:rnk))
invpermute!(delbeta, ch.p)
end
p
end
Here is a naive implementation that seems to work:
using LinearAlgebra, GLM
y = rand(10)
x = rand(10, 2)
xc = hcat(x,x)
wts = [ones(5)..., 0.5 .* (ones(5))...];
T = eltype(xc)
ch = cholesky!((xc'diagm(wts)xc), Val(true), tol = -one(T), check = false)
delbeta = zeros(T, size(xc,2))
delbeta = mul!(delbeta, adjoint(xc), diagm(wts)*y)
rnk = rank(ch)
permute!(delbeta, ch.p)
for k=(rnk+1):length(delbeta)
delbeta[k] = -zero(Float64)
end
LAPACK.potrs!(ch.uplo, view(ch.factors, 1:rnk, 1:rnk), view(delbeta, 1:rnk))
invpermute!(delbeta, ch.p)
julia> delbeta[1:rnk] ≈ coef(lm(x, y, wts = wts))
true
Milan Bouchet-Valat commented
Thanks for the detailed report. Would you make a (draft) PR to show what you suggest to change exactly?
Daniel commented
Sure. It'll take me a while unfortunately. Swamped at work ATM.
Daniel commented
Sorry it took so long. My PR seems to work correctly against a naive implementation:
using LinearAlgebra, GLM, Random
Random.seed!(1)
n = 100
x = rand(n, 2)
x = hcat(x,x)
y = x * [1;2;0;0] + randn(n)
wts = 10 .*rand(n)
xi = x[:,1:2];
dwt = Diagonal(wts);
coef_naive = (xi'dwt*xi)\xi'dwt*y
coef(lm(x, y, wts = wts))
coef(lm(x, y, wts = wts))[1:2] ≈ coef_naive # true
Dave Kleinschmidt commented
I hit this too and came to the same diagnosis; I'll give your fix a try and see if it works in my case as well.