How do I create a weighted model?
cmcaine opened this issue
The documentation suggests that this might work:
using GLM, DataFrames
df = DataFrame(x = rand(100), y = rand(100), w = repeat([1,1,1,100], 25));
fit(LinearModel, @formula(y ~ x), df, wts = df.w)
But that gives a method not found error:
ERROR: MethodError: no method matching fit(::Type{LinearModel}, ::Array{Float64,2}, ::Array{Float64,1}; wts=[1, 1, 1, 100, 1, 1, 1, 100, 1, 1 … 1, 100, 1, 1, 1, 100, 1, 1, 1, 100])
Closest candidates are:
fit(::Type{LinearModel}, ::AbstractArray{T,2} where T, ::AbstractArray{T,1} where T) at /home/colin/.julia/dev/GLM/src/lm.jl:136 got unsupported keyword argument "wts"
fit(::Type{LinearModel}, ::AbstractArray{T,2} where T, ::AbstractArray{T,1} where T, ::Bool) at /home/colin/.julia/dev/GLM/src/lm.jl:136 got unsupported keyword argument "wts"
fit(::Type{StatsBase.Histogram}, ::Any...; kwargs...) at /home/colin/.julia/packages/StatsBase/rTkaz/src/hist.jl:319
...
Stacktrace:
[1] #fit#57(::Dict{Symbol,Any}, ::Base.Iterators.Pairs{Symbol,Array{Int64,1},Tuple{Symbol},NamedTuple{(:wts,),Tuple{Array{Int64,1}}}}, ::Function, ::Type{LinearModel}, ::FormulaTerm{Term,Term}, ::DataFrame) at /home/colin/.julia/packages/StatsModels/G9zlM/src/statsmodel.jl:88
[2] (::getfield(StatsBase, Symbol("#kw##fit")))(::NamedTuple{(:wts,),Tuple{Array{Int64,1}}}, ::typeof(fit), ::Type{LinearModel}, ::FormulaTerm{Term,Term}, ::DataFrame) at ./none:0
[3] top-level scope at none:0
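The error message says that a matrix method for fit(LinearModel, X, y) exists but rejects the wts keyword. This can be checked directly with a minimal sketch, assuming Julia 1.2 or later (where hasmethod accepts a tuple of keyword names). The concrete argument types are copied from the MethodError above, and the result depends on the installed GLM version:

```julia
using GLM

# Argument types taken from the MethodError: model type, design matrix, response vector.
argtypes = Tuple{Type{LinearModel}, Matrix{Float64}, Vector{Float64}}

# Is there any method matching the positional arguments?
has_positional = hasmethod(fit, argtypes)

# Does the matching method also accept a `wts` keyword argument?
has_wts = hasmethod(fit, argtypes, (:wts,))

println("fit(LinearModel, X, y) exists: ", has_positional)
println("and it accepts wts:            ", has_wts)
```

If the second value is false, the keyword is being dropped at the matrix-level method even though the formula-level call forwarded it.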
This looks like it might work, but this can't be the intended API:
using GLM, DataFrames
df = DataFrame(x1 = rand(100), x2 = rand(100), y = rand(100), w = repeat([1,1,1,100], 25));
m = convert(Matrix, df)
X = m[:, 1:2] # Type signature of cholpred can't handle X being a vector, probably a bug?
y = m[:, 3]
w = m[:, 4]
lr = GLM.LmResp{typeof(y)}(
fill!(similar(y), 0),
similar(y, 0),
w,
y)
fit!(LinearModel(lr, GLM.cholpred(X, false)))
Where do the docs suggest this? We should fix that until we support it.
The docstring on fit says that it accepts weights as the wts argument: https://juliastats.github.io/GLM.jl/stable/api/#StatsBase.fit
A closer reading shows that the model type passed to fit has to be GeneralizedLinearModel, and LinearModel is not a subtype of that. Of course it isn't: GeneralizedLinearModel sounds like (and is) a concrete type, but I didn't really read that bit, and it tripped me up all the same.
I don't know how to foolproof the documentation. I'll try to have a think about it. For now this issue might help people find the answer.
For anyone wanting to do this, a simple way is to just construct a GLM that is equivalent to a linear model:
glm(@formula(y ~ x), data, Normal(), IdentityLink(), wts = data.w)
I haven't confirmed yet that this does give me the weighted least squares that I wanted (I don't know enough about GLMs and it has been way too hot ;)), but that's my problem.
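One way to check this is to compare the fitted coefficients with the closed-form weighted least squares solution, which solves (X'WX) β = X'Wy. The sketch below assumes the glm keyword API used in this thread (wts passed as a plain vector). The weights are stored as Float64 to avoid any element-type mismatch with y, and the names X, W and beta_wls are illustrative:

```julia
using GLM, DataFrames, LinearAlgebra

df = DataFrame(x = rand(100), y = rand(100),
               w = repeat([1.0, 1.0, 1.0, 100.0], 25))

# Workaround from this thread: Gaussian GLM with identity link and weights.
m = glm(@formula(y ~ x), df, Normal(), IdentityLink(), wts = df.w)

# Closed-form weighted least squares with an intercept column.
X = hcat(ones(nrow(df)), df.x)
W = Diagonal(df.w)
beta_wls = (X' * W * X) \ (X' * W * df.y)

# true if the GLM coefficients agree with the WLS solution
println(coef(m) ≈ beta_wls)
```

This only compares the point estimates. Standard errors depend on how the weights are interpreted, so they need a separate check.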
A separate docstring for fit on LinearModel should probably be added.
Be very careful with this workaround: the weights will be interpreted as case/frequency weights, which differs from the behavior of R, for example.
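To illustrate why the interpretation matters, here is a hand-computed sketch that does not call GLM at all. It assumes the usual definitions: with frequency weights, row i counts as w[i] identical observations, so the residual degrees of freedom are sum(w) - p; with precision weights, as in R's lm, they remain n - p. The coefficients are identical in both cases and only the standard errors change. All variable names are illustrative:

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 100
x = rand(n)
y = rand(n)
w = repeat([1.0, 1.0, 1.0, 100.0], 25)

X = hcat(ones(n), x)            # design matrix with intercept
p = size(X, 2)
XtWX = X' * Diagonal(w) * X
beta = XtWX \ (X' * Diagonal(w) * y)
rss = sum(w .* (y .- X * beta) .^ 2)   # weighted residual sum of squares

# Frequency weights: the data set is treated as having sum(w) observations.
sigma2_freq = rss / (sum(w) - p)
# Precision weights: there are still only n observations.
sigma2_prec = rss / (n - p)

se_freq = sqrt.(sigma2_freq .* diag(inv(XtWX)))
se_prec = sqrt.(sigma2_prec .* diag(inv(XtWX)))

println("ratio of standard errors: ", se_prec ./ se_freq)
```

With these weights sum(w) is 2575 and n is 100, so the two sets of standard errors differ by a factor of sqrt(2573 / 98), which is about 5.1. The frequency interpretation makes the fit look far more certain.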