getzze / RobustModels.jl

A Julia package for robust regressions using M-estimators and quantile regressions

Hung Process

nwamsley1 opened this issue · comments

I have an application where I am fitting many (thousands) of MMEstimators. In one case I came across the following data, for which rlm runs forever. There is no error message, but it never stops running. The hang is specific to certain inputs, but I cannot work out what about these inputs causes the problem or how to handle it so that my program can continue to run. Here is the example:

using RobustModels, Statistics  # rlm and mean

X = [0.0; 0.0; 593.3040161132812; 680.9676513671875; 533.0647583007812; 742.5764770507812; 835.7925415039062; 1277.613525390625; 465.0248718261719; 977.941162109375; 453.80657958984375; 524.1534423828125; 400.8550109863281; 1025.7659912109375; 3729.7734375; 7977.93408203125; 33058.66796875; 58342.8359375; 96970.9765625; 125303.8515625; 105264.4453125; 68260.9375; 40450.44921875; 27465.583984375; 12540.0400390625; 10328.353515625;;]

y = [170845.453125, 373183.40625, 489773.0625, 640513.0, 896556.25, 894648.0625, 1.0691845e6, 1.056674e6, 1.2729035e6, 1.0171798125e6, 937198.375, 593592.5625, 694190.0625, 0.0, 19976.91796875, 0.0, 0.0, 32533.732421875, 0.0, 42338.94140625, 47968.13671875, 54009.5546875, 40316.9609375, 40895.29296875, 0.0, 33167.5078125]

I = (X[:,1].!=0.0) .& (y.!=0.0)

rlm(X[I,:]./mean(y),y[I]./mean(y), MMEstimator{TukeyLoss}(), initial_scale=:mad)

I can fit a simple linear model to the data easily

julia> X[I,1]\y[I]
0.6911979570792081

plot(X[I,:], y[I], seriestype = :scatter)

The mask "I" selects only the cases where both X and y are non-zero. I am wondering why the process hangs and what I could do to prevent this, or at least skip the fit if it runs for too long.

A couple observations:

  1. This issue is specific to certain inputs. The hang is reproducible on my machine with these inputs.
  2. If I use an M-estimator "MEstimator{TukeyLoss}()" or a TauEstimator{TukeyLoss}(), then it works. But if I use an SEstimator{TukeyLoss}() it also fails.
  3. This is a case where the data will fit the model terribly. That is OK but it may have something to do with why the process runs forever.

Also, great package. It's easy to use and is helping my own project along.

commented

Thanks for catching this bug. It's an infinite loop in the linesearch part of S-estimation and τ-estimation.
PR #33 will fix it.

In the meantime, set miniter=0 in the keyword arguments of rlm to avoid the infinite loop.
For the example you gave, you can also increase rtol to avoid a convergence error (the default is rtol=1e-5); rtol=1e-4 worked, for example.
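
A minimal sketch of how the workaround above might be applied when fitting many datasets in a batch. The helper names fit_mm_workaround and fit_or_skip are hypothetical and not part of RobustModels.jl; only the miniter=0 and rtol=1e-4 keywords come from the reply above. The try/catch only handles thrown errors (such as the convergence error mentioned above); it cannot interrupt a fit that hangs, so avoiding the hang still relies on miniter=0.

```julia
using RobustModels, Statistics

# Hypothetical helper: fit the MM-estimator with the suggested workaround.
# miniter=0 avoids the infinite loop; rtol is loosened from the default 1e-5.
function fit_mm_workaround(X, y; rtol=1e-4)
    s = mean(y)  # same rescaling as in the original example
    return rlm(X ./ s, y ./ s, MMEstimator{TukeyLoss}();
               initial_scale=:mad, miniter=0, rtol=rtol)
end

# Hypothetical helper: run a zero-argument fitting function and return
# `nothing` if it throws, so a batch of fits can continue.
function fit_or_skip(fit)
    try
        return fit()
    catch err
        err isa InterruptException && rethrow()  # let Ctrl-C through
        @warn "Robust fit failed; skipping this dataset" exception = err
        return nothing
    end
end

# Usage, with X, y and I as defined in the original post:
# m = fit_or_skip(() -> fit_mm_workaround(X[I, :], y[I]))
# m === nothing && println("fit skipped")
```

Passing the fit as a closure keeps fit_or_skip independent of the estimator, so the same wrapper could be reused with an SEstimator or TauEstimator.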