odow / SDDP.jl

Stochastic Dual Dynamic Programming in Julia

Home Page: https://sddp.dev

Attempting to recover from serious numerical issues...

Witomi opened this issue · comments

Hi!
I'm running an optimization problem and getting "Warning: Attempting to recover from serious numerical issues...". I did as much as I could to avoid problems in the stability report, which now prints "no problems detected", but when iterating I get this warning, which pretty much guarantees a frozen state. Is there any way to get a hint about which variable or constraint is generating it?

It cannot even complete a single iteration!

Can you provide a reproducible example?

If you're using GLPK or HiGHS, try Gurobi instead.

See https://odow.github.io/SDDP.jl/stable/guides/improve_computational_performance/#Numerical-stability-(again)

I don't think so, since my model is too big to manage the data easily. It's a hydrothermal scheduling problem for a large-scale grid. I am indeed using Gurobi; right now I'm trying to find the bug by exporting the problem to .mps format and solving just one stage of it.

So does the error eventually say Termination status: INFEASIBLE or something to that effect?

Can you provide the output of the SDDP log?

Can you provide the subproblem file that SDDP.jl wrote out?

The error can also be because of https://odow.github.io/SDDP.jl/stable/tutorial/warnings/#Relatively-complete-recourse.

It's confusing, so I'm removing the warning and updating the error: #627

Here is a screenshot, since it never finished printing the SDDP log. It doesn't show any hint of what is generating the issue. The subproblem is also attached.
image
subproblem_26.txt

Some tips for debugging:

  • What happens if you don't run in parallel?
  • What happens if you get rid of the uncertainty? Just use the first realization of each random variable.
  • What happens if you update to the latest version of SDDP.jl?

Your subproblem is infeasible, so take a read of the relatively complete recourse documentation linked above. My usual approach is to comment out constraints and then re-add them one-by-one until you find what is making things infeasible.

julia> using JuMP, Gurobi

julia> m = read_from_file("/tmp/subproblem_26.mof.json");

julia> m
A JuMP Model
Minimization problem with:
Variables: 5055
Objective function type: AffExpr
`AffExpr`-in-`MathOptInterface.EqualTo{Float64}`: 2030 constraints
`AffExpr`-in-`MathOptInterface.GreaterThan{Float64}`: 10 constraints
`AffExpr`-in-`MathOptInterface.LessThan{Float64}`: 45 constraints
`AffExpr`-in-`MathOptInterface.Interval{Float64}`: 2145 constraints
`VariableRef`-in-`MathOptInterface.EqualTo{Float64}`: 175 constraints
`VariableRef`-in-`MathOptInterface.GreaterThan{Float64}`: 4650 constraints
`VariableRef`-in-`MathOptInterface.LessThan{Float64}`: 1 constraint
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.

julia> set_optimizer(m, Gurobi.Optimizer)

julia> optimize!(m)
Gurobi Optimizer version 10.0.0 build v10.0.0rc2 (mac64[x86])

CPU model: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 4230 rows, 7200 columns and 12126 nonzeros
Model fingerprint: 0xeca4d4f7
Coefficient statistics:
  Matrix range     [1e-02, 1e+03]
  Objective range  [1e+00, 1e+06]
  Bounds range     [5e-07, 1e+04]
  RHS range        [3e-02, 4e+05]
Presolve removed 4110 rows and 6486 columns
Presolve time: 0.00s

Solved in 0 iterations and 0.01 seconds (0.00 work units)
Infeasible model

User-callback calls 40, time in user-callback 0.00 sec
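
If commenting constraints out by hand is too slow, another option is to ask Gurobi for an irreducible infeasible subsystem (IIS) through JuMP's compute_conflict!. This is just a sketch of that workflow (it was not used in this thread), reading the same subproblem file:

using JuMP, Gurobi

model = read_from_file("/tmp/subproblem_26.mof.json")
set_optimizer(model, Gurobi.Optimizer)
optimize!(model)

# Ask Gurobi to compute an irreducible infeasible subsystem (IIS).
compute_conflict!(model)
if MOI.get(model, MOI.ConflictStatus()) == MOI.CONFLICT_FOUND
    for (F, S) in list_of_constraint_types(model)
        for con in all_constraints(model, F, S)
            # Print only the constraints that participate in the conflict.
            if MOI.get(model, MOI.ConstraintConflictStatus(), con) == MOI.IN_CONFLICT
                println(con)
            end
        end
    end
end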

I'm going to try the things you suggested, but I still find it weird that it turns infeasible: the problem is feasible if I don't consider hydro generators that are downstream of reservoirs, but if I consider even just one, the problem prints this warning.

You're likely hitting upper or lower bounds on their capacities? Do you have spill? Is there enough thermal generation to meet demand even if all reservoirs are empty?

The problem might have a feasible solution when everything works as expected. But SDDP.jl requires there to be a feasible solution for all possible values of the state variables, even ones that might never happen in practice.
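
The usual way to guarantee that is to add penalized slack variables, for example an unserved-demand variable in the load balance, so the stage problem stays feasible no matter what the incoming reservoir volumes are. A minimal sketch of the pattern; every name and number below is illustrative, not taken from your model:

# Hypothetical fragment of a stage subproblem `sp`; all values are made up.
@variable(sp, 0 <= volume <= 1_000, SDDP.State, initial_value = 200)
@variable(sp, 0 <= thermal <= 100)
@variable(sp, hydro >= 0)
@variable(sp, spill >= 0)
@variable(sp, deficit >= 0)  # unserved demand; keeps every stage feasible
@variable(sp, inflow)
@constraint(sp, volume.out == volume.in + inflow - hydro - spill)
@constraint(sp, thermal + hydro + deficit == 150)  # demand balance with slack
# Penalize the slack heavily so it is only used when nothing else works.
@stageobjective(sp, 50 * thermal + 10_000 * deficit)
SDDP.parameterize(sp, [0.0, 50.0, 100.0]) do w
    fix(inflow, w)
end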

Any update? If not, I will close this issue.

Sorry for the late response. I do have spills and enough thermal generation to meet demand with all reservoirs empty. I think it might just be the nature of the problem (it's too big) that leads to this warning. Thanks for your help, but you can close this issue if you want; I don't think I will get any hints very soon. If I catch what is happening here, I'll let you know or try to reopen this issue.

Did you take a look at why the problem was infeasible? It seems like you have a problem with the rest_1r constraints:

julia> using JuMP, Gurobi

julia> model = read_from_file("/tmp/subproblem_26.mof.json")
A JuMP Model
Minimization problem with:
Variables: 5055
Objective function type: AffExpr
`AffExpr`-in-`MathOptInterface.EqualTo{Float64}`: 2030 constraints
`AffExpr`-in-`MathOptInterface.GreaterThan{Float64}`: 10 constraints
`AffExpr`-in-`MathOptInterface.LessThan{Float64}`: 45 constraints
`AffExpr`-in-`MathOptInterface.Interval{Float64}`: 2145 constraints
`VariableRef`-in-`MathOptInterface.EqualTo{Float64}`: 175 constraints
`VariableRef`-in-`MathOptInterface.GreaterThan{Float64}`: 4650 constraints
`VariableRef`-in-`MathOptInterface.LessThan{Float64}`: 1 constraint
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.

julia> set_optimizer(model, Gurobi.Optimizer)

julia> optimize!(model)
Gurobi Optimizer version 10.0.0 build v10.0.0rc2 (mac64[x86])

CPU model: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 4230 rows, 7200 columns and 12126 nonzeros
Model fingerprint: 0xeca4d4f7
Coefficient statistics:
  Matrix range     [1e-02, 1e+03]
  Objective range  [1e+00, 1e+06]
  Bounds range     [5e-07, 1e+04]
  RHS range        [3e-02, 4e+05]
Presolve removed 4110 rows and 6486 columns
Presolve time: 0.00s

Solved in 0 iterations and 0.00 seconds (0.00 work units)
Infeasible model

User-callback calls 40, time in user-callback 0.00 sec

julia> p = relax_with_penalty!(model);
┌ Warning: Skipping PenaltyRelaxation for ConstraintIndex{MathOptInterface.VariableIndex,MathOptInterface.EqualTo{Float64}}
└ @ MathOptInterface.Utilities ~/.julia/packages/MathOptInterface/BlCD1/src/Utilities/penalty_relaxation.jl:289
┌ Warning: Skipping PenaltyRelaxation for ConstraintIndex{MathOptInterface.VariableIndex,MathOptInterface.GreaterThan{Float64}}
└ @ MathOptInterface.Utilities ~/.julia/packages/MathOptInterface/BlCD1/src/Utilities/penalty_relaxation.jl:289
┌ Warning: Skipping PenaltyRelaxation for ConstraintIndex{MathOptInterface.VariableIndex,MathOptInterface.LessThan{Float64}}
└ @ MathOptInterface.Utilities ~/.julia/packages/MathOptInterface/BlCD1/src/Utilities/penalty_relaxation.jl:289

julia> optimize!(model)
Gurobi Optimizer version 10.0.0 build v10.0.0rc2 (mac64[x86])

CPU model: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 4230 rows, 15605 columns and 20531 nonzeros
Model fingerprint: 0x7f6657fa
Coefficient statistics:
  Matrix range     [1e-02, 1e+03]
  Objective range  [1e+00, 1e+06]
  Bounds range     [5e-07, 1e+04]
  RHS range        [3e-02, 4e+05]
Presolve removed 3503 rows and 13994 columns
Presolve time: 0.01s
Presolved: 727 rows, 1611 columns, 3578 nonzeros

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    1.4154638e+02   3.518572e+04   0.000000e+00      0s
     153    9.0890326e+04   0.000000e+00   0.000000e+00      0s

Solved in 153 iterations and 0.02 seconds (0.01 work units)
Optimal objective  9.089032567e+04

User-callback calls 254, time in user-callback 0.00 sec

julia> ret = [con for (con, expr) in p if value(expr) > 0]
112-element Vector{ConstraintRef{Model, C, ScalarShape} where C}:
 rest_1r[99] : -er_affluent[99] + hn_affluent[99] + _[8962] - _[8963] = 0.41990000000000005
 rest_1r[63] : -er_affluent[63] + hn_affluent[63] + _[8890] - _[8891] = 51.792699999999996
 rest_1r[90] : -er_affluent[90] + hn_affluent[90] + _[8944] - _[8945] = 1.66
 rest_1r[87] : -er_affluent[87] + hn_affluent[87] + _[8938] - _[8939] = 0.5266
 rest_1r[120] : -er_affluent[120] + hn_affluent[120] + _[9004] - _[9005] = 1.5323
 rest_1r[64] : -er_affluent[64] + hn_affluent[64] + _[8892] - _[8893] = 0.7421
 rest_1r[80] : -er_affluent[80] + hn_affluent[80] + _[8924] - _[8925] = 0.5698
 rest_1r[107] : -er_affluent[107] + hn_affluent[107] + _[8978] - _[8979] = 1.2979999999999998
 rest_1r[65] : -er_affluent[65] + hn_affluent[65] + _[8894] - _[8895] = 0.2876
 rest_1r[143] : -er_affluent[143] + hn_affluent[143] + _[9050] - _[9051] = 0.2269
 rest_1r[156] : -er_affluent[156] + hn_affluent[156] + _[9076] - _[9077] = 1.4122999999999999
 rest_1r[138] : -er_affluent[138] + hn_affluent[138] + _[9040] - _[9041] = 1.1809999999999998
 rest_1r[53] : -er_affluent[53] + hn_affluent[53] + _[8874] - _[8875] = 0.9856
 rest_1r[88] : -er_affluent[88] + hn_affluent[88] + _[8940] - _[8941] = 2.0113
 rest_1r[103] : -er_affluent[103] + hn_affluent[103] + _[8970] - _[8971] = 0.8925
 rest_1r[137] : -er_affluent[137] + hn_affluent[137] + _[9038] - _[9039] = 0.1137
 rest_1r[127] : -er_affluent[127] + hn_affluent[127] + _[9018] - _[9019] = 0.3315
 rest_1r[60] : -er_affluent[60] + hn_affluent[60] + _[8884] - _[8885] = 18.598100000000002
 rest_1r[79] : -er_affluent[79] + hn_affluent[79] + _[8922] - _[8923] = 9.5976
 rest_1r[68] : -er_affluent[68] + hn_affluent[68] + _[8900] - _[8901] = 4.9464999999999995
 rest_1r[101] : -er_affluent[101] + hn_affluent[101] + _[8966] - _[8967] = 42.7886
 rest_1r[159] : -er_affluent[159] + hn_affluent[159] + _[9082] - _[9083] = 2.0759
 rest_1r[113] : -er_affluent[113] + hn_affluent[113] + _[8990] - _[8991] = 5.6557
 c2033 : 10 variable_generation[381,3] + _[13235] - _[13236] ∈ [0, 386]
 rest_1r[144] : -er_affluent[144] + hn_affluent[144] + _[9052] - _[9053] = 0
 rest_1r[151] : -er_affluent[151] + hn_affluent[151] + _[9066] - _[9067] = 8.4305
 rest_1q[57] : -0.0309 xn_affluent[28]_in - 0.0887 xn_affluent[166]_in - 0.0921 xn_affluent[171]_in - 0.3085 xn_affluent[57]_in - er_affluent[57] + xn_affluent[57]_out + _[8838] - _[8839] = 0
 ⋮
 rest_1r[89] : -er_affluent[89] + hn_affluent[89] + _[8942] - _[8943] = 0.46230000000000004
 rest_1r[94] : -er_affluent[94] + hn_affluent[94] + _[8952] - _[8953] = 4.8873
 rest_1r[49] : -er_affluent[49] + hn_affluent[49] + _[8866] - _[8867] = 12.4809
 rest_1r[155] : -er_affluent[155] + hn_affluent[155] + _[9074] - _[9075] = 3.3948
 rest_1r[135] : -er_affluent[135] + hn_affluent[135] + _[9034] - _[9035] = 0.5721
 rest_1r[124] : -er_affluent[124] + hn_affluent[124] + _[9012] - _[9013] = 0.2136
 rest_1r[61] : -er_affluent[61] + hn_affluent[61] + _[8886] - _[8887] = 4.0908
 rest_1r[158] : -er_affluent[158] + hn_affluent[158] + _[9080] - _[9081] = 0.27499999999999997
 rest_1r[105] : -er_affluent[105] + hn_affluent[105] + _[8974] - _[8975] = 1.0598
 rest_1r[111] : -er_affluent[111] + hn_affluent[111] + _[8986] - _[8987] = 1.7789
 rest_1r[142] : -er_affluent[142] + hn_affluent[142] + _[9048] - _[9049] = 0.3367
 rest_1r[119] : -er_affluent[119] + hn_affluent[119] + _[9002] - _[9003] = 1.3852
 rest_1r[48] : -er_affluent[48] + hn_affluent[48] + _[8864] - _[8865] = 17.0807
 rest_1r[62] : -er_affluent[62] + hn_affluent[62] + _[8888] - _[8889] = 10.293099999999999
 rest_1r[109] : -er_affluent[109] + hn_affluent[109] + _[8982] - _[8983] = 200.72199999999998
 rest_1r[114] : -er_affluent[114] + hn_affluent[114] + _[8992] - _[8993] = 4.5107
 rest_1r[141] : -er_affluent[141] + hn_affluent[141] + _[9046] - _[9047] = 0.3512
 rest_1r[41] : -er_affluent[41] + hn_affluent[41] + _[8850] - _[8851] = 17.1828
 rest_1r[85] : -er_affluent[85] + hn_affluent[85] + _[8934] - _[8935] = 6.0096
 rest_1r[132] : -er_affluent[132] + hn_affluent[132] + _[9028] - _[9029] = 0.5699000000000001
 rest_1r[52] : -er_affluent[52] + hn_affluent[52] + _[8872] - _[8873] = 2.4017999999999997
 rest_1r[40] : -er_affluent[40] + hn_affluent[40] + _[8848] - _[8849] = 13.2286
 rest_1r[92] : -er_affluent[92] + hn_affluent[92] + _[8948] - _[8949] = 5.696999999999999
 rest_1r[121] : -er_affluent[121] + hn_affluent[121] + _[9006] - _[9007] = 0.8735999999999999
 rest_1r[77] : -er_affluent[77] + hn_affluent[77] + _[8918] - _[8919] = 1.9346
 rest_1r[71] : -er_affluent[71] + hn_affluent[71] + _[8906] - _[8907] = 0.0688
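
For reference, relax_with_penalty! leaves variable bounds alone (hence the warnings above) and adds nonnegative slack variables to every affine constraint, penalizing them in the objective; the unnamed _[...] variables in the printout are those slacks. A tiny self-contained illustration, unrelated to the model in this issue:

using JuMP, Gurobi

model = Model(Gurobi.Optimizer)
@variable(model, x >= 0)
@constraint(model, c, x <= -1)            # conflicts with the bound x >= 0
penalty_map = relax_with_penalty!(model)  # c becomes x - slack <= -1
optimize!(model)
# The slack shows how much the constraint had to be violated (here, 1.0).
value(penalty_map[c])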

That constraint corresponds to how some inflows (affluents) are modeled: hn_affluent is the inflow used in the water balance equations, and it is decomposed into a trend, a seasonality term, and an error (er_affluent). Looking at the printout you posted of the "rest_1r" constraints, I don't know which variables the two unnamed ones (printed as just "_") correspond to. er_affluent is the uncertainty, which is fixed in such a way that hn_affluent will never be negative.

which is fixed in such a way that hn_affluent will never be negative.

You might want to check how you implemented this.

It's hard to know without the source code and just from this one subproblem, but your model must always be feasible.

I'd try commenting out some constraints and re-running until you find what set of constraints make the problem infeasible.

Note that something like this is not okay, because the inflow constraint can conflict with the inflow >= 0 constraint:

@variable(sp, inflow >= 0, SDDP.State, initial_value = 0)
@variable(sp, noise)
@constraint(sp, inflow.out == inflow.in + noise)
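# With inflow.in == 0 (the initial value) and noise fixed at -1, this forces
# inflow.out == -1, which violates the inflow.out >= 0 bound.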
SDDP.parameterize(sp, [-1, 1]) do w
    fix(noise, w)
end
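
One way to repair that pattern, if negative noise is genuinely possible, is to absorb it with a penalized slack so the bound on the state can always be satisfied. A sketch; the shortfall variable and the penalty value are assumptions, not part of the example above:

@variable(sp, inflow >= 0, SDDP.State, initial_value = 0)
@variable(sp, noise)
@variable(sp, shortfall >= 0)  # slack that absorbs negative noise
@constraint(sp, inflow.out == inflow.in + noise + shortfall)
# Penalize the slack so it is only used when the balance would otherwise fail
# (in a real model, add this term to the existing stage objective).
@stageobjective(sp, 1_000 * shortfall)
SDDP.parameterize(sp, [-1, 1]) do w
    fix(noise, w)
end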

Any update on this?

Closing as stale. Please re-open with as much detail as you can provide if you've updated to the latest release of SDDP.jl and double checked that your model has a feasible solution for all possible incoming values of the state variables.