odow / SDDP.jl

Stochastic Dual Dynamic Programming in Julia

Home Page: https://sddp.dev

Comparing DH vs HD problem

hghayoom opened this issue · comments

Hi Oscar,
I am working with your amazing tool and I want to see some examples for comparing DH and HD problems.
I found this Inventory Problem, which compares these two approaches, but it is a little different from what I expected. I need some examples without a "discount factor = 0.95".
I know that in your dissertation for MOO you compared the results of DH and HD, but I cannot see how you calculated the DH results.
[image]
I think the new version of this model is here.
I'm wondering if there are other examples for DH, because a lot of the time we are not aware of the uncertainties that might occur.

Thanks

The MOO model is actually: https://github.com/odow/MOO. That model doesn't use SDDP. The decision-hazard was a two-stage stochastic program, in which we first compute the stocking rate, and then observe the price. The hazard-decision solves a different instance of the model for each price.

There is no tooling in SDDP.jl for explicitly modeling decision-hazard problems. Instead, you need to model the problem in which the decision is a state variable that is decided in the previous stage.

So in the HD example, the order_quantity is a control variable:
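(an ordinary JuMP variable in the subproblem, presumably declared along the lines of)

@variable(subproblem, 0 <= order_quantity)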

and in the DH example, the order_quantity is a state variable:

@variable(subproblem, 0 <= order_quantity, SDDP.State, initial_value = 0)

and we use the incoming value in the subproblem:
state.in + order_quantity.in - demand + lost_demand - disposed_units

Take a look at

https://ieeexplore.ieee.org/abstract/document/7859406

https://www.researchgate.net/profile/Alexandre-Street/publication/328228810_On_the_Solution_of_Decision-Hazard_Multistage_Stochastic_Hydrothermal_Scheduling_Problems/links/5bbfc748a6fdcc2c91f6b0e5/On-the-Solution-of-Decision-Hazard-Multistage-Stochastic-Hydrothermal-Scheduling-Problems.pdf

Thanks for your response. It was very insightful.
I have some other questions about this inventory management example.
I am using Visual Studio Code and I run the model in debug mode. Then I can see the final variables in the variable window.

  1. After running this example I see 500 results (because we simulate it 500 times), but within each simulation there are different numbers of stages (for example, one of them has only one stage and another contains 3 stages). Are there any convergence criteria that lead to the different numbers of stages? (See the picture.)
    [image]

  2. In the DH problem, I cannot save the outputs in the results of the simulation:
    results = SDDP.simulate(model, 500, [:state, :order_quantity, :lost_demand, :disposed_units, :backordered_units, :held_units, :demand])
    Therefore, after I moved the definition of these variables to the beginning, I could save them.
    I moved this:

    @variables(subproblem, begin

    to here:
    @variable(subproblem, 0 <= order_quantity, SDDP.State, initial_value = 0)

In the DH problem, for odd stages (e.g., 1, 3, 5, ...), demand and other parameters are zero. I thought the reason is that a decision node is defined at the beginning of the DH policy graph here:


and we are somehow expanding the stages, so that each two consecutive nodes represent one time step.
Results of even nodes (like 2):
[image]

Results of odd nodes (like 3):
[image]

  1. I will try to convert one of your basic examples from HD to DH and see if it is correct.

Are there any convergence criteria that lead to the different numbers of stages? (See the picture.)

If your graph is cyclic, then there will be different numbers of stages. This ensures that the sample mean of the cost for each scenario is an unbiased estimator for the objective value of the problem.

You can simulate to a fixed depth with:

SDDP.simulate(
    model,
    500;
    sampling_scheme = SDDP.InSampleMonteCarlo(;
        max_depth = 5,
        terminate_on_dummy_leaf = false,
    ),
)

In the DH problem, I cannot save the outputs in the results of the simulation

Try

SDDP.simulate(model, 500; skip_undefined_variables = true)

In the DH problem, for odd stages (e.g., 1, 3, 5, ...), demand and other parameters are zero. I thought the reason is that a decision node is defined at the beginning of the DH policy graph here:
and we are somehow expanding the stages, so that each two consecutive nodes represent one time step.

This follows from the definition of our graph:

graph = SDDP.Graph(
    :root_node,
    [:decision, :recourse],
    [
        (:root_node => :decision, 1.0),
        (:decision => :recourse, 1.0),
        (:recourse => :decision, 0.95),
    ],
)

Take a look at the :node_index fields of each simulation. They'll go decision, recourse, decision, ...
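For example, assuming results = SDDP.simulate(model, 500), something like the following should print the sequence of nodes visited in the first replication:

# each stage of a simulation is a Dict that stores the node it visited under :node_index
[stage[:node_index] for stage in results[1]]  # e.g., [:decision, :recourse, :decision, ...]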

You don't actually need the two nodes though. You could do this (untested so maybe typos, but you get the idea):

function infinite_lin_DH()
    graph = SDDP.LinearGraph(1)
    SDDP.add_edge(graph, 1 => 1, 0.95)
    return SDDP.PolicyGraph(
        graph;
        sense = :Min,
        lower_bound = 0.0,
        optimizer = HiGHS.Optimizer,
    ) do subproblem, node
        @variable(subproblem, -10 <= state <= 10, SDDP.State, initial_value = 0)
        @variable(subproblem, 0 <= order_quantity, SDDP.State, initial_value = 0)
        @variables(subproblem, begin
            lost_demand >= 0
            disposed_units >= 0
            backordered_units >= 0
            demand
        end)
        @constraints(subproblem, begin
            state.out == state.in + order_quantity.in - demand + lost_demand - disposed_units
            backordered_units >= -state.out
        end)
        Pg = rand(Distributions.TruncatedNormal(5, 2, 0, 10), 50)
        sort!(Pg)
        SDDP.parameterize(ω -> JuMP.fix(demand, ω), subproblem, Pg)
        @stageobjective(
            subproblem,
            20 * order_quantity.out +   # Order cost
            2 * state.out +             # Holding cost ch.
            10 * backordered_units +    # Backorder cost cb.
            10 * disposed_units +       # Disposal cost cd.
            100 * lost_demand           # Lost demand cost cl.
        )
    end
end

DH vs HD is a modeling choice. It doesn't change anything about the algorithm or require special structures.

Thank you Oscar for the illustrations.
They help me a lot in shaping my code.
I wrote some code that compares the HD and DH problems over four stages. I need to learn these to use them in my work. The HD problem works properly, but for the DH problem I get the infeasibility issue that you described here, and I don't know why I should get this error. I have defined positive and negative slack variables (+Borrow and -s) so that the constraint should always be feasible, but I don't know why this happens.

using SDDP, HiGHS, Test, GLPK, Distributions, Random, Statistics


function finite_lin_DH()

    graph2 = SDDP.Graph(
        :root_node,
        # we have 4 time steps, and for each one we need a decision node and a recourse node
        [:decision1, :recourse1, :decision2, :recourse2, :decision3, :recourse3, :decision4, :recourse4],
        [
            (:root_node => :decision1, 1.0),
            (:decision1 => :recourse1, 1.0),
            (:recourse1 => :decision2, 1.0),
            (:decision2 => :recourse2, 1.0),
            (:recourse2 => :decision3, 1.0),
            (:decision3 => :recourse3, 1.0),
            (:recourse3 => :decision4, 1.0),
            (:decision4 => :recourse4, 1.0),
            # (:recourse4 => :decision1, 0.95),
        ],
    )
    dd = 1
    model = SDDP.PolicyGraph(
        graph2,
        # bellman_function = SDDP.BellmanFunction(lower_bound = 0),
        # optimizer = GLPK.Optimizer,
        sense = :Min,
        lower_bound = 0.0,
        optimizer = HiGHS.Optimizer,
    ) do sp, node
        @variable(sp, 0 <= x <= 8, SDDP.State, initial_value = 7) # reservoir water level in each step
        @variable(sp, 0 <= p, SDDP.State, initial_value = 2)      # producing power with cost

        @variables(sp, begin
            y >= 0      # using reservoir water
            # p >= 0    # power generation
            s >= 0      # spill water
            ξ           # input
            Borow >= 0  # borrow water if the water level is negative
        end)

        if node in [:decision1, :decision2, :decision3, :decision4]
            @constraint(sp, x.out == x.in)
            @stageobjective(sp, 20 * p.out)
        else
            @constraints(sp, begin
                p.in + y == 9
                x.out == x.in - y + ξ - s + Borow
            end)
            Pg = rand([2, 3], 4)
            # Pg = rand(Distributions.TruncatedNormal(5, 2, 0, 10), 4)
            # sort!(Pg)
            SDDP.parameterize(sp, Pg) do ω
                JuMP.fix(ξ, ω)
            end
            @stageobjective(sp, 200 * s + 500 * Borow)
        end
    end
    return model
end



function finite_lin_HD()

    dd = 1
    model = SDDP.PolicyGraph(
        # this time we define a line graph
        SDDP.LinearGraph(4),
        sense = :Min,
        lower_bound = 0.0,
        optimizer = HiGHS.Optimizer,
    ) do sp, node
        @variable(sp, 0 <= x <= 8, SDDP.State, initial_value = 7) # reservoir water level in each step
        # @variable(sp, 0 <= p, SDDP.State, initial_value = 2) # producing power with cost

        @variables(sp, begin
            y >= 0      # using reservoir water
            p >= 0      # power generation
            s >= 0      # spill water
            ξ           # input
            Borow >= 0  # borrow water if the water level is negative
        end)

        # if node in [:decision1, :decision2, :decision3, :decision4]
        #     @constraint(sp, x.out == x.in)
        #     @stageobjective(sp, 20 * p.out)
        # else
        @constraints(sp, begin
            p + y == 9
            x.out == x.in - y + ξ - s + Borow
        end)
        Pg = rand([2, 3], 4)
        # Pg = rand(Distributions.TruncatedNormal(5, 2, 0, 10), 4)
        # sort!(Pg)
        SDDP.parameterize(sp, Pg) do ω
            JuMP.fix(ξ, ω)
        end
        @stageobjective(sp, 200 * s + 500 * Borow + 20 * p)
        # end
    end
    return model
end




a=1
        
Random.seed!(1234)
begin
    model = finite_lin_DH()
    SDDP.train(model, iteration_limit = 30, print_level = 1)
    results = SDDP.simulate(model, 100, [:y, :p, :x, :s, :Borow])
    objectives = [sum(s[:stage_objective] for s in simulation) for simulation in results]
    sample_mean = round(Statistics.mean(objectives); digits = 2)
    sample_ci = round(1.96 * Statistics.std(objectives) / sqrt(100); digits = 2)
    println("DH Confidence_interval = $(sample_mean) ± $(sample_ci)")
end

Random.seed!(1234)
begin
    model = finite_lin_HD()
    SDDP.train(model, iteration_limit = 30, print_level = 1)
    results = SDDP.simulate(model, 100, [:y, :p, :x, :s, :Borow])
    objectives = [sum(s[:stage_objective] for s in simulation) for simulation in results]
    sample_mean = round(Statistics.mean(objectives); digits = 2)
    sample_ci = round(1.96 * Statistics.std(objectives) / sqrt(100); digits = 2)
    println("HD Confidence_interval = $(sample_mean) ± $(sample_ci)")
end

Here's how I would write your code.

Some tips:

  • You don't need explicit "decision" and "hazard" nodes. The DH vs HD 'trick' is just a modeling choice, where a variable is either a state or a control.
  • Your DH example is infeasible because it doesn't account for the case where you decide a large p that is greater than 9. You have two options: add constraints so that p must be less than the demand in the next stage, or allow total supply to exceed demand. (Or add a new demand source that only operates when the price is 0.)
julia> using SDDP, HiGHS, Random, Statistics

julia> function finite_lin_DH()
           return SDDP.LinearPolicyGraph(;
               stages = 4,
               sense = :Min,
               lower_bound = 0.0,
               optimizer = HiGHS.Optimizer,
           ) do sp, node
               @variables(sp, begin
                   0 <= x <= 8, SDDP.State, (initial_value = 7)
                   p >= 0, SDDP.State, (initial_value = 2)
                   y >= 0
                   s >= 0
                   ξ
                   borrow >= 0
               end)
               @constraints(sp, begin
                   # This constraint is wrong, because p.in could be > 9 already.
                   # p.in + y == 9
                   p.in + y >= 9
                   x.out == x.in - y + ξ - s + borrow
               end)
               SDDP.parameterize(sp, rand([2, 3], 4)) do ω
                   JuMP.fix(ξ, ω)
               end
               @stageobjective(sp, 200s + 500borrow + 20p.out)
           end
       end
finite_lin_DH (generic function with 1 method)

julia> function finite_lin_HD()
           model = SDDP.LinearPolicyGraph(
               stages = 4,
               sense = :Min,
               lower_bound = 0.0,
               optimizer = HiGHS.Optimizer,
           ) do sp, node
               @variables(sp, begin
                   0 <= x <= 8, SDDP.State, (initial_value = 7)
                   p >= 0
                   y >= 0
                   s >= 0
                   ξ
                   borrow >= 0
               end)
               @constraints(sp, begin
                   p + y == 9
                   x.out == x.in - y + ξ - s + borrow
               end)
               SDDP.parameterize(sp, rand([2, 3], 4)) do ω
                   JuMP.fix(ξ, ω)
               end
               @stageobjective(sp, 200s + 500borrow + 20p)
           end
           return model
       end
finite_lin_HD (generic function with 1 method)

julia> function train_and_compute_ci(model; replications = 100)
           SDDP.train(model; iteration_limit = 30, print_level = 0)
           results = SDDP.simulate(model, replications)
           objectives = [
               sum(s[:stage_objective] for s in simulation) for simulation in results
           ]
           sample_mean = round(Statistics.mean(objectives); digits = 2)
           sample_ci = round(1.96 * Statistics.std(objectives) / sqrt(replications); digits = 2)
           println("Confidence_interval = $(sample_mean) ± $(sample_ci)")
           return
       end
train_and_compute_ci (generic function with 1 method)

julia> Random.seed!(1234)
MersenneTwister(1234)

julia> begin
           model = finite_lin_DH()
           train_and_compute_ci(model)
       end
Confidence_interval = 356.0 ± 2.67

julia> Random.seed!(1234)
MersenneTwister(1234)

julia> begin
           model = finite_lin_HD()
           train_and_compute_ci(model)
       end
Confidence_interval = 379.6 ± 3.15

This shows that being able to adapt your decision to the realization is worth ~$24.
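(That is, roughly the difference between the two sample means: 379.6 - 356.0 = 23.6.)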

Thanks, Oscar.
That is wonderful. Understanding both DH and HD is very important because it lets us calculate the EVPI.
I suggest you add this example to your examples. I learned a lot from this.

Thanks again