Example: Two-stage Newsvendor

This example is based on the classical newsvendor problem: an agent must decide how many newspapers to buy before observing demand, trading off the cost of unsold copies against the revenue from copies sold.
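
In that problem, the agent chooses an order quantity $x \ge 0$ at a unit cost $c$ before observing the random demand $\omega$, and then sells $\min(x, \omega)$ units at a unit price $p$, so the agent solves

$$\max_{x \ge 0} \; -c x + \mathbb{E}_\omega\big[\, p \min(x, \omega) \,\big].$$

The two-stage model below encodes exactly this, with $c$ = buy_price and $p$ = sales_price: the first stage chooses how many newspapers to buy, and the second stage chooses how many to sell once the demand is known.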

using SDDP
import HiGHS
import Plots

First, we need a discretized distribution of demand. For simplicity, we sample three vectors of 10 points each from the uniform [0, 1) distribution and add them elementwise; the sum of three uniforms is a rough approximation of a normal distribution.

Ω = rand(10) .+ rand(10) .+ rand(10)
10-element Vector{Float64}:
 2.514457452858477
 1.3144615540636075
 0.41576472896023486
 2.599911049619416
 1.2804348203882574
 2.367187506280424
 1.7879744488109808
 2.2164559890889777
 2.210699795767045
 2.108398945150128

We also need some price data. We assume the agent can buy a newspaper for $1 and sell it for $1.20.

buy_price, sales_price = 1.0, 1.2
(1.0, 1.2)

Now we can formulate and train a policy for the two-stage newsvendor problem:

model = SDDP.LinearPolicyGraph(
    stages = 2,
    sense = :Max,
    upper_bound = maximum(Ω) * sales_price,
    optimizer = HiGHS.Optimizer,
) do subproblem, stage
    @variable(subproblem, inventory >= 0, SDDP.State, initial_value = 0)
    if stage == 1
        @variable(subproblem, buy >= 0)
        @constraint(subproblem, inventory.out == inventory.in + buy)
        @stageobjective(subproblem, -buy_price * buy)
    else
        @variable(subproblem, sell >= 0)
        @constraint(subproblem, sell <= inventory.in)
        SDDP.parameterize(subproblem, Ω) do ω
            return JuMP.set_upper_bound(sell, ω)
        end
        @stageobjective(subproblem, sales_price * sell)
    end
end

SDDP.train(model; stopping_rules = [SDDP.BoundStalling(5, 1e-6)])
------------------------------------------------------------------------------
          SDDP.jl (c) Oscar Dowson and SDDP.jl contributors, 2017-23

Problem
  Nodes           : 2
  State variables : 1
  Scenarios       : 1.00000e+01
  Existing cuts   : false
  Subproblem structure                      : (min, max)
    Variables                               : (4, 4)
    VariableRef in MOI.LessThan{Float64}    : (1, 1)
    AffExpr in MOI.LessThan{Float64}        : (1, 1)
    VariableRef in MOI.GreaterThan{Float64} : (2, 3)
    AffExpr in MOI.EqualTo{Float64}         : (1, 1)
Options
  Solver          : serial mode
  Risk measure    : SDDP.Expectation()
  Sampling scheme : SDDP.InSampleMonteCarlo

Numerical stability report
  Non-zero Matrix range     [1e+00, 1e+00]
  Non-zero Objective range  [1e+00, 1e+00]
  Non-zero Bounds range     [4e-01, 3e+00]
  Non-zero RHS range        [0e+00, 0e+00]
No problems detected

 Iteration    Simulation       Bound         Time (s)    Proc. ID   # Solves
        1    0.000000e+00   5.199822e-01   2.240896e-03          1         13
        2    2.407140e-01   3.763149e-01   3.149033e-03          1         26
        3    3.763149e-01   2.399318e-01   4.354954e-03          1         39
        4    2.399318e-01   1.667683e-01   5.290031e-03          1         52
        5    2.921914e-01   1.536876e-01   6.317854e-03          1         65
        6    2.594896e-01   1.523266e-01   7.360935e-03          1         78
        7    2.560870e-01   1.523266e-01   8.394003e-03          1         91
        8    2.560870e-01   1.523266e-01   9.412050e-03          1        104
        9   -7.815171e-01   1.523266e-01   1.057887e-02          1        117
       10    2.560870e-01   1.523266e-01   1.161885e-02          1        130
       11    2.560870e-01   1.523266e-01   1.268983e-02          1        143

Terminating training
  Status         : bound_stalling
  Total time (s) : 1.268983e-02
  Total solves   : 143
  Best bound     :  1.523266e-01
  Simulation CI  :  1.501339e-01 ± 1.901194e-01
------------------------------------------------------------------------------

To check the first-stage buy decision, we need to obtain a decision rule for node 1, the first-stage node:

first_stage_rule = SDDP.DecisionRule(model, node = 1)
A decision rule for node 1

Then we can evaluate it, passing in a starting point for the incoming state:

solution = SDDP.evaluate(
    first_stage_rule;
    incoming_state = Dict(:inventory => 0.0),
    controls_to_record = [:buy],
)
(stage_objective = -1.2804348203882605, outgoing_state = Dict(:inventory => 1.2804348203882605), controls = Dict(:buy => 1.2804348203882605))

The optimal value of the buy variable is stored here:

solution.controls[:buy]
1.2804348203882605
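
For a risk-neutral agent, this value matches the classical critical-fractile solution: order the smallest demand $d$ such that $P(\omega \le d) \ge (p - c) / p$. With our prices the critical ratio is $(1.2 - 1.0) / 1.2 \approx 0.167$, so with 10 equally likely scenarios the optimal order is the second-smallest element of Ω. We can check this using only values defined above:

critical_ratio = (sales_price - buy_price) / sales_price  # ≈ 0.167
sorted_Ω = sort(Ω)
k = findfirst(i -> i / length(sorted_Ω) >= critical_ratio, 1:length(sorted_Ω))
sorted_Ω[k]  # matches solution.controls[:buy] up to solver tolerance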

Introducing risk

The solution of a single newsvendor problem offers little insight into how a decision-maker should act. In particular, they may be averse to bad outcomes, such as buying a large number of newspapers only to find that demand is low.

We can explore how the optimal decision changes with risk by creating a function:

function solve_risk_averse_newsvendor(Ω, risk_measure)
    model = SDDP.LinearPolicyGraph(
        stages = 2,
        sense = :Max,
        upper_bound = maximum(Ω) * sales_price,
        optimizer = HiGHS.Optimizer,
    ) do subproblem, stage
        @variable(subproblem, inventory >= 0, SDDP.State, initial_value = 0)
        if stage == 1
            @variable(subproblem, buy >= 0)
            @constraint(subproblem, inventory.out == inventory.in + buy)
            @stageobjective(subproblem, -buy_price * buy)
        else
            @variable(subproblem, sell >= 0)
            @constraint(subproblem, sell <= inventory.in)
            SDDP.parameterize(subproblem, Ω) do ω
                return JuMP.set_upper_bound(sell, ω)
            end
            @stageobjective(subproblem, sales_price * sell)
        end
    end
    SDDP.train(
        model;
        risk_measure = risk_measure,
        stopping_rules = [SDDP.BoundStalling(5, 1e-6)],
        print_level = 0,
    )
    first_stage_rule = SDDP.DecisionRule(model, node = 1)
    solution = SDDP.evaluate(
        first_stage_rule;
        incoming_state = Dict(:inventory => 0.0),
        controls_to_record = [:buy],
    )
    return solution.controls[:buy]
end
solve_risk_averse_newsvendor (generic function with 1 method)

Now we can see how many units a risk-neutral decision-maker would order:

solve_risk_averse_newsvendor(Ω, SDDP.Expectation())
1.2804348203882605

as well as how many a decision-maker who cares only about the worst-case outcome would order:

solve_risk_averse_newsvendor(Ω, SDDP.WorstCase())
0.41576472896023486

In general, the decision-maker will be somewhere between these two extremes. The SDDP.Entropic risk measure has a single parameter that lets us explore the space of policies between them: when the parameter is small, the measure acts like SDDP.Expectation, and when it is large, it acts like SDDP.WorstCase.
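
For reference, for a random cost $Z$, the entropic risk measure with parameter $\gamma > 0$ is commonly defined as

$$\mathbb{F}_\gamma[Z] = \frac{1}{\gamma} \log \mathbb{E}\big[e^{\gamma Z}\big],$$

which tends to $\mathbb{E}[Z]$ as $\gamma \to 0$ and to the worst-case value of $Z$ as $\gamma \to \infty$. For a maximization problem like ours, the sign conventions are adjusted so that large $\gamma$ emphasizes the worst, that is, lowest, profits.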

Here is what we get if we solve our problem multiple times for different values of the risk aversion parameter $\gamma$:

Γ = [10^i for i in -3:0.2:3]
buy = [solve_risk_averse_newsvendor(Ω, SDDP.Entropic(γ)) for γ in Γ]
Plots.plot(
    Γ,
    buy;
    xaxis = :log,
    xlabel = "Risk aversion parameter γ",
    ylabel = "First-stage buy decision",
    legend = false,
)

Things to try

There are a number of things you can try next:

  • Experiment with different buy and sales prices
  • Experiment with different distributions of demand
  • Explore how the optimal policy changes if you use a different risk measure
  • What happens if you can only buy and sell integer numbers of newspapers? Try this by adding Int to the variable definitions, for example @variable(subproblem, buy >= 0, Int); a sketch is given after this list
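
As a starting point for that last suggestion, here is a minimal sketch of the integer variant. It is identical to the model above except that Int is added to the buy and sell variables; it is only meant to show the setup, since with integer variables the cuts computed during training may no longer be tight.

integer_model = SDDP.LinearPolicyGraph(
    stages = 2,
    sense = :Max,
    upper_bound = maximum(Ω) * sales_price,
    optimizer = HiGHS.Optimizer,
) do subproblem, stage
    @variable(subproblem, inventory >= 0, SDDP.State, initial_value = 0)
    if stage == 1
        # Integer purchase decision
        @variable(subproblem, buy >= 0, Int)
        @constraint(subproblem, inventory.out == inventory.in + buy)
        @stageobjective(subproblem, -buy_price * buy)
    else
        # Integer sales decision, capped by the realized demand ω
        @variable(subproblem, sell >= 0, Int)
        @constraint(subproblem, sell <= inventory.in)
        SDDP.parameterize(subproblem, Ω) do ω
            return JuMP.set_upper_bound(sell, ω)
        end
        @stageobjective(subproblem, sales_price * sell)
    end
end

Training and evaluation then proceed exactly as before.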

This page was generated using Literate.jl.