# Example: Two-stage Newsvendor

This example is based on the classical newsvendor problem.
```julia
using SDDP
import HiGHS
import Plots
```

First, we need some discretized distribution of demand. For simplicity, we're going to sample 10 points from the uniform [0, 1) distribution three times and add them together. This is a rough approximation of a normal distribution.
```julia
Ω = rand(10) .+ rand(10) .+ rand(10)
```

```
10-element Vector{Float64}:
 2.514457452858477
 1.3144615540636075
 0.41576472896023486
 2.599911049619416
 1.2804348203882574
 2.367187506280424
 1.7879744488109808
 2.2164559890889777
 2.210699795767045
 2.108398945150128
```

We also need some price data. We assume the agent can buy a newspaper for $1.00 and sell it for $1.20.
```julia
buy_price, sales_price = 1.0, 1.2
```

```
(1.0, 1.2)
```

Now we can formulate and train a policy for the two-stage newsvendor problem:
```julia
model = SDDP.LinearPolicyGraph(
    stages = 2,
    sense = :Max,
    upper_bound = maximum(Ω) * sales_price,
    optimizer = HiGHS.Optimizer,
) do subproblem, stage
    @variable(subproblem, inventory >= 0, SDDP.State, initial_value = 0)
    if stage == 1
        @variable(subproblem, buy >= 0)
        @constraint(subproblem, inventory.out == inventory.in + buy)
        @stageobjective(subproblem, -buy_price * buy)
    else
        @variable(subproblem, sell >= 0)
        @constraint(subproblem, sell <= inventory.in)
        SDDP.parameterize(subproblem, Ω) do ω
            return JuMP.set_upper_bound(sell, ω)
        end
        @stageobjective(subproblem, sales_price * sell)
    end
end

SDDP.train(model; stopping_rules = [SDDP.BoundStalling(5, 1e-6)])
```

```
------------------------------------------------------------------------------
         SDDP.jl (c) Oscar Dowson and SDDP.jl contributors, 2017-23

Problem
  Nodes           : 2
  State variables : 1
  Scenarios       : 1.00000e+01
  Existing cuts   : false
  Subproblem structure                      : (min, max)
    Variables                               : (4, 4)
    VariableRef in MOI.LessThan{Float64}    : (1, 1)
    AffExpr in MOI.LessThan{Float64}        : (1, 1)
    VariableRef in MOI.GreaterThan{Float64} : (2, 3)
    AffExpr in MOI.EqualTo{Float64}         : (1, 1)
Options
  Solver          : serial mode
  Risk measure    : SDDP.Expectation()
  Sampling scheme : SDDP.InSampleMonteCarlo

Numerical stability report
  Non-zero Matrix range    [1e+00, 1e+00]
  Non-zero Objective range [1e+00, 1e+00]
  Non-zero Bounds range    [4e-01, 3e+00]
  Non-zero RHS range       [0e+00, 0e+00]
No problems detected

 Iteration    Simulation      Bound        Time (s)     Proc. ID   # Solves
     1      0.000000e+00   5.199822e-01   2.240896e-03      1          13
     2      2.407140e-01   3.763149e-01   3.149033e-03      1          26
     3      3.763149e-01   2.399318e-01   4.354954e-03      1          39
     4      2.399318e-01   1.667683e-01   5.290031e-03      1          52
     5      2.921914e-01   1.536876e-01   6.317854e-03      1          65
     6      2.594896e-01   1.523266e-01   7.360935e-03      1          78
     7      2.560870e-01   1.523266e-01   8.394003e-03      1          91
     8      2.560870e-01   1.523266e-01   9.412050e-03      1         104
     9     -7.815171e-01   1.523266e-01   1.057887e-02      1         117
    10      2.560870e-01   1.523266e-01   1.161885e-02      1         130
    11      2.560870e-01   1.523266e-01   1.268983e-02      1         143

Terminating training
  Status         : bound_stalling
  Total time (s) : 1.268983e-02
  Total solves   : 143
  Best bound     : 1.523266e-01
  Simulation CI  : 1.501339e-01 ± 1.901194e-01
------------------------------------------------------------------------------
```

To check the first-stage buy decision, we need to obtain a decision rule for the first-stage node:
```julia
first_stage_rule = SDDP.DecisionRule(model, node = 1)
```

```
A decision rule for node 1
```

Then we can evaluate it, passing in a starting point for the incoming state:
```julia
solution = SDDP.evaluate(
    first_stage_rule;
    incoming_state = Dict(:inventory => 0.0),
    controls_to_record = [:buy],
)
```

```
(stage_objective = -1.2804348203882605, outgoing_state = Dict(:inventory => 1.2804348203882605), controls = Dict(:buy => 1.2804348203882605))
```

The optimal value of the buy variable is stored here:
```julia
solution.controls[:buy]
```

```
1.2804348203882605
```

## Introducing risk
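As a sanity check (not part of the original tutorial), the two-stage newsvendor has a well-known closed-form solution: order up to the critical fractile `(sales_price - buy_price) / sales_price` of the demand distribution. A minimal sketch, assuming the sampled `Ω` printed above, recovers the same buy decision as the trained policy:

```julia
# Hypothetical sanity check: with a discrete demand distribution where
# each point has probability 1/n, the classical newsvendor orders up to
# the smallest demand point whose empirical CDF reaches the critical
# fractile (sales_price - buy_price) / sales_price.
Ω = [
    2.514457452858477, 1.3144615540636075, 0.41576472896023486,
    2.599911049619416, 1.2804348203882574, 2.367187506280424,
    1.7879744488109808, 2.2164559890889777, 2.210699795767045,
    2.108398945150128,
]
buy_price, sales_price = 1.0, 1.2
ratio = (sales_price - buy_price) / sales_price  # = 1/6
sorted_Ω = sort(Ω)
k = ceil(Int, ratio * length(Ω))  # smallest k with k/n >= ratio
q_star = sorted_Ω[k]             # the second-smallest sample, ≈ 1.28043
```

Up to solver tolerance, `q_star` matches the `solution.controls[:buy]` value computed by SDDP.jl above.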
The solution of a single newsvendor problem offers little insight into how a decision-maker should act. In particular, they may be averse to bad outcomes, such as purchasing a large number of newspapers only to find there is little demand.
We can explore how the optimal decision changes with risk by creating a function:
```julia
function solve_risk_averse_newsvendor(Ω, risk_measure)
    model = SDDP.LinearPolicyGraph(
        stages = 2,
        sense = :Max,
        upper_bound = maximum(Ω) * sales_price,
        optimizer = HiGHS.Optimizer,
    ) do subproblem, stage
        @variable(subproblem, inventory >= 0, SDDP.State, initial_value = 0)
        if stage == 1
            @variable(subproblem, buy >= 0)
            @constraint(subproblem, inventory.out == inventory.in + buy)
            @stageobjective(subproblem, -buy_price * buy)
        else
            @variable(subproblem, sell >= 0)
            @constraint(subproblem, sell <= inventory.in)
            SDDP.parameterize(subproblem, Ω) do ω
                return JuMP.set_upper_bound(sell, ω)
            end
            @stageobjective(subproblem, sales_price * sell)
        end
    end
    SDDP.train(
        model;
        risk_measure = risk_measure,
        stopping_rules = [SDDP.BoundStalling(5, 1e-6)],
        print_level = 0,
    )
    first_stage_rule = SDDP.DecisionRule(model, node = 1)
    solution = SDDP.evaluate(
        first_stage_rule;
        incoming_state = Dict(:inventory => 0.0),
        controls_to_record = [:buy],
    )
    return solution.controls[:buy]
end
```

```
solve_risk_averse_newsvendor (generic function with 1 method)
```

Now we can see how many units a risk-neutral decision-maker would order:
```julia
solve_risk_averse_newsvendor(Ω, SDDP.Expectation())
```

```
1.2804348203882605
```

as well as a decision-maker who cares only about the worst-case outcome:
```julia
solve_risk_averse_newsvendor(Ω, SDDP.WorstCase())
```

```
0.41576472896023486
```

In general, the decision-maker will be somewhere between these two extremes. The SDDP.Entropic risk measure has a single parameter that lets us explore the space of policies between them. When the parameter is small, the measure acts like SDDP.Expectation, and when it is large, it acts like SDDP.WorstCase.
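To build intuition for how this interpolation works, here is a small, self-contained sketch (not from the original tutorial, and independent of SDDP.jl) of the entropic certainty equivalent of a reward vector, assuming the standard definition $-\frac{1}{\gamma}\log \mathbb{E}[e^{-\gamma X}]$ for rewards we want to maximize:

```julia
# Entropic certainty equivalent of equally likely rewards X (to be maximized).
# For small γ it approaches the mean (risk-neutral); for large γ it
# approaches the minimum (worst case).
entropic_ce(X, γ) = -(1 / γ) * log(sum(exp.(-γ .* X)) / length(X))

X = [1.0, 2.0, 3.0]
entropic_ce(X, 1e-6)  # ≈ 2.0, the mean of X
entropic_ce(X, 50.0)  # ≈ 1.0, the minimum of X
```

This is only an intuition aid; the actual interpolation over policies is what the γ sweep below computes with SDDP.Entropic.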
Here is what we get if we solve our problem multiple times for different values of the risk aversion parameter $\gamma$:
```julia
Γ = [10^i for i in -3:0.2:3]
buy = [solve_risk_averse_newsvendor(Ω, SDDP.Entropic(γ)) for γ in Γ]
Plots.plot(
    Γ,
    buy;
    xaxis = :log,
    xlabel = "Risk aversion parameter γ",
    ylabel = "First-stage buy decision",
    legend = false,
)
```
## Things to try

There are a number of things you can try next:

- Experiment with different buy and sales prices.
- Experiment with different distributions of demand.
- Explore how the optimal policy changes if you use a different risk measure.
- What happens if you can only buy and sell integer numbers of newspapers? Try this by adding `Int` to the variable definitions: `@variable(subproblem, buy >= 0, Int)`.
This page was generated using Literate.jl.