Example: two-stage newsvendor
This tutorial was generated using Literate.jl. Download the source as a .jl
file. Download the source as a .ipynb
file.
The purpose of this tutorial is to demonstrate how to model and solve a two-stage stochastic program.
It is based on the Two stage stochastic programs tutorial in JuMP.
This tutorial uses the following packages
using JuMP
using SDDP
import Distributions
import ForwardDiff
import HiGHS
import Plots
import StatsPlots
import Statistics
Background
The data for this problem is:
D = Distributions.TriangularDist(150.0, 250.0, 200.0)
N = 100
d = sort!(rand(D, N));
Ω = 1:N
P = fill(1 / N, N);
StatsPlots.histogram(d; bins = 20, label = "", xlabel = "Demand")
Kelley's cutting plane algorithm
Kelley's cutting plane algorithm is an iterative method for maximizing concave functions. Given a concave function $f(x)$, Kelley's constructs an outer-approximation of the function at the minimum by a set of first-order Taylor series approximations (called cuts) constructed at a set of points $k = 1,\ldots,K$:
\[\begin{aligned} f^K = \max\limits_{\theta \in \mathbb{R}, x \in \mathbb{R}^N} \;\; & \theta\\ & \theta \le f(x_k) + \nabla f(x_k)^\top (x - x_k),\quad k=1,\ldots,K\\ & \theta \le M, \end{aligned}\]
where $M$ is a sufficiently large number that is an upper bound for $f$ over the domain of $x$.
Kelley's cutting plane algorithm is a structured way of choosing points $x_k$ to visit, so that as more cuts are added:
\[\lim_{K \rightarrow \infty} f^K = \max\limits_{x \in \mathbb{R}^N} f(x)\]
However, before we introduce the algorithm, we need to introduce some bounds.
Bounds
By convexity, $f(x) \le f^K$ for all $x$. Thus, if $x^*$ is a maximizer of $f$, then at any point in time we can construct an upper bound for $f(x^*)$ by solving $f^K$.
Moreover, we can use the primal solutions $x_k^*$ returned by solving $f^k$ to evaluate $f(x_k^*)$ to generate a lower bound.
Therefore, $\max\limits_{k=1,\ldots,K} f(x_k^*) \le f(x^*) \le f^K$.
When the lower bound is sufficiently close to the upper bound, we can terminate the algorithm and declare that we have found an solution that is close to optimal.
Implementation
Here is pseudo-code fo the Kelley algorithm:
- Take as input a convex function $f(x)$ and a iteration limit $K_{max}$. Set $K = 1$, and initialize $f^{K-1}$. Set $lb = -\infty$ and $ub = \infty$.
- Solve $f^{K-1}$ to obtain a candidate solution $x_{K}$.
- Update $ub = f^{K-1}$ and $lb = \max\{lb, f(x_{K})\}$.
- Add a cut $\theta \ge f(x_{K}) + \nabla f\left(x_{K}\right)^\top (x - x_{K})$ to form $f^{K}$.
- Increment $K$.
- If $K > K_{max}$ or $|ub - lb| < \epsilon$, STOP, otherwise, go to step 2.
And here's a complete implementation:
function kelleys_cutting_plane(
# The function to be minimized.
f::Function,
# The gradient of `f`. By default, we use automatic differentiation to
# compute the gradient of f so the user doesn't have to!
∇f::Function = x -> ForwardDiff.gradient(f, x);
# The number of arguments to `f`.
input_dimension::Int,
# An upper bound for the function `f` over its domain.
upper_bound::Float64,
# The number of iterations to run Kelley's algorithm for before stopping.
iteration_limit::Int,
# The absolute tolerance ϵ to use for convergence.
tolerance::Float64 = 1e-6,
)
# Step (1):
K = 1
model = JuMP.Model(HiGHS.Optimizer)
JuMP.set_silent(model)
JuMP.@variable(model, θ <= upper_bound)
JuMP.@variable(model, x[1:input_dimension])
JuMP.@objective(model, Max, θ)
x_k = fill(NaN, input_dimension)
lower_bound, upper_bound = -Inf, Inf
while true
# Step (2):
JuMP.optimize!(model)
x_k .= JuMP.value.(x)
# Step (3):
upper_bound = JuMP.objective_value(model)
lower_bound = min(upper_bound, f(x_k))
println("K = $K : $(lower_bound) <= f(x*) <= $(upper_bound)")
# Step (4):
JuMP.@constraint(model, θ <= f(x_k) + ∇f(x_k)' * (x .- x_k))
# Step (5):
K = K + 1
# Step (6):
if K > iteration_limit
println("-- Termination status: iteration limit --")
break
elseif abs(upper_bound - lower_bound) < tolerance
println("-- Termination status: converged --")
break
end
end
println("Found solution: x_K = ", x_k)
return
end
kelleys_cutting_plane (generic function with 2 methods)
Let's run our algorithm to see what happens:
kelleys_cutting_plane(;
input_dimension = 2,
upper_bound = 10.0,
iteration_limit = 20,
) do x
return -(x[1] - 1)^2 + -(x[2] + 2)^2 + 1.0
end
K = 1 : -4.0 <= f(x*) <= 10.0
K = 2 : -2.25 <= f(x*) <= 10.0
K = 3 : -5.3125 <= f(x*) <= 10.0
K = 4 : 0.83984375 <= f(x*) <= 5.625
K = 5 : -1.3438585069444455 <= f(x*) <= 1.9791666666666667
K = 6 : 0.4532453748914933 <= f(x*) <= 1.7513020833333333
K = 7 : -2.794810401068801 <= f(x*) <= 1.3444010416666663
K = 8 : 0.19507712328139326 <= f(x*) <= 1.3179100884331594
K = 9 : 0.9073862122310157 <= f(x*) <= 1.3022015061077878
K = 10 : 0.7292616273896162 <= f(x*) <= 1.2835882279084943
K = 11 : 0.9856775767620292 <= f(x*) <= 1.1542808575464905
K = 12 : 0.9521967150117504 <= f(x*) <= 1.0538679846579115
K = 13 : 0.9907765147617908 <= f(x*) <= 1.0341945633777465
K = 14 : 0.990619313815891 <= f(x*) <= 1.0168012962055821
K = 15 : 0.9997569528573889 <= f(x*) <= 1.010937796651451
K = 16 : 0.9955736574995747 <= f(x*) <= 1.0023159378334365
K = 17 : 0.9981907645826057 <= f(x*) <= 1.001070011161672
K = 18 : 0.999293284088297 <= f(x*) <= 1.0010295293971427
K = 19 : 0.9997619192401398 <= f(x*) <= 1.0005033714074143
K = 20 : 0.9999234387181322 <= f(x*) <= 1.0003705497285347
-- Termination status: iteration limit --
Found solution: x_K = [1.0074056501552666, -1.9953397824465389]
L-Shaped theory
The L-Shaped method is a way of solving two-stage stochastic programs by Benders' decomposition. It takes the problem:
\[\begin{aligned} V = \max\limits_{x,y_\omega} \;\; & -2x + \mathbb{E}_\omega[5y_\omega - 0.1(x - y_\omega)] \\ & y_\omega \le x & \quad \forall \omega \in \Omega \\ & 0 \le y_\omega \le d_\omega & \quad \forall \omega \in \Omega \\ & x \ge 0. \end{aligned}\]
and decomposes it into a second-stage problem:
\[\begin{aligned} V_2(\bar{x}, d_\omega) = \max\limits_{x,x^\prime,y_\omega} \;\; & 5y_\omega - x^\prime \\ & y_\omega \le x \\ & x^\prime = x - y_\omega \\ & 0 \le y_\omega \le d_\omega \\ & x = \bar{x} & [\lambda] \end{aligned}\]
and a first-stage problem:
\[\begin{aligned} V = \max\limits_{x,\theta} \;\; & -2x + \theta \\ & \theta \le \mathbb{E}_\omega[V_2(x, \omega)] \\ & x \ge 0 \end{aligned}\]
Then, because $V_2$ is convex with respect to $\bar{x}$ for fixed $\omega$, we can use a set of feasible points $\{x^k\}$ construct an outer approximation:
\[\begin{aligned} V^K = \max\limits_{x,\theta} \;\; & -2x + \theta \\ & \theta \le \mathbb{E}_\omega[V_2(x^k, \omega) + \nabla V_2(x^k, \omega)^\top(x - x^k)] & \quad k = 1,\ldots,K\\ & x \ge 0 \\ & \theta \le M \end{aligned}\]
where $M$ is an upper bound on possible values of $V_2$ so that the problem has a bounded solution.
It is also useful to see that because $\bar{x}$ appears only on the right-hand side of a linear program, $\nabla V_2(x^k, \omega) = \lambda^k$.
Ignoring how we choose $x^k$ for now, we can construct a lower and upper bound on the optimal solution:
\[-2x^K + \mathbb{E}_\omega[V_2(x^K, \omega)] = \underbar{V} \le V \le \overline{V} = V^K\]
Thus, we need some way of cleverly choosing a sequence of $x^k$ so that the lower bound converges to the upper bound.
- Start with $K=1$
- Solve $V^{K-1}$ to get $x^K$
- Set $\overline{V} = V^k$
- Solve $V_2(x^K, \omega)$ for all $\omega$ and store the optimal objective value and dual solution $\lambda^K$
- Set $\underbar{V} = -2x^K + \mathbb{E}_\omega[V_2(x^k, \omega)]$
- If $\underbar{V} \approx \overline{V}$, STOP
- Add new constraint $\theta \le \mathbb{E}_\omega[V_2(x^K, \omega) +\lambda^K (x - x^K)]$
- Increment $K$, GOTO 2
The next section implements this algorithm in Julia.
L-Shaped implementation
Here's a function to compute the second-stage problem;
function solve_second_stage(x̅, d_ω)
model = Model(HiGHS.Optimizer)
set_silent(model)
@variable(model, x_in)
@variable(model, x_out >= 0)
fix(x_in, x̅)
@variable(model, 0 <= u_sell <= d_ω)
@constraint(model, x_out == x_in - u_sell)
@constraint(model, u_sell <= x_in)
@objective(model, Max, 5 * u_sell - 0.1 * x_out)
optimize!(model)
return (
V = objective_value(model),
λ = reduced_cost(x_in),
x = value(x_out),
u = value(u_sell),
)
end
solve_second_stage(200, 170)
(V = 847.0, λ = -0.1, x = 30.0, u = 170.0)
Here's the first-stage subproblem:
model = Model(HiGHS.Optimizer)
set_silent(model)
@variable(model, x_in == 0)
@variable(model, x_out >= 0)
@variable(model, u_make >= 0)
@constraint(model, x_out == x_in + u_make)
M = 5 * maximum(d)
@variable(model, θ <= M)
@objective(model, Max, -2 * u_make + θ)
\[ -2 u\_make + θ \]
Importantly, to ensure we have a bounded solution, we need to add an upper bound to the variable θ
.
kIterationLimit = 100
for k in 1:kIterationLimit
println("Solving iteration k = $k")
# Step 2
optimize!(model)
xᵏ = value(x_out)
println(" xᵏ = $xᵏ")
# Step 3
ub = objective_value(model)
println(" V̅ = $ub")
# Step 4
ret = [solve_second_stage(xᵏ, d[ω]) for ω in Ω]
# Step 5
lb = value(-2 * u_make) + sum(p * r.V for (p, r) in zip(P, ret))
println(" V̲ = $lb")
# Step 6
if ub - lb < 1e-6
println("Terminating with near-optimal solution")
break
end
# Step 7
c = @constraint(
model,
θ <= sum(p * (r.V + r.λ * (x_out - xᵏ)) for (p, r) in zip(P, ret)),
)
println(" Added cut: $c")
end
Solving iteration k = 1
xᵏ = -0.0
V̅ = 1231.0300885731388
V̲ = 0.0
Added cut: -4.99999999999999 x_out + θ ≤ 0
Solving iteration k = 2
xᵏ = 246.20601771462825
V̅ = 738.6180531438822
V̲ = 494.54507250309786
Added cut: 0.10000000000000007 x_out + θ ≤ 1011.5777097038167
Solving iteration k = 3
xᵏ = 198.34857053016054
V̅ = 595.0457115904796
V̲ = 549.8085222671422
Added cut: -2.399 x_out + θ ≤ 470.66744262560695
Solving iteration k = 4
xᵏ = 216.45068710612637
V̅ = 557.0312667809515
V̲ = 542.3451137762154
Added cut: -1.0730000000000004 x_out + θ ≤ 742.994900723595
Solving iteration k = 5
xᵏ = 205.37515693664258
V̅ = 552.6121302433273
V̲ = 550.1186527903487
Added cut: -1.685000000000001 x_out + θ ≤ 614.8118272253898
Solving iteration k = 6
xᵏ = 201.88289159633482
V̅ = 551.2187163725446
V̲ = 550.6185789995207
Added cut: -1.9910000000000012 x_out + θ ≤ 552.4355250238865
Solving iteration k = 7
xᵏ = 200.41196666245082
V̅ = 550.6318173239249
V̲ = 550.4479471564604
Added cut: -2.1950000000000007 x_out + θ ≤ 511.3676136572838
Solving iteration k = 8
xᵏ = 201.31329101275801
V̅ = 550.623705404772
V̲ = 550.5871377802061
Added cut: -2.093000000000001 x_out + θ ≤ 531.8650017160219
Solving iteration k = 9
xᵏ = 201.671797135922
V̅ = 550.6204788496634
V̲ = 550.6170121478006
Added cut: -2.042000000000001 x_out + θ ≤ 542.1467966680893
Solving iteration k = 10
xᵏ = 201.73977168228572
V̅ = 550.6198670787461
V̲ = 550.619867078744
Terminating with near-optimal solution
To get the first-stage solution, we do:
optimize!(model)
xᵏ = value(x_out)
201.73977168228572
To compute a second-stage solution, we do:
solve_second_stage(xᵏ, 170.0)
(V = 846.8260228317714, λ = -0.1, x = 31.73977168228572, u = 170.0)
Policy Graph
Now let's see how we can formulate and train a policy for the two-stage newsvendor problem using SDDP.jl
. Under the hood, SDDP.jl
implements the exact algorithm that we just wrote by hand.
model = SDDP.LinearPolicyGraph(;
stages = 2,
sense = :Max,
upper_bound = 5 * maximum(d), # The `M` in θ <= M
optimizer = HiGHS.Optimizer,
) do subproblem::JuMP.Model, stage::Int
@variable(subproblem, x >= 0, SDDP.State, initial_value = 0)
if stage == 1
@variable(subproblem, u_make >= 0)
@constraint(subproblem, x.out == x.in + u_make)
@stageobjective(subproblem, -2 * u_make)
else
@variable(subproblem, u_sell >= 0)
@constraint(subproblem, u_sell <= x.in)
@constraint(subproblem, x.out == x.in - u_sell)
SDDP.parameterize(subproblem, d, P) do ω
set_upper_bound(u_sell, ω)
return
end
@stageobjective(subproblem, 5 * u_sell - 0.1 * x.out)
end
return
end
SDDP.train(model; log_every_iteration = true)
-------------------------------------------------------------------
SDDP.jl (c) Oscar Dowson and contributors, 2017-24
-------------------------------------------------------------------
problem
nodes : 2
state variables : 1
scenarios : 1.00000e+02
existing cuts : false
options
solver : serial mode
risk measure : SDDP.Expectation()
sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
VariableRef : [4, 4]
AffExpr in MOI.EqualTo{Float64} : [1, 1]
AffExpr in MOI.LessThan{Float64} : [1, 1]
VariableRef in MOI.GreaterThan{Float64} : [2, 3]
VariableRef in MOI.LessThan{Float64} : [1, 1]
numerical stability report
matrix range [1e+00, 1e+00]
objective range [1e-01, 5e+00]
bounds range [2e+02, 1e+03]
rhs range [0e+00, 0e+00]
-------------------------------------------------------------------
iteration simulation bound time (s) solves pid
-------------------------------------------------------------------
1 0.000000e+00 7.386181e+02 6.869078e-03 103 1
2 6.690344e+02 5.950457e+02 2.384591e-02 406 1
3 5.737139e+02 5.570313e+02 2.837992e-02 509 1
4 6.493521e+02 5.526121e+02 3.310800e-02 612 1
5 4.810123e+02 5.512187e+02 3.772092e-02 715 1
6 5.643656e+02 5.506318e+02 4.231191e-02 818 1
7 5.396287e+02 5.506237e+02 4.694200e-02 921 1
8 4.991527e+02 5.506205e+02 5.152488e-02 1024 1
9 4.816118e+02 5.506199e+02 5.613708e-02 1127 1
10 3.783853e+02 5.506199e+02 6.058788e-02 1230 1
11 6.052193e+02 5.506199e+02 6.533289e-02 1333 1
12 6.052193e+02 5.506199e+02 7.009602e-02 1436 1
13 5.068713e+02 5.506199e+02 7.484293e-02 1539 1
14 3.887009e+02 5.506199e+02 7.966089e-02 1642 1
15 5.128989e+02 5.506199e+02 8.431005e-02 1745 1
16 6.052193e+02 5.506199e+02 8.896303e-02 1848 1
17 4.999337e+02 5.506199e+02 9.356093e-02 1951 1
18 6.016802e+02 5.506199e+02 9.822488e-02 2054 1
19 6.052193e+02 5.506199e+02 1.029460e-01 2157 1
20 6.052193e+02 5.506199e+02 1.076069e-01 2260 1
21 6.052193e+02 5.506199e+02 2.131310e-01 2563 1
22 4.501271e+02 5.506199e+02 2.180140e-01 2666 1
23 4.605496e+02 5.506199e+02 2.229130e-01 2769 1
24 6.052193e+02 5.506199e+02 2.277739e-01 2872 1
25 6.052193e+02 5.506199e+02 2.326050e-01 2975 1
26 4.164092e+02 5.506199e+02 2.375159e-01 3078 1
27 5.690999e+02 5.506199e+02 2.424030e-01 3181 1
28 6.052193e+02 5.506199e+02 2.472420e-01 3284 1
29 5.034760e+02 5.506199e+02 2.521310e-01 3387 1
30 5.068713e+02 5.506199e+02 2.569940e-01 3490 1
31 6.052193e+02 5.506199e+02 2.618549e-01 3593 1
32 4.605496e+02 5.506199e+02 2.667220e-01 3696 1
33 4.972401e+02 5.506199e+02 2.716360e-01 3799 1
34 4.916576e+02 5.506199e+02 2.765329e-01 3902 1
35 4.290435e+02 5.506199e+02 2.813950e-01 4005 1
36 6.052193e+02 5.506199e+02 2.862830e-01 4108 1
37 6.052193e+02 5.506199e+02 2.912219e-01 4211 1
38 4.982571e+02 5.506199e+02 2.961459e-01 4314 1
39 6.052193e+02 5.506199e+02 3.010740e-01 4417 1
40 5.690999e+02 5.506199e+02 3.059881e-01 4520 1
-------------------------------------------------------------------
status : simulation_stopping
total time (s) : 3.059881e-01
total solves : 4520
best bound : 5.506199e+02
simulation ci : 5.266643e+02 ± 3.485202e+01
numeric issues : 0
-------------------------------------------------------------------
One way to query the optimal policy is with SDDP.DecisionRule
:
first_stage_rule = SDDP.DecisionRule(model; node = 1)
A decision rule for node 1
solution_1 = SDDP.evaluate(first_stage_rule; incoming_state = Dict(:x => 0.0))
(stage_objective = -403.4795433645594, outgoing_state = Dict(:x => 201.7397716822797), controls = Dict{Any, Any}())
Here's the second stage:
second_stage_rule = SDDP.DecisionRule(model; node = 2)
solution = SDDP.evaluate(
second_stage_rule;
incoming_state = Dict(:x => solution_1.outgoing_state[:x]),
noise = 170.0, # A value of d[ω], can be out-of-sample.
controls_to_record = [:u_sell],
)
(stage_objective = 846.826022831772, outgoing_state = Dict(:x => 31.739771682279695), controls = Dict(:u_sell => 170.0))
Simulation
Querying the decision rules is tedious. It's often more useful to simulate the policy:
simulations = SDDP.simulate(
model,
10, #= number of replications =#
[:x, :u_sell, :u_make]; #= variables to record =#
skip_undefined_variables = true,
);
simulations
is a vector with 10 elements
length(simulations)
10
and each element is a vector with two elements (one for each stage)
length(simulations[1])
2
The first stage contains:
simulations[1][1]
Dict{Symbol, Any} with 9 entries:
:u_make => 201.74
:bellman_term => 954.099
:noise_term => nothing
:node_index => 1
:stage_objective => -403.48
:objective_state => nothing
:u_sell => NaN
:belief => Dict(1=>1.0)
:x => State{Float64}(0.0, 201.74)
The second stage contains:
simulations[1][2]
Dict{Symbol, Any} with 9 entries:
:u_make => NaN
:bellman_term => 0.0
:noise_term => 231.002
:node_index => 2
:stage_objective => 1008.7
:objective_state => nothing
:u_sell => 201.74
:belief => Dict(2=>1.0)
:x => State{Float64}(201.74, 0.0)
We can compute aggregated statistics across the simulations:
objectives = map(simulations) do simulation
return sum(data[:stage_objective] for data in simulation)
end
μ, t = SDDP.confidence_interval(objectives)
println("Simulation ci : $μ ± $t")
Simulation ci : 552.0371333407535 ± 34.05008403382906
Risk aversion revisited
SDDP.jl contains a number of risk measures. One example is:
0.5 * SDDP.Expectation() + 0.5 * SDDP.WorstCase()
A convex combination of 0.5 * SDDP.Expectation() + 0.5 * SDDP.WorstCase()
You can construct a risk-averse policy by passing a risk measure to the risk_measure
keyword argument of SDDP.train
.
We can explore how the optimal decision changes with risk by creating a function:
function solve_newsvendor(risk_measure::SDDP.AbstractRiskMeasure)
model = SDDP.LinearPolicyGraph(;
stages = 2,
sense = :Max,
upper_bound = 5 * maximum(d),
optimizer = HiGHS.Optimizer,
) do subproblem, node
@variable(subproblem, x >= 0, SDDP.State, initial_value = 0)
if node == 1
@stageobjective(subproblem, -2 * x.out)
else
@variable(subproblem, u_sell >= 0)
@constraint(subproblem, u_sell <= x.in)
@constraint(subproblem, x.out == x.in - u_sell)
SDDP.parameterize(subproblem, d, P) do ω
set_upper_bound(u_sell, ω)
return
end
@stageobjective(subproblem, 5 * u_sell - 0.1 * x.out)
end
return
end
SDDP.train(model; risk_measure = risk_measure, print_level = 0)
first_stage_rule = SDDP.DecisionRule(model; node = 1)
solution = SDDP.evaluate(first_stage_rule; incoming_state = Dict(:x => 0.0))
return solution.outgoing_state[:x]
end
solve_newsvendor (generic function with 1 method)
Now we can see how many units a decision maker would order using CVaR
:
solve_newsvendor(SDDP.CVaR(0.4))
181.79009712282794
as well as a decision-maker who cares only about the worst-case outcome:
solve_newsvendor(SDDP.WorstCase())
154.3436281578693
In general, the decision-maker will be somewhere between the two extremes. The SDDP.Entropic
risk measure is a risk measure that has a single parameter that lets us explore the space of policies between the two extremes. When the parameter is small, the measure acts like SDDP.Expectation
, and when it is large, it acts like SDDP.WorstCase
.
Here is what we get if we solve our problem multiple times for different values of the risk aversion parameter $\gamma$:
Γ = [10^i for i in -4:0.5:1]
buy = [solve_newsvendor(SDDP.Entropic(γ)) for γ in Γ]
Plots.plot(
Γ,
buy;
xaxis = :log,
xlabel = "Risk aversion parameter γ",
ylabel = "Number of pies to make",
legend = false,
)
Things to try
There are a number of things you can try next:
- Experiment with different buy and sales prices
- Experiment with different distributions of demand
- Explore how the optimal policy changes if you use a different risk measure
- What happens if you can only buy and sell integer numbers of newspapers? Try this by adding
Int
to the variable definitions:@variable(subproblem, buy >= 0, Int)
- What happens if you use a different upper bound? Try an invalid one like
-100
, and a very large one like1e12
.