Example: inventory management
This tutorial was generated using Literate.jl. Download the source as a .jl file or as a .ipynb file.
The purpose of this tutorial is to demonstrate a well-known inventory management problem with a finite- and infinite-horizon policy.
Required packages
This tutorial requires the following packages:
using SDDP
import Distributions
import HiGHS
import Plots
import Statistics
Background
Consider a periodic review inventory problem involving a single product. The initial inventory is denoted by $x_0 \geq 0$, and a decision-maker can place an order at the start of each stage. The objective is to minimize expected costs over the planning horizon. The following parameters define the cost structure:
- $c$ is the unit cost for purchasing each unit
- $h$ is the holding cost per unit remaining at the end of each stage
- $p$ is the shortage cost per unit of unsatisfied demand at the end of each stage
There are no fixed ordering costs, and the demand at each stage is assumed to be an independent and identically distributed random variable with cumulative distribution function (CDF) $\Phi(\cdot)$. Any unsatisfied demand is backlogged and carried forward to the next stage.
At each stage, an agent must decide how many items to order. The per-stage cost is the sum of the order cost and the shortage and holding costs incurred at the end of the stage, after demand is realized.
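To make the cost structure concrete, here is a small sketch (not part of the original tutorial) of a single stage's cost, ignoring any backlog carried in from earlier stages. The keyword defaults match the parameters defined below.
function stage_cost(x, u, w; c = 35, h = 1, p = 15)
    y = x + u - w  # net inventory after the order arrives and demand is met
    # order cost, plus holding cost on leftovers or shortage cost on unmet demand
    return c * u + h * max(y, 0) + p * max(-y, 0)
end

stage_cost(10, 600, 400)  # 35 * 600 + 1 * 210 = 21_210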
Following Chapter 19 of Introduction to Operations Research by Hillier and Lieberman (7th edition), we use the following parameters: $c=35, h=1, p=15$.
x_0 = 10 # initial inventory
c = 35 # unit order (purchase) cost
h = 1 # unit holding cost
p = 15 # unit shortage cost
15
Demand follows a continuous uniform distribution between 0 and 800. We construct a sample average approximation with 20 scenarios:
Ω = range(0, 800; length = 20);
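The grid above is a deterministic discretization. As an aside (an assumption, not what the tutorial does), you could instead draw i.i.d. samples using the Distributions package imported earlier:
import Random
Random.seed!(1234)  # hypothetical seed, for reproducibility only
Ω_sampled = rand(Distributions.Uniform(0, 800), 20)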
This is a well-known inventory problem with a closed-form solution. The optimal policy is a simple order-up-to policy: if the inventory level is below a certain number of units, the decision-maker orders up to that number of units. Otherwise, no order is placed. For a detailed analysis, refer to Foundations of Stochastic Inventory Theory by Evan Porteus (Stanford Business Books, 2002).
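As a concrete illustration (a sketch, not part of the tutorial), an order-up-to policy with level S can be written as:
# Order-up-to (base-stock) policy: raise net inventory to the level `S` if
# below it, otherwise order nothing. `S = 662` is the analytic level
# discussed at the end of this tutorial.
order_up_to(x, S) = max(S - x, 0)

order_up_to(100, 662)  # order 562 units
order_up_to(700, 662)  # already above the level: order 0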
Finite horizon
For a finite horizon of length $T$, the problem is to minimize the total expected cost over all stages.
In the last stage, the decision-maker can recover the unit cost c
for each leftover item, or buy out any remaining backlog, also at the unit cost c
.
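Putting these pieces together, the objective being minimized is (a sketch reconstructed from the model below, where $u_t$ is the order placed in stage $t$, and $x^{\mathrm{inv}}_t$ and $x^{\mathrm{dem}}_t$ are the end-of-stage inventory and backlog):

$$\min\; \mathbb{E}\left[\sum_{t=1}^{T} c\,u_t + \sum_{t=2}^{T}\left(h\,x^{\mathrm{inv}}_t + p\,x^{\mathrm{dem}}_t\right) + c\left(x^{\mathrm{dem}}_{T+1} - x^{\mathrm{inv}}_{T+1}\right)\right]$$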
T = 10 # number of stages
model = SDDP.LinearPolicyGraph(;
    stages = T + 1,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x_0)
    @variable(sp, x_demand >= 0, SDDP.State, initial_value = 0)
    # u_buy is a Decision-Hazard control variable. We decide u.out for use in
    # the next stage.
    @variable(sp, u_buy >= 0, SDDP.State, initial_value = 0)
    @variable(sp, u_sell >= 0)
    @variable(sp, w_demand == 0)
    # Inventory balance: stock on hand after orders arrive and sales occur.
    @constraint(sp, x_inventory.out == x_inventory.in + u_buy.in - u_sell)
    # Backlog balance: unmet demand is carried forward to the next stage.
    @constraint(sp, x_demand.out == x_demand.in + w_demand - u_sell)
    if t == 1
        # No demand is realized in the first stage; we only place an order.
        fix(u_sell, 0; force = true)
        @stageobjective(sp, c * u_buy.out)
    elseif t == T + 1
        # Final stage: recover the unit cost of leftover stock and buy out
        # any remaining backlog, both at the unit cost c.
        fix(u_buy.out, 0; force = true)
        @stageobjective(sp, -c * x_inventory.out + c * x_demand.out)
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    else
        @stageobjective(sp, c * u_buy.out + h * x_inventory.out + p * x_demand.out)
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end
A policy graph with 11 nodes.
Node indices: 1, ..., 11
Train and simulate the policy:
SDDP.train(model)
simulations = SDDP.simulate(model, 200, [:x_inventory, :u_buy])
objective_values = [sum(t[:stage_objective] for t in s) for s in simulations]
μ, ci = round.(SDDP.confidence_interval(objective_values, 1.96); digits = 2)
lower_bound = round(SDDP.calculate_bound(model); digits = 2)
println("Confidence interval: ", μ, " ± ", ci)
println("Lower bound: ", lower_bound)
-------------------------------------------------------------------
SDDP.jl (c) Oscar Dowson and contributors, 2017-24
-------------------------------------------------------------------
problem
nodes : 11
state variables : 3
scenarios : 1.02400e+13
existing cuts : false
options
solver : serial mode
risk measure : SDDP.Expectation()
sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
VariableRef : [9, 9]
AffExpr in MOI.EqualTo{Float64} : [2, 2]
VariableRef in MOI.EqualTo{Float64} : [1, 2]
VariableRef in MOI.GreaterThan{Float64} : [4, 5]
VariableRef in MOI.LessThan{Float64} : [1, 1]
numerical stability report
matrix range [1e+00, 1e+00]
objective range [1e+00, 4e+01]
bounds range [0e+00, 0e+00]
rhs range [0e+00, 0e+00]
-------------------------------------------------------------------
iteration simulation bound time (s) solves pid
-------------------------------------------------------------------
1 4.204053e+05 4.573582e+04 1.957083e-02 212 1
49 7.661882e+04 1.443325e+05 1.025186e+00 13688 1
104 1.732079e+05 1.443374e+05 2.038726e+00 27548 1
167 1.612921e+05 1.443374e+05 3.054173e+00 40904 1
216 1.221342e+05 1.443374e+05 4.068572e+00 52392 1
268 1.554395e+05 1.443374e+05 5.073598e+00 63416 1
288 1.193974e+05 1.443374e+05 5.513160e+00 67656 1
-------------------------------------------------------------------
status : simulation_stopping
total time (s) : 5.513160e+00
total solves : 67656
best bound : 1.443374e+05
simulation ci : 1.446489e+05 ± 4.035802e+03
numeric issues : 0
-------------------------------------------------------------------
Confidence interval: 145265.16 ± 3693.61
Lower bound: 144337.44
Plot the optimal inventory levels:
plt = SDDP.publication_plot(
    simulations;
    title = "x_inventory.out + u_buy.out",
    xlabel = "Stage",
    ylabel = "Quantity",
    ylims = (0, 1_000),
) do data
    return data[:x_inventory].out + data[:u_buy].out
end
In the early stages, we indeed recover an order-up-to policy. However, there are end-of-horizon effects as the agent optimizes its decisions knowing that it faces only 10 realizations of demand.
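One rough check (an assumption, not in the tutorial) is to look at the post-order level x_inventory.out + u_buy.out in an early stage across the simulated trajectories; under an order-up-to policy it should be nearly constant:
# Collect the post-order level in stage 2 (an arbitrary early stage)
# across all 200 simulated trajectories.
levels = [s[2][:x_inventory].out + s[2][:u_buy].out for s in simulations]
extrema(levels)  # a narrow range suggests an order-up-to policy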
Infinite horizon
We can remove the end-of-horizon effects by considering an infinite-horizon model. We assume a discount factor $\alpha=0.95$:
α = 0.95
graph = SDDP.LinearGraph(2)
# Add a self-loop: from node 2 we return to node 2 with probability α,
# which is equivalent to discounting future costs by α per stage.
SDDP.add_edge(graph, 2 => 2, α)
graph
Root
0
Nodes
1
2
Arcs
0 => 1 w.p. 1.0
1 => 2 w.p. 1.0
2 => 2 w.p. 0.95
The objective in this case is to minimize the discounted expected costs over an infinite planning horizon.
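In the same notation as before (again a sketch; the discounting is induced by the self-loop probability $\alpha$):

$$\min\; \mathbb{E}\left[c\,u_1 + \sum_{t=2}^{\infty} \alpha^{t-2}\left(c\,u_t + h\,x^{\mathrm{inv}}_t + p\,x^{\mathrm{dem}}_t\right)\right]$$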
model = SDDP.PolicyGraph(
    graph;
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x_0)
    @variable(sp, x_demand >= 0, SDDP.State, initial_value = 0)
    # u_buy is a Decision-Hazard control variable. We decide u.out for use in
    # the next stage.
    @variable(sp, u_buy >= 0, SDDP.State, initial_value = 0)
    @variable(sp, u_sell >= 0)
    @variable(sp, w_demand == 0)
    # Inventory balance: stock on hand after orders arrive and sales occur.
    @constraint(sp, x_inventory.out == x_inventory.in + u_buy.in - u_sell)
    # Backlog balance: unmet demand is carried forward to the next stage.
    @constraint(sp, x_demand.out == x_demand.in + w_demand - u_sell)
    if t == 1
        # No demand is realized in the first stage; we only place an order.
        fix(u_sell, 0; force = true)
        @stageobjective(sp, c * u_buy.out)
    else
        @stageobjective(sp, c * u_buy.out + h * x_inventory.out + p * x_demand.out)
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end
SDDP.train(model; iteration_limit = 400)
simulations = SDDP.simulate(
    model,
    200,
    [:x_inventory, :u_buy];
    # The policy graph is cyclic, so cap each simulated trajectory at 50
    # stages rather than terminating when the dummy leaf node is sampled.
    sampling_scheme = SDDP.InSampleMonteCarlo(;
        max_depth = 50,
        terminate_on_dummy_leaf = false,
    ),
);
-------------------------------------------------------------------
SDDP.jl (c) Oscar Dowson and contributors, 2017-24
-------------------------------------------------------------------
problem
nodes : 2
state variables : 3
scenarios : Inf
existing cuts : false
options
solver : serial mode
risk measure : SDDP.Expectation()
sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
VariableRef : [9, 9]
AffExpr in MOI.EqualTo{Float64} : [2, 2]
VariableRef in MOI.EqualTo{Float64} : [1, 2]
VariableRef in MOI.GreaterThan{Float64} : [4, 5]
numerical stability report
matrix range [1e+00, 1e+00]
objective range [1e+00, 4e+01]
bounds range [0e+00, 0e+00]
rhs range [0e+00, 0e+00]
-------------------------------------------------------------------
iteration simulation bound time (s) solves pid
-------------------------------------------------------------------
1 3.526658e+05 4.629033e+04 1.620293e-02 253 1
36 3.921498e+05 3.094975e+05 1.055014e+00 13623 1
62 5.602105e+05 3.125563e+05 2.119449e+00 25157 1
89 2.450831e+05 3.126590e+05 3.138672e+00 35075 1
120 6.157371e+05 3.126647e+05 4.221242e+00 44640 1
137 7.103450e+05 3.126650e+05 5.233542e+00 52595 1
157 4.508502e+05 3.126650e+05 6.322338e+00 61036 1
175 4.667237e+05 3.126650e+05 7.385480e+00 68698 1
201 1.069208e+06 3.126650e+05 8.556504e+00 76515 1
217 5.732079e+05 3.126650e+05 9.639910e+00 83356 1
287 2.872289e+05 3.126650e+05 1.464090e+01 108080 1
322 5.716921e+05 3.126650e+05 1.964648e+01 125188 1
353 7.687447e+05 3.126650e+05 2.501146e+01 137987 1
383 2.001553e+05 3.126650e+05 3.025984e+01 149840 1
400 1.310184e+05 3.126650e+05 3.326620e+01 156577 1
-------------------------------------------------------------------
status : iteration_limit
total time (s) : 3.326620e+01
total solves : 156577
best bound : 3.126650e+05
simulation ci : 2.801773e+05 ± 2.337096e+04
numeric issues : 0
-------------------------------------------------------------------
Plot the optimal inventory levels:
plt = SDDP.publication_plot(
    simulations;
    title = "x_inventory.out + u_buy.out",
    xlabel = "Stage",
    ylabel = "Quantity",
    ylims = (0, 1_000),
) do data
    return data[:x_inventory].out + data[:u_buy].out
end
Plots.hline!(plt, [662]; label = "Analytic solution")
We again recover an order-up-to policy. The analytic solution is to order-up-to 662 units. We do not precisely recover this solution because we used a sample average approximation of 20 elements. If we increased the number of samples, our solution would approach the analytic solution.
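For the curious, one way to reproduce the 662 figure (an assumption based on the standard discounted critical-fractile result for this class of problems; see Porteus (2002)) is:
# Base-stock level from the discounted critical fractile
# Φ(S) = (p - (1 - α) * c) / (p + h), with Uniform(0, 800) demand.
fractile = (p - (1 - α) * c) / (p + h)  # ≈ 0.828
S = Distributions.quantile(Distributions.Uniform(0, 800), fractile)  # ≈ 662.5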