Example: inventory management

This tutorial was generated using Literate.jl.

The purpose of this tutorial is to demonstrate a well-known inventory management problem, solved first with a finite-horizon policy and then with an infinite-horizon policy.

Required packages

This tutorial requires the following packages:

using SDDP
import Distributions
import HiGHS
import Plots
import Statistics

Background

Consider a periodic review inventory problem involving a single product. The initial inventory is denoted by $x_0 \geq 0$, and a decision-maker can place an order at the start of each stage. The objective is to minimize expected costs over the planning horizon. The following parameters define the cost structure:

  • c is the unit cost for purchasing each unit
  • h is the holding cost per unit remaining at the end of each stage
  • p is the shortage cost per unit of unsatisfied demand at the end of each stage

There are no fixed ordering costs, and the demand at each stage is assumed to follow an independent and identically distributed random variable with cumulative distribution function (CDF) $\Phi(\cdot)$. Any unsatisfied demand is backlogged and carried forward to the next stage.

At each stage, an agent must decide how many items to order. The per-stage cost is the sum of the ordering cost and the shortage and holding costs incurred at the end of the stage, after demand is realized.
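As a sketch of the balance equations that the model below implements (where, within a stage, $u$ is the order placed for the next stage, $s$ is the sales quantity, $\omega$ is the realized demand, and the subscripts in/out denote incoming and outgoing state values):

$$x^{inv}_{out} = x^{inv}_{in} + u_{in} - s, \qquad x^{dem}_{out} = x^{dem}_{in} + \omega - s,$$

with per-stage cost $c\,u_{out} + h\,x^{inv}_{out} + p\,x^{dem}_{out}$.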

Following Chapter 19 of Introduction to Operations Research by Hillier and Lieberman (7th edition), we use the following parameters: $c=35, h=1, p=15$.

x_0 = 10        # initial inventory
c = 35          # unit order (purchase) cost
h = 1           # unit holding cost
p = 15          # unit shortage cost

Demand follows a continuous uniform distribution between 0 and 800. We construct a sample average approximation with 20 scenarios:

Ω = range(0, 800; length = 20);
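Since Distributions is already imported, an alternative is to build the scenario set by sampling at random rather than on an evenly spaced grid. The following is a hypothetical variation; `Ω_random` is not used in the rest of the tutorial:

import Random
Random.seed!(1234)  # an arbitrary seed, for reproducibility
Ω_random = rand(Distributions.Uniform(0, 800), 20);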

This is a well-known inventory problem with a closed-form solution. The optimal policy is a simple order-up-to policy: if the inventory level is below a certain number of units, the decision-maker orders up to that number of units. Otherwise, no order is placed. For a detailed analysis, refer to Foundations of Stochastic Inventory Theory by Evan Porteus (Stanford Business Books, 2002).

Finite horizon

For a finite horizon of length $T$, the problem is to minimize the total expected cost over all stages.

In the last stage, the decision-maker can recover the unit cost c for each leftover item, or buy out any remaining backlog, also at the unit cost c.
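In summary, the finite-horizon problem is (a sketch, using the notation introduced above, with $t$ indexing stages in which demand is realized):

$$\min\; \mathbb{E}\left[\sum_{t=1}^{T}\left(c\,u_t + h\,x^{inv}_t + p\,x^{dem}_t\right) - c\,x^{inv}_{T+1} + c\,x^{dem}_{T+1}\right].$$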

T = 10 # number of stages
model = SDDP.LinearPolicyGraph(;
    stages = T + 1,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x_0)
    @variable(sp, x_demand >= 0, SDDP.State, initial_value = 0)
    # u_buy is a Decision-Hazard control variable: we decide u_buy.out in this
    # stage for use in the next stage
    @variable(sp, u_buy >= 0, SDDP.State, initial_value = 0)
    # u_sell is the quantity sold to meet demand in this stage
    @variable(sp, u_sell >= 0)
    # w_demand is the demand realization; it is fixed by SDDP.parameterize
    @variable(sp, w_demand == 0)
    # Inventory balance: stock on hand, plus the order placed last stage, less sales
    @constraint(sp, x_inventory.out == x_inventory.in + u_buy.in - u_sell)
    # Backlog balance: unmet demand carries forward
    @constraint(sp, x_demand.out == x_demand.in + w_demand - u_sell)
    if t == 1
        fix(u_sell, 0; force = true)
        @stageobjective(sp, c * u_buy.out)
    elseif t == T + 1
        fix(u_buy.out, 0; force = true)
        @stageobjective(sp, -c * x_inventory.out + c * x_demand.out)
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    else
        @stageobjective(sp, c * u_buy.out + h * x_inventory.out + p * x_demand.out)
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end
A policy graph with 11 nodes.
 Node indices: 1, ..., 11

Train and simulate the policy:

SDDP.train(model)
simulations = SDDP.simulate(model, 200, [:x_inventory, :u_buy])
objective_values = [sum(t[:stage_objective] for t in s) for s in simulations]
μ, ci = round.(SDDP.confidence_interval(objective_values, 1.96); digits = 2)
lower_bound = round(SDDP.calculate_bound(model); digits = 2)
println("Confidence interval: ", μ, " ± ", ci)
println("Lower bound: ", lower_bound)
-------------------------------------------------------------------
         SDDP.jl (c) Oscar Dowson and contributors, 2017-24
-------------------------------------------------------------------
problem
  nodes           : 11
  state variables : 3
  scenarios       : 1.02400e+13
  existing cuts   : false
options
  solver          : serial mode
  risk measure    : SDDP.Expectation()
  sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
  VariableRef                             : [9, 9]
  AffExpr in MOI.EqualTo{Float64}         : [2, 2]
  VariableRef in MOI.EqualTo{Float64}     : [1, 2]
  VariableRef in MOI.GreaterThan{Float64} : [4, 5]
  VariableRef in MOI.LessThan{Float64}    : [1, 1]
numerical stability report
  matrix range     [1e+00, 1e+00]
  objective range  [1e+00, 4e+01]
  bounds range     [0e+00, 0e+00]
  rhs range        [0e+00, 0e+00]
-------------------------------------------------------------------
 iteration    simulation      bound        time (s)     solves  pid
-------------------------------------------------------------------
         1   2.978789e+05  4.573582e+04  1.863194e-02       212   1
        55   1.510329e+05  1.443366e+05  1.024308e+00     14960   1
       111   1.425133e+05  1.443372e+05  2.034434e+00     29032   1
       170   1.270605e+05  1.443373e+05  3.036721e+00     41540   1
       224   1.412079e+05  1.443373e+05  4.037158e+00     54088   1
       277   1.364076e+05  1.443374e+05  5.046625e+00     65324   1
       287   1.155237e+05  1.443374e+05  5.207115e+00     67444   1
-------------------------------------------------------------------
status         : simulation_stopping
total time (s) : 5.207115e+00
total solves   : 67444
best bound     :  1.443374e+05
simulation ci  :  1.432385e+05 ± 3.406740e+03
numeric issues : 0
-------------------------------------------------------------------

Confidence interval: 143168.33 ± 3657.55
Lower bound: 144337.43
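Before plotting, one way to inspect the trained policy is to evaluate its decision rule at a single node. The following sketch uses SDDP.DecisionRule; the incoming state and demand value are illustrative assumptions, not part of the original tutorial:

rule = SDDP.DecisionRule(model; node = 2)
solution = SDDP.evaluate(
    rule;
    incoming_state = Dict(:x_inventory => 10.0, :x_demand => 0.0, :u_buy => 0.0),
    noise = 400.0,  # an illustrative demand realization
    controls_to_record = [:u_sell],
)

The returned solution includes the stage objective, the recorded controls, and the outgoing state, of which u_buy.out is the order placed for the next stage.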

Plot the optimal inventory levels:

plt = SDDP.publication_plot(
    simulations;
    title = "x_inventory.out + u_buy.out",
    xlabel = "Stage",
    ylabel = "Quantity",
    ylims = (0, 1_000),
) do data
    return data[:x_inventory].out + data[:u_buy].out
end
(Figure: publication plot of x_inventory.out + u_buy.out by stage.)

In the early stages, we indeed recover an order-up-to policy. However, there are end-of-horizon effects: the agent adapts its decisions because it knows that only finitely many demand realizations remain before the horizon ends.

Infinite horizon

We can remove the end-of-horizon effects by considering an infinite-horizon model. We assume a discount factor $\alpha=0.95$, which we model by adding a self-loop to the final node of the policy graph with transition probability $\alpha$:

α = 0.95
graph = SDDP.LinearGraph(2)
SDDP.add_edge(graph, 2 => 2, α)
graph
Root
 0
Nodes
 1
 2
Arcs
 0 => 1 w.p. 1.0
 1 => 2 w.p. 1.0
 2 => 2 w.p. 0.95

The objective in this case is to minimize the discounted expected costs over an infinite planning horizon.
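Written out (a sketch, with the same notation as the finite-horizon case):

$$\min\; \mathbb{E}\left[\sum_{t=1}^{\infty} \alpha^{t-1}\left(c\,u_t + h\,x^{inv}_t + p\,x^{dem}_t\right)\right].$$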

model = SDDP.PolicyGraph(
    graph;
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x_0)
    @variable(sp, x_demand >= 0, SDDP.State, initial_value = 0)
    # u_buy is a Decision-Hazard control variable: we decide u_buy.out in this
    # stage for use in the next stage
    @variable(sp, u_buy >= 0, SDDP.State, initial_value = 0)
    @variable(sp, u_sell >= 0)
    @variable(sp, w_demand == 0)
    @constraint(sp, x_inventory.out == x_inventory.in + u_buy.in - u_sell)
    @constraint(sp, x_demand.out == x_demand.in + w_demand - u_sell)
    if t == 1
        fix(u_sell, 0; force = true)
        @stageobjective(sp, c * u_buy.out)
    else
        @stageobjective(sp, c * u_buy.out + h * x_inventory.out + p * x_demand.out)
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end

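Train and simulate the policy. Because the policy graph contains a cycle, a simulated trajectory could be arbitrarily long, so we pass a sampling scheme that caps each simulation at 50 stages via max_depth = 50. Setting terminate_on_dummy_leaf = false prevents the sampler from ending a trajectory early when the implicit $1 - \alpha$ exit probability is realized: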
SDDP.train(model; iteration_limit = 400)
simulations = SDDP.simulate(
    model,
    200,
    [:x_inventory, :u_buy];
    sampling_scheme = SDDP.InSampleMonteCarlo(;
        max_depth = 50,
        terminate_on_dummy_leaf = false,
    ),
);
-------------------------------------------------------------------
         SDDP.jl (c) Oscar Dowson and contributors, 2017-24
-------------------------------------------------------------------
problem
  nodes           : 2
  state variables : 3
  scenarios       : Inf
  existing cuts   : false
options
  solver          : serial mode
  risk measure    : SDDP.Expectation()
  sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
  VariableRef                             : [9, 9]
  AffExpr in MOI.EqualTo{Float64}         : [2, 2]
  VariableRef in MOI.EqualTo{Float64}     : [1, 2]
  VariableRef in MOI.GreaterThan{Float64} : [4, 5]
numerical stability report
  matrix range     [1e+00, 1e+00]
  objective range  [1e+00, 4e+01]
  bounds range     [0e+00, 0e+00]
  rhs range        [0e+00, 0e+00]
-------------------------------------------------------------------
 iteration    simulation      bound        time (s)     solves  pid
-------------------------------------------------------------------
         1   9.539921e+05  4.703021e+04  2.503514e-02       421   1
        32   4.348587e+05  3.026103e+05  1.065272e+00     13934   1
        62   1.205951e+05  3.124528e+05  2.086895e+00     25787   1
        83   3.447322e+05  3.126376e+05  3.121592e+00     36749   1
       105   6.159746e+05  3.126617e+05  4.193316e+00     46305   1
       119   2.270174e+05  3.126647e+05  5.222419e+00     55202   1
       140   3.193974e+05  3.126650e+05  6.261309e+00     63497   1
       155   7.980500e+05  3.126650e+05  7.419313e+00     72017   1
       171   1.473553e+05  3.126650e+05  8.447344e+00     78690   1
       179   1.639987e+06  3.126650e+05  9.560351e+00     85586   1
       241   2.386395e+05  3.126650e+05  1.461447e+01    111079   1
       281   6.387237e+05  3.126650e+05  1.962103e+01    129977   1
       316   4.268500e+05  3.126650e+05  2.472164e+01    144208   1
       343   7.679026e+05  3.126650e+05  3.000793e+01    156142   1
       361   4.584289e+05  3.126650e+05  3.521899e+01    165358   1
       385   1.988079e+05  3.126650e+05  4.039679e+01    173677   1
       400   2.265132e+05  3.126650e+05  4.505736e+01    180223   1
-------------------------------------------------------------------
status         : iteration_limit
total time (s) : 4.505736e+01
total solves   : 180223
best bound     :  3.126650e+05
simulation ci  :  3.266089e+05 ± 2.743987e+04
numeric issues : 0
-------------------------------------------------------------------

Plot the optimal inventory levels:

plt = SDDP.publication_plot(
    simulations;
    title = "x_inventory.out + u_buy.out",
    xlabel = "Stage",
    ylabel = "Quantity",
    ylims = (0, 1_000),
) do data
    return data[:x_inventory].out + data[:u_buy].out
end
Plots.hline!(plt, [662]; label = "Analytic solution")
(Figure: publication plot of x_inventory.out + u_buy.out by stage, with the analytic order-up-to level of 662 marked as a horizontal line.)

We again recover an order-up-to policy. The analytic solution is to order up to 662 units. We do not precisely recover this solution because we used a sample average approximation with 20 scenarios. If we increased the number of scenarios, our solution would approach the analytic solution.
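The level 662 can be checked by hand. For discounted base-stock models with backlogged demand and no fixed ordering cost, the order-up-to level $S$ satisfies a critical-fractile condition (a standard result in stochastic inventory theory; see Porteus (2002)):

$$\Phi(S) = \frac{p - (1-\alpha)c}{p + h} = \frac{15 - 0.05 \times 35}{15 + 1} \approx 0.828.$$

Since $\Phi$ is the CDF of the uniform distribution on $[0, 800]$, this gives $S \approx 0.828 \times 800 \approx 662$, matching the horizontal line in the plot.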