Example: the milk producer
This tutorial was generated using Literate.jl. Download the source as a .jl
file. Download the source as a .ipynb
file.
The purpose of this tutorial is to demonstrate how to fit a Markovian policy graph to a univariate stochastic process.
This tutorial uses the following packages:
using SDDP
import HiGHS
import Plots
Background
A company produces milk for sale on a spot market each month. The quantity of milk they produce is uncertain, and so too is the price on the spot market. The company can store unsold milk in a stockpile of dried milk powder.
The spot price is determined by an auction system, and so varies from month to month, but demonstrates serial correlation. In each auction, there is sufficient demand that the milk producer finds a buyer for all their milk, regardless of the quantity they supply. Furthermore, the spot price is independent of the milk producer (they are a small player in the market).
The spot price is highly volatile, and is the result of a process that is out of the control of the company. To counteract their price risk, the company engages in a forward contracting programme.
The forward contracting programme is a deal for physical milk four months in the future.
The futures price is the current spot price, plus some forward contango (the buyers gain certainty that they will receive the milk in the future).
In general, the milk company should forward contract (since they reduce their price risk), however they also have production risk. Therefore, it may be the case that they forward contract a fixed amount, but find that they do not produce enough milk to meet the fixed demand. They are then forced to buy additional milk on the spot market.
The goal of the milk company is to choose the extent to which they forward contract in order to maximise (risk-adjusted) revenues, whilst managing their production risk.
A stochastic process for price
It is outside the scope of this tutorial, but assume that we have gone away and analysed historical data to fit a stochastic process to the sequence of monthly auction spot prices.
One plausible model is a multiplicative auto-regressive model of order one, where the white noise term is modeled by a finite distribution of empirical residuals. We can simulate this stochastic process as follows:
function simulator()
residuals = [0.0987, 0.199, 0.303, 0.412, 0.530, 0.661, 0.814, 1.010, 1.290]
residuals = 0.1 * vcat(-residuals, 0.0, residuals)
scenario = zeros(12)
y, μ, α = 4.5, 6.0, 0.05
for t in 1:12
y = exp((1 - α) * log(y) + α * log(μ) + rand(residuals))
scenario[t] = clamp(y, 3.0, 9.0)
end
return scenario
end
simulator()
12-element Vector{Float64}:
4.757210658651313
4.812739183470713
4.2771677605526115
4.220334849839149
3.8826065642988654
3.7631988749292113
4.178673558863071
3.846186530784705
3.8551734985025945
3.6892868911874315
3.8175906325671107
3.904877831567132
It may be helpful to visualize a number of simulations of the price process:
plot = Plots.plot(
[simulator() for _ in 1:500];
color = "gray",
opacity = 0.2,
legend = false,
xlabel = "Month",
ylabel = "Price [\$/kg]",
xlims = (1, 12),
ylims = (3, 9),
)
The prices gradually revert to the mean of $6/kg, and there is high volatility.
We can't incorporate this price process directly into SDDP.jl, but we can fit a SDDP.MarkovianGraph
directly from the simulator:
graph = SDDP.MarkovianGraph(simulator; budget = 30, scenarios = 10_000);
Here budget
is the number of nodes in the policy graph, and scenarios
is the number of simulations to use when estimating the transition probabilities.
The graph contains too many nodes to be show, but we can plot it:
for ((t, price), edges) in graph.nodes
for ((t′, price′), probability) in edges
Plots.plot!(
plot,
[t, t′],
[price, price′];
color = "red",
width = 3 * probability,
)
end
end
plot
That looks okay. Try changing budget
and scenarios
to see how different Markovian policy graphs can be created.
Model
Now that we have a Markovian graph, we can build the model. See if you can work out how we arrived at this formulation by reading the background description. Do all the variables and constraints make sense?
model = SDDP.PolicyGraph(
graph;
sense = :Max,
upper_bound = 1e2,
optimizer = HiGHS.Optimizer,
) do sp, node
# Decompose the node into the month (::Int) and spot price (::Float64)
t, price = node::Tuple{Int,Float64}
# Transactions on the futures market cost 0.01
c_transaction = 0.01
# It costs the company +50% to buy milk on the spot market and deliver to
# their customers
c_buy_premium = 1.5
# Buyer is willing to pay +5% for certainty
c_contango = 1.05
# Distribution of production
Ω_production = range(0.1, 0.2; length = 5)
c_max_production = 12 * maximum(Ω_production)
# x_stock: quantity of milk in stock pile
@variable(sp, 0 <= x_stock, SDDP.State, initial_value = 0)
# x_forward[i]: quantity of milk for delivery in i months
@variable(sp, 0 <= x_forward[1:4], SDDP.State, initial_value = 0)
# u_spot_sell: quantity of milk to sell on spot market
@variable(sp, 0 <= u_spot_sell <= c_max_production)
# u_spot_buy: quantity of milk to buy on spot market
@variable(sp, 0 <= u_spot_buy <= c_max_production)
# u_spot_buy: quantity of milk to sell on futures market
c_max_futures = t <= 8 ? c_max_production : 0.0
@variable(sp, 0 <= u_forward_sell <= c_max_futures)
# ω_production: production random variable
@variable(sp, ω_production)
# Forward contracting constraints:
@constraint(sp, [i in 1:3], x_forward[i].out == x_forward[i+1].in)
@constraint(sp, x_forward[4].out == u_forward_sell)
# Stockpile balance constraint
@constraint(
sp,
x_stock.out ==
x_stock.in + ω_production + u_spot_buy - x_forward[1].in - u_spot_sell
)
# The random variables. `price` comes from the Markov node
#
# !!! warning
# The elements in Ω MUST be a tuple with 1 or 2 values, where the first
# value is `price` and the second value is the random variable for the
# current node. If the node is deterministic, use Ω = [(price,)].
Ω = [(price, p) for p in Ω_production]
SDDP.parameterize(sp, Ω) do ω
# Fix the ω_production variable
fix(ω_production, ω[2])
@stageobjective(
sp,
# Sales on spot market
ω[1] * (u_spot_sell - c_buy_premium * u_spot_buy) +
# Sales on futures smarket
(ω[1] * c_contango - c_transaction) * u_forward_sell
)
return
end
return
end
A policy graph with 30 nodes.
Node indices: (1, 4.591316866263163), ..., (12, 7.688961207215459)
Training a policy
Now we have a model, we train a policy. The SDDP.SimulatorSamplingScheme
is used in the forward pass. It generates an out-of-sample sequence of prices using simulator
and traverses the closest sequence of nodes in the policy graph. When calling SDDP.parameterize
for each subproblem, it uses the new out-of-sample price instead of the price associated with the Markov node.
SDDP.train(
model;
time_limit = 20,
risk_measure = 0.5 * SDDP.Expectation() + 0.5 * SDDP.AVaR(0.25),
sampling_scheme = SDDP.SimulatorSamplingScheme(simulator),
)
-------------------------------------------------------------------
SDDP.jl (c) Oscar Dowson and contributors, 2017-24
-------------------------------------------------------------------
problem
nodes : 30
state variables : 5
scenarios : 9.27734e+11
existing cuts : false
options
solver : serial mode
risk measure : A convex combination of 0.5 * SDDP.Expectation() + 0.5 * SDDP.AVaR(0.25)
sampling scheme : SDDP.SimulatorSamplingScheme{typeof(Main.simulator)}
subproblem structure
VariableRef : [15, 15]
AffExpr in MOI.EqualTo{Float64} : [5, 5]
VariableRef in MOI.EqualTo{Float64} : [1, 1]
VariableRef in MOI.GreaterThan{Float64} : [8, 9]
VariableRef in MOI.LessThan{Float64} : [4, 4]
numerical stability report
matrix range [1e+00, 1e+00]
objective range [1e+00, 1e+01]
bounds range [2e+00, 1e+02]
rhs range [0e+00, 0e+00]
-------------------------------------------------------------------
iteration simulation bound time (s) solves pid
-------------------------------------------------------------------
1 -4.522697e+01 5.958917e+01 1.292260e+00 162 1
62 7.621329e+00 7.853611e+00 2.299355e+00 10044 1
112 6.571401e+00 7.845468e+00 3.305268e+00 18144 1
155 1.093836e+01 7.845016e+00 4.325593e+00 25110 1
198 1.057987e+01 7.845016e+00 5.331395e+00 32076 1
238 1.057996e+01 7.844513e+00 6.337444e+00 38556 1
274 9.233738e+00 7.842956e+00 7.377389e+00 44388 1
309 8.271896e+00 7.842819e+00 8.383133e+00 50058 1
344 1.000973e+01 7.842804e+00 9.407188e+00 55728 1
492 8.029274e+00 7.842439e+00 1.442071e+01 79704 1
624 8.926477e+00 7.842136e+00 1.945240e+01 101088 1
638 9.684737e+00 7.842119e+00 2.003837e+01 103356 1
-------------------------------------------------------------------
status : time_limit
total time (s) : 2.003837e+01
total solves : 103356
best bound : 7.842119e+00
simulation ci : 8.846850e+00 ± 3.447959e-01
numeric issues : 0
-------------------------------------------------------------------
We're intentionally terminating the training early so that the documentation doesn't take too long to build. If you run this example locally, increase the time limit.
Simulating the policy
When simulating the policy, we can also use the SDDP.SimulatorSamplingScheme
.
simulations = SDDP.simulate(
model,
200,
Symbol[:x_stock, :u_forward_sell, :u_spot_sell, :u_spot_buy];
sampling_scheme = SDDP.SimulatorSamplingScheme(simulator),
);
To show how the sampling scheme uses the new out-of-sample price instead of the price associated with the Markov node, compare the index of the Markov state visited in stage 12 of the first simulation:
simulations[1][12][:node_index]
(12, 5.32780734618073)
to the realization of the noise (price, ω)
passed to SDDP.parameterize
:
simulations[1][12][:noise_term]
(5.386472071269592, 0.175)
Visualizing the policy
Finally, we can plot the policy to gain insight (although note that we terminated the training early, so we should run the re-train the policy for more iterations before making too many judgements).
plot = Plots.plot(
SDDP.publication_plot(simulations; title = "x_stock.out") do data
return data[:x_stock].out
end,
SDDP.publication_plot(simulations; title = "u_forward_sell") do data
return data[:u_forward_sell]
end,
SDDP.publication_plot(simulations; title = "u_spot_buy") do data
return data[:u_spot_buy]
end,
SDDP.publication_plot(simulations; title = "u_spot_sell") do data
return data[:u_spot_sell]
end;
layout = (2, 2),
)
Next steps
- Train the policy for longer. What do you observe?
- Try creating different Markovian graphs. What happens if you add more nodes?
- Try different risk measures