Hydro-thermal scheduling

This tutorial was generated using Literate.jl. Download the source as a .jl file. Download the source as a .ipynb file.

Problem Description

In a hydro-thermal problem, the agent controls a hydro-electric generator and reservoir. Each time period, they need to choose a generation quantity from thermal g_t, and hydro g_h, in order to meet demand w_d, which is a stagewise-independent random variable. The state variable, x, is the quantity of water in the reservoir at the start of each time period, and it has a minimum level of 5 units and a maximum level of 15 units. We assume that there are 10 units of water in the reservoir at the start of time, so that x_0 = 10. The state-variable is connected through time by the water balance constraint: x.out = x.in - g_h - s + w_i, where x.out is the quantity of water at the end of the time period, x.in is the quantity of water at the start of the time period, s is the quantity of water spilled from the reservoir, and w_i is a stagewise-independent random variable that represents the inflow into the reservoir during the time period.

We assume that there are three stages, t=1, 2, 3, representing summer-fall, winter, and spring, and that we are solving this problem in an infinite-horizon setting with a discount factor of 0.95.

In each stage, the agent incurs the cost of spillage, plus the cost of thermal generation. We assume that the cost of thermal generation is dependent on the stage t = 1, 2, 3, and that in each stage, w is drawn from the set (w_i, w_d) = {(0, 7.5), (3, 5), (10, 2.5)} with equal probability.

Importing packages

For this example, in addition to SDDP, we need HiGHS as a solver and Statisitics to compute the mean of our simulations.

using HiGHS
using SDDP
using Statistics

Constructing the policy graph

There are three stages in our infinite-horizon problem, so we construct a unicyclic policy graph using SDDP.UnicyclicGraph:

graph = SDDP.UnicyclicGraph(0.95; num_nodes = 3)
Root
 0
Nodes
 1
 2
 3
Arcs
 0 => 1 w.p. 1.0
 1 => 2 w.p. 1.0
 2 => 3 w.p. 1.0
 3 => 1 w.p. 0.95

Constructing the model

Much of the macro code (i.e., lines starting with @) in the first part of the following should be familiar to users of JuMP.

Inside the do-end block, sp is a standard JuMP model, and t is an index for the state variable that will be called with t = 1, 2, 3.

The state variable x, constructed by passing the SDDP.State tag to @variable is actually a Julia struct with two fields: x.in and x.out corresponding to the incoming and outgoing state variables respectively. Both x.in and x.out are standard JuMP variables. The initial_value keyword provides the value of the state variable in the root node (i.e., x_0).

Compared to a JuMP model, one key difference is that we use @stageobjective instead of @objective. The SDDP.parameterize function takes a list of supports for w and parameterizes the JuMP model sp by setting the right-hand sides of the appropriate constraints (note how the constraints initially have a right-hand side of 0). By default, it is assumed that the realizations have uniform probability, but a probability mass vector can also be provided.

model = SDDP.PolicyGraph(
    graph;
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do sp, t
    @variable(sp, 5 <= x <= 15, SDDP.State, initial_value = 10)
    @variable(sp, g_t >= 0)
    @variable(sp, g_h >= 0)
    @variable(sp, s >= 0)
    @constraint(sp, balance, x.out - x.in + g_h + s == 0)
    @constraint(sp, demand, g_h + g_t == 0)
    @stageobjective(sp, s + t * g_t)
    SDDP.parameterize(sp, [[0, 7.5], [3, 5], [10, 2.5]]) do w
        set_normalized_rhs(balance, w[1])
        return set_normalized_rhs(demand, w[2])
    end
end
A policy graph with 3 nodes.
 Node indices: 1, 2, 3

Training the policy

Once a model has been constructed, the next step is to train the policy. This can be achieved using SDDP.train. There are many options that can be passed, but iteration_limit terminates the training after the prescribed number of SDDP iterations.

SDDP.train(model; iteration_limit = 100)
-------------------------------------------------------------------
         SDDP.jl (c) Oscar Dowson and contributors, 2017-24
-------------------------------------------------------------------
problem
  nodes           : 3
  state variables : 1
  scenarios       : Inf
  existing cuts   : false
options
  solver          : serial mode
  risk measure    : SDDP.Expectation()
  sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
  VariableRef                             : [6, 6]
  AffExpr in MOI.EqualTo{Float64}         : [2, 2]
  VariableRef in MOI.GreaterThan{Float64} : [5, 5]
  VariableRef in MOI.LessThan{Float64}    : [1, 1]
numerical stability report
  matrix range     [1e+00, 1e+00]
  objective range  [1e+00, 3e+00]
  bounds range     [5e+00, 2e+01]
  rhs range        [2e+00, 1e+01]
-------------------------------------------------------------------
 iteration    simulation      bound        time (s)     solves  pid
-------------------------------------------------------------------
         1   2.390000e+02  6.304440e+01  1.103470e-01       183   1
        31   8.517170e+02  2.346450e+02  1.196832e+00      7701   1
        50   5.832977e+02  2.361584e+02  2.304230e+00     13230   1
        63   8.928468e+02  2.363625e+02  3.317252e+00     17421   1
        76   8.600261e+02  2.364145e+02  4.529045e+00     22092   1
        89   2.365015e+02  2.364263e+02  5.551026e+00     24987   1
       100   1.135002e+02  2.364293e+02  6.229514e+00     26640   1
-------------------------------------------------------------------
status         : iteration_limit
total time (s) : 6.229514e+00
total solves   : 26640
best bound     :  2.364293e+02
simulation ci  :  2.593398e+02 ± 5.186931e+01
numeric issues : 0
-------------------------------------------------------------------

Simulating the policy

After training, we can simulate the policy using SDDP.simulate.

sims = SDDP.simulate(model, 100, [:g_t])
mu = round(mean([s[1][:g_t] for s in sims]); digits = 2)
println("On average, $(mu) units of thermal are used in the first stage.")
On average, 1.71 units of thermal are used in the first stage.

Extracting the water values

Finally, we can use SDDP.ValueFunction and SDDP.evaluate to obtain and evaluate the value function at different points in the state-space. Note that since we are minimizing, the price has a negative sign: each additional unit of water leads to a decrease in the expected long-run cost.

V = SDDP.ValueFunction(model[1])
cost, price = SDDP.evaluate(V; x = 10)
(233.55074662683333, Dict(:x => -0.6602685305287201))