Objective states

This tutorial was generated using Literate.jl. Download the source as a .jl file. Download the source as a .ipynb file.

There are many applications in which we want to model a price process that follows some auto-regressive process. Common examples include stock prices on financial exchanges and spot-prices in energy markets.

However, it is well known that these cannot be incorporated in to SDDP because they result in cost-to-go functions that are convex with respect to some state variables (e.g., the reservoir levels) and concave with respect to other state variables (e.g., the spot price in the current stage).

To overcome this problem, the approach in the literature has been to discretize the price process in order to model it using a Markovian policy graph like those discussed in Markovian policy graphs.

However, recent work offers a way to include stagewise-dependent objective uncertainty into the objective function of SDDP subproblems. Readers are directed to the following works for an introduction:

  • Downward, A., Dowson, O., and Baucke, R. (2017). Stochastic dual dynamic programming with stagewise dependent objective uncertainty. Optimization Online. link

  • Dowson, O. PhD Thesis. University of Auckland, 2018. link

The method discussed in the above works introduces the concept of an objective state into SDDP. Unlike normal state variables in SDDP (e.g., the volume of water in the reservoir), the cost-to-go function is concave with respect to the objective states. Thus, the method builds an outer approximation of the cost-to-go function in the normal state-space, and an inner approximation of the cost-to-go function in the objective state-space.

Warning

Support for objective states in SDDP.jl is experimental. Models are considerably more computational intensive, the interface is less user-friendly, and there are subtle gotchas to be aware of. Only use this if you have read and understood the theory behind the method.

One-dimensional objective states

Let's assume that the fuel cost is not fixed, but instead evolves according to a multiplicative auto-regressive process: fuel_cost[t] = ω * fuel_cost[t-1], where ω is drawn from the sample space [0.75, 0.9, 1.1, 1.25] with equal probability.

An objective state can be added to a subproblem using the SDDP.add_objective_state function. This can only be called once per subproblem. If you want to add a multi-dimensional objective state, read Multi-dimensional objective states. SDDP.add_objective_state takes a number of keyword arguments. The two required ones are

  • initial_value: the value of the objective state at the root node of the policy graph (i.e., identical to the initial_value when defining normal state variables.

  • lipschitz: the Lipschitz constant of the cost-to-go function with respect to the objective state. In other words, this value is the maximum change in the cost-to-go function at any point in the state space, given a one-unit change in the objective state.

There are also two optional keyword arguments: lower_bound and upper_bound, which give SDDP.jl hints (importantly, not constraints) about the domain of the objective state. Setting these bounds appropriately can improve the speed of convergence.

Finally, SDDP.add_objective_state requires an update function. This function takes two arguments. The first is the incoming value of the objective state, and the second is the realization of the stagewise-independent noise term (set using SDDP.parameterize). The function should return the value of the objective state to be used in the current subproblem.

This connection with the stagewise-independent noise term means that SDDP.parameterize must be called in a subproblem that defines an objective state. Inside SDDP.parameterize, the value of the objective state to be used in the current subproblem (i.e., after the update function), can be queried using SDDP.objective_state.

Here is the full model with the objective state.

using SDDP, HiGHS

model = SDDP.LinearPolicyGraph(
    stages = 3,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do subproblem, t
    @variable(subproblem, 0 <= volume <= 200, SDDP.State, initial_value = 200)
    @variables(subproblem, begin
        thermal_generation >= 0
        hydro_generation >= 0
        hydro_spill >= 0
        inflow
    end)
    @constraints(
        subproblem,
        begin
            volume.out == volume.in + inflow - hydro_generation - hydro_spill
            demand_constraint, thermal_generation + hydro_generation == 150.0
        end
    )

    # Add an objective state. ω will be the same value that is called in
    # `SDDP.parameterize`.

    SDDP.add_objective_state(
        subproblem,
        initial_value = 50.0,
        lipschitz = 10_000.0,
        lower_bound = 50.0,
        upper_bound = 150.0,
    ) do fuel_cost, ω
        return ω.fuel * fuel_cost
    end

    # Create the cartesian product of a multi-dimensional random variable.

    Ω = [
        (fuel = f, inflow = w) for f in [0.75, 0.9, 1.1, 1.25] for
        w in [0.0, 50.0, 100.0]
    ]

    SDDP.parameterize(subproblem, Ω) do ω
        # Query the current fuel cost.
        fuel_cost = SDDP.objective_state(subproblem)
        @stageobjective(subproblem, fuel_cost * thermal_generation)
        return JuMP.fix(inflow, ω.inflow)
    end
end
A policy graph with 3 nodes.
 Node indices: 1, 2, 3

After creating our model, we can train and simulate as usual.

SDDP.train(model; run_numerical_stability_report = false)

simulations = SDDP.simulate(model, 1)

print("Finished training and simulating.")
-------------------------------------------------------------------
         SDDP.jl (c) Oscar Dowson and contributors, 2017-23
-------------------------------------------------------------------
problem
  nodes           : 3
  state variables : 1
  scenarios       : 1.72800e+03
  existing cuts   : false
options
  solver          : serial mode
  risk measure    : SDDP.Expectation()
  sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
  VariableRef                             : [8, 8]
  AffExpr in MOI.EqualTo{Float64}         : [2, 4]
  AffExpr in MOI.GreaterThan{Float64}     : [2, 2]
  VariableRef in MOI.GreaterThan{Float64} : [6, 6]
  VariableRef in MOI.LessThan{Float64}    : [2, 3]
-------------------------------------------------------------------
 iteration    simulation      bound        time (s)     solves  pid
-------------------------------------------------------------------
         1   6.581250e+03  2.721774e+03  2.216601e-02        39   1
        72   4.640625e+03  5.092593e+03  3.912880e-01      3708   1
-------------------------------------------------------------------
status         : simulation_stopping
total time (s) : 3.912880e-01
total solves   : 3708
best bound     :  5.092593e+03
simulation ci  :  5.195002e+03 ± 8.122200e+02
numeric issues : 0
-------------------------------------------------------------------

Finished training and simulating.

To demonstrate how the objective states are updated, consider the sequence of noise observations:

[stage[:noise_term] for stage in simulations[1]]
3-element Vector{@NamedTuple{fuel::Float64, inflow::Float64}}:
 (fuel = 1.1, inflow = 0.0)
 (fuel = 0.75, inflow = 0.0)
 (fuel = 1.25, inflow = 100.0)

This, the fuel cost in the first stage should be 0.75 * 50 = 37.5. The fuel cost in the second stage should be 1.1 * 37.5 = 41.25. The fuel cost in the third stage should be 0.75 * 41.25 = 30.9375.

To confirm this, the values of the objective state in a simulation can be queried using the :objective_state key.

[stage[:objective_state] for stage in simulations[1]]
3-element Vector{Float64}:
 55.00000000000001
 41.25000000000001
 51.56250000000001

Multi-dimensional objective states

You can construct multi-dimensional price processes using NTuples. Just replace every scalar value associated with the objective state by a tuple. For example, initial_value = 1.0 becomes initial_value = (1.0, 2.0).

Here is an example:

model = SDDP.LinearPolicyGraph(
    stages = 3,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = HiGHS.Optimizer,
) do subproblem, t
    @variable(subproblem, 0 <= volume <= 200, SDDP.State, initial_value = 200)
    @variables(subproblem, begin
        thermal_generation >= 0
        hydro_generation >= 0
        hydro_spill >= 0
        inflow
    end)
    @constraints(
        subproblem,
        begin
            volume.out == volume.in + inflow - hydro_generation - hydro_spill
            demand_constraint, thermal_generation + hydro_generation == 150.0
        end
    )

    SDDP.add_objective_state(
        subproblem,
        initial_value = (50.0, 50.0),
        lipschitz = (10_000.0, 10_000.0),
        lower_bound = (50.0, 50.0),
        upper_bound = (150.0, 150.0),
    ) do fuel_cost, ω
        # fuel_cost is a tuple, containing the (fuel_cost[t-1], fuel_cost[t-2])
        # This function returns a new tuple containing
        # (fuel_cost[t], fuel_cost[t-1]). Thus, we need to compute the new
        # cost:
        new_cost = fuel_cost[1] + 0.5 * (fuel_cost[1] - fuel_cost[2]) + ω.fuel
        # And then return the appropriate tuple:
        return (new_cost, fuel_cost[1])
    end

    Ω = [
        (fuel = f, inflow = w) for f in [-10.0, -5.0, 5.0, 10.0] for
        w in [0.0, 50.0, 100.0]
    ]

    SDDP.parameterize(subproblem, Ω) do ω
        fuel_cost, _ = SDDP.objective_state(subproblem)
        @stageobjective(subproblem, fuel_cost * thermal_generation)
        return JuMP.fix(inflow, ω.inflow)
    end
end

SDDP.train(model; run_numerical_stability_report = false)

simulations = SDDP.simulate(model, 1)

print("Finished training and simulating.")
-------------------------------------------------------------------
         SDDP.jl (c) Oscar Dowson and contributors, 2017-23
-------------------------------------------------------------------
problem
  nodes           : 3
  state variables : 1
  scenarios       : 1.72800e+03
  existing cuts   : false
options
  solver          : serial mode
  risk measure    : SDDP.Expectation()
  sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
  VariableRef                             : [9, 9]
  AffExpr in MOI.EqualTo{Float64}         : [2, 10]
  AffExpr in MOI.GreaterThan{Float64}     : [4, 4]
  VariableRef in MOI.GreaterThan{Float64} : [7, 7]
  VariableRef in MOI.LessThan{Float64}    : [3, 4]
-------------------------------------------------------------------
 iteration    simulation      bound        time (s)     solves  pid
-------------------------------------------------------------------
         1   7.625000e+03  2.663219e+03  2.610397e-02        39   1
       199   9.500000e+03  5.135984e+03  1.026554e+00      9561   1
       391   4.687500e+03  5.135984e+03  1.906887e+00     17349   1
-------------------------------------------------------------------
status         : simulation_stopping
total time (s) : 1.906887e+00
total solves   : 17349
best bound     :  5.135984e+03
simulation ci  :  4.876616e+03 ± 3.802496e+02
numeric issues : 0
-------------------------------------------------------------------

Finished training and simulating.

This time, since our objective state is two-dimensional, the objective states are tuples with two elements:

[stage[:objective_state] for stage in simulations[1]]
3-element Vector{Tuple{Float64, Float64}}:
 (55.0, 50.0)
 (67.5, 55.0)
 (78.75, 67.5)

Warnings

There are number of things to be aware of when using objective states.

  • The key assumption is that price is independent of the states and actions in the model.

    That means that the price cannot appear in any @constraints. Nor can you use any @variables in the update function.

  • Choosing an appropriate Lipschitz constant is difficult.

    The points discussed in Choosing an initial bound are relevant. The Lipschitz constant should not be chosen as large as possible (since this will help with convergence and the numerical issues discussed above), but if chosen to small, it may cut of the feasible region and lead to a sub-optimal solution.

  • You need to ensure that the cost-to-go function is concave with respect to the objective state before the update.

    If the update function is linear, this is always the case. In some situations, the update function can be nonlinear (e.g., multiplicative as we have above). In general, placing constraints on the price (e.g., clamp(price, 0, 1)) will destroy concavity. Caveat emptor. It's up to you if this is a problem. If it isn't you'll get a good heuristic with no guarantee of global optimality.