Scenario Reduction and Tree Construction (scenred)

Introduction

Stochastic programs with recourse employing a discrete distribution of the random parameters become a deterministic programming problem. They can be solved by an appropriate optimization algorithm, ignoring the stochastic nature of (some or all) parameters.

The libInclude file scenred.gms, also known as ScenRed can be used for the reduction of scenarios and scenario tree construction modeling the random data processes.

The scenario reduction algorithms included determine a scenario subset (of prescribed cardinality or accuracy) and assign optimal probabilities to the preserved scenarios. The reduced problem is then solved by a deterministic optimization algorithm provided by GAMS.

Scenario tree construction allows to construct scenario trees as accurate input for multistage stochastic programs. The input are individual scenarios in form of a fan.

Note
The libInclude file scenred.gms is based on the executable tool SCENRED2 and utilizes its functionality.

Background

Scenario Reduction

Many solution methods for stochastic programs employ discrete approximations of the uncertain data processes by a set of scenarios (i.e., possible outcomes of the uncertain parameters) with corresponding probabilities.

For most practical problems the optimization problem that contains all possible scenarios (the so-called deterministic equivalent program) is too large. Due to computational complexity and to time limitations this program is often approximated by a model involving a (much) smaller number of scenarios.

The reduction algorithms developed in [1,2] determine a subset of the initial scenario set and assign new probabilities to the preserved scenarios. All deleted scenarios have probability zero.

ScenRed uses two reduction algorithms: The Forward and the Backward method. In general, the computational performance (accuracy, running time) of the methods differ. For huge scenario trees, the Backward method has the best expected performance with respect to running time. The Forward method is the best algorithm when comparing accuracy, but it can only be recommended if the number of preserved scenarios is small (strong reduction). If no reduction method is selected, the method with the best expected performance with respect to running time is chosen.

The reduction algorithms exploit a certain probability distance of the original and the reduced probability measure. The probability distance trades off scenario probabilities and distances of scenario values. Therefore, deletion will occur if scenarios are close or have small probabilities.

The reduction concept is general and universal. No requirements on the stochastic data processes (e.g. the dependency or correlation structure of the scenarios, the scenario probabilities or the dimension of the process) or on the structure of the scenarios (e.g. tree-structured or not) are imposed. The reduction algorithms can be tailored to the stochastic model if the user provides additional information (How many decision stages are involved? Where do the random parameters enter the model – in objective and/or right hand sides and/or technology matrices?) The information is used to choose the probability distances.

References:

  1. J. Dupačová, N. Gröwe-Kuska, W. Römisch: Scenario Reduction in Stochastic Programming: An Approach Using Probability Metrics, Mathematical Programming, 95:493-511, 2003
  2. H. Heitsch, W. Römisch: Scenario Reduction Algorithms in Stochastic Programming, Computational Optimization and Applications, 24:187–206, 2003

Scenario Tree Construction

An important issue for solving multistage stochastic programs consists in the approximate representation of the (multivariate) stochastic input process in the form of a scenario tree.

A scenario tree represents how uncertainty unfolds over multiple stages, starting from a root node at the initial period and branching out to nodes in subsequent periods. Each node represents a state of the system at a specific time, and the branches represent possible transitions to future states. The final nodes, or leaves, correspond to the different scenarios considered in the model.

The construction of a scenario tree typically begins with a large set of individual scenarios, which are then systematically reduced to make the problem computationally manageable. This involves modifying the tree structure and grouping similar scenarios together. The goal is to retain a representative subset of scenarios that captures the essential features of the original set.

References:

  1. H. Heitsch and W. Römisch: Scenario tree modeling for multistage stochastic programs, Mathematical Programming, 118:371–406, 2009

Usage

Model and Data Preparation

The usage of ScenRed requires additional data preparation and reformulation of the GAMS program for the stochastic programming model:

  • Analyze the GAMS program of the stochastic programming model.
    Since the initial scenarios and a number of input parameters have to be specified, one must identify the corresponding components of the GAMS model and create or calculate them if they do not already exist.
  • Reformulate the GAMS program.
    Check if the model can handle varying scenario or node probabilities, and whether the equations are defined in terms of a (possibly reduced) tree. If the model doesn't already contain a scenario tree, one should be added. If it does, it is a simple task to rewrite the equation definitions (and possibly other statements too) in terms of a subset of the original nodes or tree.

A reduction of the initial scenarios makes sense only if we are able to generate that part of the model that corresponds to the preserved scenarios (i.e. the reduced subtree). This is done by declaring a subset of the nodes in the original tree. The parameters and equations are declared over the original node set, but are defined over only the subtree. This will be illustrated by an example later in the section.

Further, one should verify that the model can handle changing probabilities. Many practical models involve scenarios with equal probabilities. This property will not be maintained by the probabilities in the reduced subtree.

As an example, consider the srkandw model in the GAMS model library, and the kand model upon which it is based. To produce srkandw from kand, we first reformulate the original to allow for solution over a reduced tree. To do this, we introduce a subset of the node set: set sn(n) 'nodes in reduced tree'; For convenience and clarity, we introduce a second subset at the same time, the set of leaf nodes: set leaf(n) 'leaf nodes in original tree'; as well as some code to compute this set based on the existing time-node mapping. We also declare a new parameter, the probabilities for the reduced tree: parameter sprob(n) 'node probability in reduced tree'; Once these are declared, we can quickly edit the equation definitions so that they run only over the reduced subtree: we simply substitute the reduced probabilities sprob for the original prob, and the reduced node set sn for the original node set n. Note that the declaration of the equations does not change.

This example illustrates one other change that may be required: the stochastic data must be in parameters having the node set as their last index. This is not the case in the kand model, so we simply reversed the indices in the dem parameter to meet the requirement in srkandw. It is also possible to create a transposed copy of the original data and pass that to ScenRed if the original data cannot be changed conveniently.

Parameterization

Before using ScenRed, a couple of input and output parameters need to be defined. They are collected in the one-dimensional parameter ScenRedParms. ScenRedParms can be included by the statement:

$libInclude scenred 
Note
If the libInclude file scenred.gms is invoked without arguments, ScenRedParms is loaded. Note that the libInclude must be stated before the actual scenario reduction/tree construction is done.

After this statement, all necessary settings can be made by assigning values to the corresponding parameters. A parameter value of zero (default) disables a setting. A list of the available parameters can be found below.

Table 1: Supported parameters in ScenRedParms

Symbol Description
num_time_steps path length from root to leaf
num_leaves leaves/scenarios in the initial tree
num_nodes nodes in the initial tree
num_random random variables assigned to a scenario or node
red_num_leaves desired number of preserved scenarios or leaves
red_percentage desired relative distance (accuracy), number from 0.0 to 1.0
reduction_method desired reduction method, 1 - Forward, 2 - Backward, 0 - Default
construction_method desired tree construction method, 1 - Forward, 2 - Backward
num_stages number stages
sroption enable use of the option file
run_time_limit time limit in seconds
report_level report level: more messages by higher values
visual_init visualization initial tree
visual_red visualization reduced (constructed) tree
plot_scen visualization scenario processes
out_scen output of scenario raw data
out_tree output of scenario tree data

The values for the parameters num_time_steps, num_leaves, num_nodes, and num_random are automatically derived from the input data. If the user sets these fields anyway, the program ensures that the internally calculated numbers and the numbers specified by the user match.

At least one of the parameters red_percentage and red_num_leaves must be set. The value of red_percentage will be ignored if the parameter red_num_leaves is non-zero. Otherwise, the tree will not be reduced if red_percentage=0, while the reduction of the tree will be maximal (i.e. only one scenario will be kept) if red_percentage=1. A numeric value of 0.4 means that the reduced tree maintains 60% of the information contained in the original tree. The reduction algorithms are skipped if red_num_leaves=num_leaves or if red_num_leaves=0 and red_percentage=0. These values can be assigned if the user wishes to run the scenario tree diagnostic.

Example:

The following statements describe a possible example setup for proceeding the scenario tree construction with visualization of the scenario tree and output of the scenarios to a raw data file afterwards. Note that the actual choice of the ScenRed action (scenario tree construction or scenario reduction) is determined when the libInclude is done, see below.

* general parameters 
ScenRedParms('num_leaves') = 100;
ScenRedParms('num_nodes') = 200;
ScenRedParms('num_time_steps') = 5;
ScenRedParms('num_random') = 2;
ScenRedParms('report_level') = 2;
ScenRedParms('run_time_limit') = 30;

* execution commands
ScenRedParms('visual_red') = 1;
ScenRedParms('out_scen') = 1;

libInclude Statement

Once you have prepared all the inputs, you are ready to use ScenRed:

Note
Since the initial scenarios and a number of input parameters have to be specified, the corresponding components of the GAMS program have to be defined before the actual scenario reduction/tree construction is done. Reduced scenarios have to be defined before the equations of the (reduced) stochastic programming model are used in a solve statement. Therefore the $libInclude scenred [args] statement can be placed anywhere between the definitions of the GAMS parameters and the solve statement of the reduced stochastic programming model.
$libInclude scenred myprefix tree_con n tree p rtree rp rv1 rv2

Table 2: ScenRed arguments:

Argument Description
1 myprefix base name for files used with the libInclude
2 tree_con or scen_red select the libInclude action: scenario tree construction or scenario reduction
3 n the set of nodes in the tree
4 tree the set of ancestor relations describing the tree
5 p the parameter containing the node probabilities
6 rtree the set of ancestor relations of the reduced tree (output)
7 rp the parameter containing the node probabilities for the reduced tree (output)
8 rv1, rv2, ... parameters containing random values of the nodes

Argument 2 instructs ScenRed either to construct a tree (tree_con) or to reduce a tree (scen_red).

Argument 3 (here: n) is the set of nodes making up the scenario tree. Note that the cardinality of this set is part of ScenRedParms.

Argument 4 (here: tree) is the ancestor mapping between the nodes. This mapping determines the scenario tree. Note that the mapping can be either an ancestor mapping (i.e. child-parent) or a successor mapping (parent-child). By default, an ancestor mapping is expected. If the check for this fails, a successor mapping is looked for.

Argument 5 (here: p) is the parameter of probabilities for the nodes in the original tree. It is only required that probabilities for the scenarios (i.e. the leaf nodes) be provided, but the parameter can contain probabilities for the non-leaf nodes as well.

Argument 8 and following specify the parameter(s) that comprise the random values assigned to the initial scenarios, or to the nodes of the scenario tree. There can be more than one such parameter, included in any order. The only requirement is that the node set be the final index in each of these parameters.

The output arguments rtree and rp correspond to the symbols imported from the output file. The parameters ScenRedParms and ScenRedReport are invisibly communicated.

Instead of providing an explicit name for all the different files used, the first argument determines the name of all files using a simple naming scheme:

Filename Command option Description
sr2myprefix.log log_file log file name
sr2myprefix_in.gdx input_gdx gdx input file name
sr2myprefix_out.gdx output_gdx gdx output file name
sr2myprefix.opt sr_option option file name
sr2myprefix_vi.plt visual_init file name for visualization of input tree
sr2myprefix_vr.plt visual_red file name for visualization of reduced/constructed tree
sr2myprefix_plot.plt plot_scen file name for visualization of scenario process
sr2myprefix_raw.dat out_scen file name for scenario data output in fan format
sr2myprefix_tree.dat out_tree file name for scenario data output in tree format

The first three files (log_file, input_gdx and output_gdx) are always used. The only optional input file sr_option is read by ScenRed if ScenRedParms('sroption')=1. When you create this file, make sure to use the proper file name. The output files are created if the corresponding option is set to 1 in ScenRedParms, e.g. ScenRedParms('out_tree')=1.

Option File

The option file is optional and can be used to define more specific options depending on what kind of method (scenario reduction or scenario construction) is used. A list of options together with examples are given below for both the scenario reduction and the scenario construction devices (see Sections Scenario Reduction and Scenario Tree Construction, respectively). As mentioned above, the option file is read if ScenRedParms('sroption')=1. The name of the file needs to follow the naming scheme.

Scenario Reduction

The application of the reduction algorithms contained in ScenRed can be individualized by configuring certain options. The most important one is given by the option metric_type which allows to control the reduction process by different type of probability distances. Altogether three distances can be selected (see Table 3 below). All probability distances are associated with a special order specification which can be set by the option order.

Table 3: Options for Scenario Reduction

Option Description
metric_type 1 - Transport (default), 2 - Fortet-Mourier, 3 - Wasserstein
p_norm choice of norm (example: 0 - max, 1 - sum, 2 - Euclidian)
scaling 0 - scaling off, 1 - scaling on (default)
order metric order (integer, default is 1)

Example:

For example, a valid scenario reduction setup is the following:

ScenRedParms statements:

ScenRedParms('reduction_method') = 1;
ScenRedParms('red_percentage') = 0.3;
ScenRedParms('sroption') = 1;

in combination with an option file:

* ScenRed option file

metric_type       2
order             2
p_norm            1
scaling           0

Lines starting with an asterisk are comments.

Scenario Tree Construction

A couple of options are offered to control the tree construction process. Note that in some cases due to sensibility of certain parameters some tuning is indispensable for producing good results.

Table 4 Options (basic) for Scenario Tree Construction

Option Description
first_branch time period of first branch (integer)
eps_growth 1 - linear, 2 - exponential
eps_evolution tree structure parameter (from 0.0 to 1.0)
scaling 0 - scaling off, 1 - scaling on (default)
order order of metric

Table 4 above displays the main options to control the tree construction process. They are very similar to the reduction options.

The role of the parameter red_percentage in ScenRedParms is to prescribe a total epsilon accuracy (level) for the approximation scheme. However, the approximation scheme is based on stagewise approximations which requires a splitting of the total level to the stages. Two strategies are offered by ScenRed, a linear and an exponential mapping of the total level to the intermediate levels. Use option eps_growth to select one of them. Both strategies allow a second tuning parameter eps_evolution which effects the slope of the epsilon splitting.

Even though this kind of control may generate good results for many applications, sometimes a more individual control can be needed. For example, some applications require a localization of branching stages. Moreover, to setup approximation bounds directly to stages can be very useful. To this end the standard options are extended by a new section environment.

Additional options - The section environment

An alternative control for the accurate constructions is provided by using the section environment. The section environment aims to establish a better monitoring of the construction process. There are overall three section types supported with the same syntax.

Branching control:

This section environment allows to specify branching points, i.e., an explicit selection of stages serving for branching. For example, use

section branching
  2
  4
  6
end

to allow branching only at time period 2, 4, and 6. Note that each stage statement must be placed in one line. But stages can be merged. A shorter formulation of the same contents can be written in closed form

section branching
  2*6  2
end

This statement reads branching within time periods from period 2 to period 6 with increment 2 steps. Both assignments can be combined and should be used together with the red_percentage parameter in ScenRedParms.

Epsilon control:

In the epsilon section it is possible to assign epsilon tolerances for the stage approximations explicitly. This environment overcomes difficulties at times coming across with the automatic epsilon control. Note that this environment disables the parameter red_percentage in ScenRedParms. For example, use

section epsilon
  2    0.04
  3*4  0.03
  5    0.02
  6    0.01
end

to control the approximation scheme by assigning different epsilon values per stage. Note that the value 0.03 is assigned to time period 3 and 4 in the example.

Node control:

The node control is the most specific control you have over the tree construction. With this environment the number of nodes of the tree which will be generated can be assigned for each time stage explicitly. For example, use

section num_nodes
  1    1
  2*3  5
  4*5  10
  6    15
end

The syntax is the same as before. Note that only one section environment can be use at once. In particular, only the first section environment detected in the option file is used. The section environment can be out commented like a standard option too.

Experimental option:

There is one other useful option to speed up computations when building different scenario trees from exactly the same input data. In this case the scenario distances needed to compute the trees could be saved to a external file at the first run and reloaded at later runs. Hence, the distances must be calculated only once. For example, use the option

write_distance  dist.sr2

to save the computed distances to the file 'dist.sr2'. To reload them at next run use the option

read_distance  dist.sr2

The option is classified as experimental since no validation of the input file takes place. Before using this option, please ensure that the distances loaded with the read_distance option are the right ones.

Example:

Finally, look at the following example to see a valid combination of option file and ScenRedParms which can be passed to the scenario tree construction:

ScenRedParms('construction_method') = 2;
ScenRedParms('reduction_method') = 1;
ScenRedParms('sroption') = 1;
* tree construction option file

order                1
scaling              0

section epsilon
  2*4    0.1
  5      0.2
  6      0.1
end

Examples

Small example problems have been included to the GAMS Model Library. The implementation can be found in the GAMS programs srtree and srpchase. srtree converts a fan format scenario representation into a tree format representation and then reduces the tree size. srpchase is to solve a simple stochastic purchase problem involving three stages. Sample scenarios which are generated from a fixed distribution using a random value generator serve as input for the tree construction.

Output

The output of using ScenRed is a GDX (GAMS Data Exchange) file that contains the reduced scenario tree and the ScenRedReport parameter.

The first data element in the output file is the ScenRedReport parameter containing scalar outputs and statistics. The elements of this parameter are summarized in Table 5. The second data element is the parameter containing the probabilities of the nodes in the reduced scenario tree. These node probabilities are required to construct the reduced tree. The third and final data element is the ancestor map for the reduced scenario tree. This map can be read from the GDX file, or the reduced tree can be built from the original one by using the reduced probabilities. The content of the data output file is summarized in Table 6.

Table 5 ScenRedReport elements

Element Description
ScenRedWarnings number of warnings
ScenRedErrors number of errors
run_time running time in sec.
orig_nodes number of nodes in the initial scenario tree
orig_leaves number of leaves (scenarios) in the initial scenario tree
red_nodes number of nodes in the reduced scenario tree
red_leaves number of leaves(scenarios) in the reduced tree
red_percentage relative distance of initial and reduced scenario tree
red_absolute absolute distance between initial and reduced scenario tree
reduction_method reduction method used:
0: the program stopped before it could select a method
1: Forward method
2: Backward method
construction_method construction method used (tree construction)

Table 6 Content of the Output File

No. Symbol Type Dimension Content
1 ScenRedReport Parameter 1 ScenRed report
2 red_prob Parameter 1 node probabilities for the reduced scenarios
3 red_ancestor Set 2 the ancestor map for the reduced scenarios

To read the output file, the GAMS execute_load statement is used. This statement is used to transfer GDX data to GAMS at execution time. As an example, to read a GDX file named srrun_out.gdx created, you might have the following statement:

   execute_load 'srrun_out.gdx', ScenRedReport, sprob=red_prob, sanc=red_ancestor;

In the statement above, the equal sign = is used to indicate that the data in the GDX parameter red_prob should be read into the GAMS parameter sprob, and the data in the GDX set red_ancestor should be read into the GAMS set sanc.

Assuming your model is formulated to use a subtree of the original, you can now continue with the solve and any subsequent reporting.

Visualization

In this section an easy way for making plots of scenario processes and tree structures is described. To this end you need the free Gnuplot software or any other plotting software which allows plotting directly from simple data files.

The concept of plotting tasks is the following. For each plot two files are generated, a Gnuplot access file (name.plt) and a raw data file (name.dat). The access file contains basic Gnuplot options and it can be adjusted for individual liking afterwards. The default output is the display. The supported plotting commands are

visual_init, visual_red, plot_scen

for plotting the initial tree structure, the reduced/constructed tree structure, and the scenario process(es), respectively.

Example:

For example, to visualize the constructed tree activate the ScenRedParms parameter

ScenRedParms('visual_red') = 1;

in the GAMS program. Depending on the name of the first argument in the ScenRed libInclude statement (myprefix in the example above), the results are the output files 'myprefix.plt' and 'myprefix.dat'. To compute the picture now you simply open the file 'myprefix.plt' with Gnuplot from the directory, where both output files are located (that should be the working directory). Alternatively, from the command line prompt call

>gnuplot myprefix.plt

Gnuplot will automatically generate the picture. Feel free to change any option in the Gnuplot access file for individual requirements. See also the Gnuplot manual for more details. In particular, to compute a well-scaled encapsulated postscript picture (eps), you simply have to uncomment a few lines in the Gnuplot option file above and to open it with Gnuplot once again.

With the parameter plot_scen the scenario process(es) can be visualized. Note that Gnuplot access and data files according to the number of random values are generated.

Data Export

After computing the scenario reduction or scenario tree construction, the scenario data can be exported to external data files. For this, two output parameters out_tree and out_scen can be used. These options generate data files according to the tree and fan format, respectively. The name of the data files follow the naming scheme.

Diagnostic Check of Scenario Trees

When the input data is read, a number of checks are performed to verify that the data is correct. These include:

  • Consistency of the desired input parameters with the content of the user input (number of nodes, number of leaves, number of time steps, number of random values assigned to a node).
  • Range check of desired input parameters and options.
  • Check of scenario and node probabilities.
  • Check of the ancestor matrix (check the orientation of the graph, check if the graph contains a cycle, check if the graph contains incomplete forests or scenarios, check the consistency of the parameter num_time_steps with the ancestor matrix).

The following errors in the specification of the scenario tree cause the program to skip the reduction algorithms:

  • The input files cannot be opened.
  • Not all required input parameters are given.
  • The required input parameters are not consistent with the user input.
  • The required input parameters are out of range.
  • Missing or negative scenario probabilities (probabilities of leaves).
  • The ancestor set contains too many entries (more than 2*num_nodes).
  • A cycle in the ancestor set.
  • Incomplete scenarios in the ancestor set.
  • Run time limit reached.

Errors and Error Numbers

If a serious error occurs, an error message is displayed in the log file. These messages always begin with

**** SCENRED run-time error ...

The number of errors are contained in the parameter ScenRedReport of the output file (if it could be created). The occurrence of an error can also be detected from the last ScenRed-related line in the log:

**** SCENRED ErrCode=...

The numerical values of ErrCode and their meaning are given below.

ErrCode Meaning
1 (for internal use)
2 fatal error while reading from input file
3 fatal error while writing to output file
4 fatal error while reading from option file
5 log file cannot be opened
6 a memory allocation error occurred
7 there are missing input parameters
8 could not access the GAMS names for the nodes
9 (for internal use)
10 ancestor set not given or contains too many entries
11 node probabilities cannot be not read or are wrong
12 random values for the nodes cannot be read
13 input parameters are out of range
14 ancestor set contains a cycle
15 incomplete scenarios or forests detected
16 fatal error in reduction algorithm (not enough memory)
17 running time limit reached

Warnings

Warnings are caused by misspecification of the initial scenarios that can be possibly fixed. When such an error is encountered in the input files or in the scenario tree, a message is displayed in the log file. These messages always start with

**** SCENRED Warning ...

The following list gives an overview of the cases that produce warnings, and the action taken by ScenRed in these cases.

  • The user assigned an option value that is out of range.
    Action: Assign the default value.
  • Both parameters red_num_leaves and red_percentage are assigned nontrivial values.
    Action: The value of red_percentage will be ignored.
  • The scenario probabilities (probabilities of leaves) do not sum up to 1.
    Action: The scenario probabilities are rescaled. Assign new probabilities to the remaining (inner) nodes that are consistent with the scenario probabilities.
  • Missing probabilities of inner nodes.
    Action: Assign node probabilities that are consistent with the scenario probabilities.
  • The ancestor set contains more than one ancestor for a node.
    Action: It is assumed that a successor set is given instead of an ancestor set (i.e., the transpose of an ancestor matrix. This means that the graph corresponding to the ancestor set has the wrong orientation). ScenRed starts the tree diagnostic for the successor set. The reduced tree will be defined in terms of a successor set as well (if the successor set passes the tree diagnostic and if ScenRed locates no fatal error during the run).