Gather-Update-Solve-Scatter (GUSS)

Introduction

The purpose of this chapter is to detail an extension of the GAMS modeling system that allows collections of models (parameterized exogenously by a set of samples or indices) to be described, instantiated, and solved efficiently.

As a specific example, we consider the parametric optimization problem \(\mathcal{P}(s)\) defined by:

\begin{equation} \label{eq:P(s)} \min_{x \in X(s)} f(x; s) \text{ s.t. } g(x; s) \leq 0 \end{equation}

where \(s \in S = \{ 1,\ldots,K \}\). Note that each scenario \(s\) represents a different problem for which the optimization variable is \(x\). The form of the constraint set as given above is simply for concreteness; equality constraints and range and bound constraints are trivial extensions of the above framework. Clearly the problems \(\mathcal{P}(s)\) are interlinked. We intend to show how such problems can be easily specified within GAMS, and detail one type of algorithmic extension that can exploit the nature of the linkage. Other extensions of GAMS allow solves to be executed in parallel or by using grid computing resources. Note that in our description we will use the terms indexed, parameterized, and scenario somewhat interchangeably. An extended version of this chapter containing several examples is available as a paper at http://www.gams.com/modlib/adddocs/gusspaper.pdf.

Design Methodology

One of the most important functions of GAMS is to build a model instance from the collection of equations (i.e. an optimization model defined by the GAMS keyword MODEL) and corresponding data (consisting of the content of GAMS (sub)sets and parameters). Such a model instance is constructed or generated when the GAMS execution system executes a SOLVE statement. The generated model instance is passed to a solver which searches for a solution of this model instance and returns status information, statistics, and a (primal and dual) solution of the model instance. After the solver terminates, GAMS brings the solution back into the GAMS database, i.e. it updates the level (.L) and marginal (.M) fields of variable and equation symbols used in the model instance. Hence, the SOLVE statement can be interpreted as a complex operator against the GAMS database. The model instance generated by a SOLVE statement only lives during the execution of this one statement, and hence has no representation within the GAMS language. Moreover, its structure does fit the relational data model of GAMS. A model instance consists of vectors of bounds and right hand sides, a sparse matrix representation of the Jacobian, a representation of the non-linear expressions that allow the efficient calculation of gradient vectors and Hessian matrices, and so on.

This chapter is concerned with solving collections of models that have similar structure but modified data. As an example, consider a linear program of the form:

\[ \min c^T x \text{ s.t. } Ax \geq b,\; \ell \leq x \leq u. \]

The data in this problem is \((A,b,c,\ell,u)\). Omitting some details, the following code could be used within GAMS to solve a collection of such linear programs in which each member of the collection has a different \(A\) matrix and lower bound \(\ell\):

Set       i / ... /, j / ... /;
Parameter A(i,j), b(i);
Variable  x(j), z, ...;
Equation  e(i), ...;
e(i).. sum(j, A(i,j)*x(j)) =g= b(i);
...
model mymodel /all/;

Set s / s1*s10 /
Parameter
    A_s(s,i,j) 'Scenario data'
    xlo_s(s,j) 'Scenario lower bound for variable x'
    xl_s(s,j)  'Scenario solution for x.l'
    em_s(s,i)  'Scenario solution for e.m';
Loop(s,
  A(i,j) = A_s(s,i,j);
  x.lo(j)= xlo_s(s,j);
  solve mymodel min z using lp;
  xl_s(s,j) = x.l(j);
  em_s(s,i) = e.m(i);
);

Summarizing, we solve one particular model (mymodel) in a loop over s with an unchanged model rim (i.e. the same individual variables and equations) but with different model data and different bounds for the variables. The change in model data for a subsequent solve statement does not depend on the previous model solutions in the loop.

The purpose of this new Gather-Update-Solve-Scatter (GUSS) manager is to provide syntax at the GAMS modeling level that makes an instance of a problem that provides limited access to treat that instance as an object, and allows the modeler to update portions of it iteratively. Specifically, we provide syntax that gives a list of data changes to an instance, and allows these changes to be applied sequentially to the instance (which is then solved without returning to GAMS). Thus, we can simulate a limited set of actions to be applied to the model instance object and retrieve portions of the solution of these changed instances back in the modeling environment. Such changes can be done to any model type in GAMS, including nonlinear problems and mixed integer models. However, the only changes we allow are to named parameters appearing in the equations and lower and upper bounds used in the model definition.

Thus, in the above example GUSS allows us to replace lines 15-21 by

Set dict / s.   scenario. ''
           A.   param.    A_s
           x.   lower.    xlo_s
           x.   level.    xl_s
           e.   marginal. em_s   /;
solve mymodel min z using lp scenario dict;

The three dimensional set dict (you can freely choose the name of this symbol) contains mapping information between symbols in the model (in the first position) and symbols that supply required update data or store solution information (in the third position), and the type of update/storing (in the second position). An exception to this rule is the tuple with label scenario in the second position. This tuple determines the symbol (in the first position) that is used as the scenario index. This scenario symbol can be a multidimensional set. A tuple in this set represents a single scenario. The remaining tuples in the set dict can be grouped into input and output tuples. Input tuples determine the modifications of the model instance prior to solving, while output tuples determine which part of the solution gets saved away. The following keywords can be used in the second position of the set dict:

Type Keywords Description
Input: param Supplies scenario data for a parameter used in the model
lower Supplies scenario lower bounds for a variable
upper Supplies scenario upper bounds for a variable
fixed Supplies scenario fixed bounds for a variable
Output: level Stores the levels of a scenario solution of variable or equation
marginal Stores the marginals of a scenario solution of variable or equation

Sets in the model cannot be updated. GUSS works as follows: GAMS generates the model instance for the original data. As with regular SOLVE statements, all the model data (e.g. parameter A) needs to be defined at this time. The model instance with the original data is also called the base case. The solution of the base case is reported back to GAMS in the regular way and is accessible via the regular .L and .M fields after the SOLVE statement. After solving the base case, the update data for the first scenario is applied to the model. The tuples with lower, upper, fixed update the bounds of the variables, whereas the tuples with param update the parameters in the model.

The scenario index s needs to be the first index in the parameters mapped in the set dict. The update of the model parameters goes far beyond updating the coefficients of the constraint matrix/objective function or the right hand side of an equation, as one can do with some other systems. GAMS stores all the necessary expressions of the constraints with the model instance, so the change in the constraint matrix coefficient is the result of an expression evaluation. For example, consider a term in the calculation of the cost for shipping a variable amount of goods x(i,j) between cities i and j. The expression for shipping cost is d(i,j)*f*x(i,j), i.e. the distance between the cities times a freight rate f times the variable amount of goods. In order to find out the sensitivity of the solution with respect to the freight rate f, one can solve the same model with different values for f. In a matrix representation of the model one would need to calculate the coefficient of x(i,j) which is d(i,j)*f, but with GUSS it is sufficient to supply different values for f that potentially result in many modified coefficients on the matrix level. GUSS evaluates the shipping cost term and communicates the resulting matrix coefficient to the solver reliably behind the scenes.

After the variable bound and the model parameter updates have been applied and the resulting updates to the model instance data structures (e.g. constraint matrix) has been determined, the modified model instance is passed to the solver. Some solvers (e.g. Cplex, Gurobi, SoPlex, and Xpress) allow modifying a model instance. In these cases GUSS only communicates the changes from the previous model instance to the solver. This reduces the amount of data communicated to the solver and also, in the case of an LP model, allows the solver to restart from an advanced basis and its factorization. In the case of an NLP model, this provides initial values. After the solver determines the solution of a model instance, GUSS stores the part of the solution requested by the output tuples of dict to some GAMS parameters and continues with the next scenario. GUSS emphasizes on speed and only works with solver that allow to communicate the model instance through memory, Hence, the following solvers cannot be used as subsolvers of GUSS: ALPHAECP, BARON, CONVERT, DECISC, DECISM, DICOPT, EXAMINER, GAMSCHK, JAMS, KESTREL, LOGMIP, MILES, MPSGE, NLPEC, PATHNLP, SBB.

GUSS Options

The execution of GUSS can be parameterized using some options. Options are not passed through a solver option file but via another tuple in the dict set. The keyword in the second position of this tuple is opt. A one dimensional parameter is expected in the first position (or the label ′′). This parameter may contain some of the following labels with values:

Options Description
OptfileInit: Option file number for the first solve (default from GAMS OptFile setting)
Optfile: Option file number for subsequent solves (default 0)
LogOption: Determines amount of log output:
0 - Moderate log (default)
1 - Minimal log
2 - Detailed log
NoHotStart: Disable hot start capability in solver that supports hot starts (default 0)
NoMatchLimit: Limit of unmatched scenario records (default 0)
RestartType:: Determines restart point for the scenarios
0 - Restart from last solution (default)
1 - Restart from solution of base case
2 - Restart from input point
SkipBaseCase: Switch for solving the base case (0 solves the base case)
ReportLastScen: Switch for reporting the solution of the last scenario rather than solution of the base case (default 0)
SolveEmpty: Limit of solved empty scenarios, afterwards empty scenarios will be skipped (default 0)
UpdateType: Scenario update mechanism:
0 - Set everything to zero and apply changes (default)
1 - Reestablish base case and apply changes
2 - Build on top of last scenario and apply changes

For the example model above the UpdateType setting would mean:

UpdateType=0:  loop(s, A(i,j) = A_s(s,i,j))
UpdateType=1:  loop(s, A(i,j) = A_base(i,j);
                       A(i,j) $= A_s(s,i,j))
UpdateType=2:  loop(s, A(i,j) $= A_s(s,i,j))

The option SkipBaseCase=1 allows the user to skip the base case. This means only the scenarios are solved and there is no solution reported back to GAMS in the traditional way. The third position in the opt-tuple can contain a parameter for storing the scenario solution attribute information, e.g. model and solve status, or needs to have the label ′′. The labels to store solution status information must be known to GAMS, so one needs to declare a set with such labels. A convenient way to enter these attributes is via System.GUSSModelAttributes:

Set ma 'GUSS Model Attributes' / System.GUSSModelAttributes /; display ma;
----      1 SET ma  GUSS Model Attributes

ModelStat,    SolveStat,    NumInfes ,    SumInfes ,    IterUsd
ResUsd   ,    ObjVal   ,    NodUsd   ,    ObjEst   ,    DomUsd
RObj     ,    MaxInfes ,    MeanInfes

The following example shows how to use some of the GUSS options and the use of a parameter to store some solution status information:

Set h solution headers / System.GUSSModelAttributes /;
Parameter
   o / SkipBaseCase 1, UpdateType 1, Optfile 1 /
   r_s(s,h) Solution status report;
Set dict / s.   scenario. ''
           o.   opt.      r_s
           a.   param.    a_s
           x.   lower.    xlo_s
           x.   level.    xl_s
           e.   marginal. em_s   /;
solve mymodel min z using lp scenario dict;

Please note that the domain set of the solution status report attributes (here h) must only contain model attributes known to GUSS. If this domain (unless the domain in *) contains a label unknown to GUSS, a compilation error is triggered.

Implementation Details

This section describes some technical details that may provide useful insight in case of unexpected behavior.

Because GUSS changes all model parameters mentioned in the dict set to variables, a linear model can produce some non-linear instructions (e.g. d(i,j)*f*x(i,j) becomes a non-linear expression since f becomes a variable in the model instance given to GUSS). This also explains why some models compile without complaint, but if the model is used in the context of GUSS, the compile time check of the model will fail because a parameter that is turned into a variable can no longer be used in that way. For example, suppose the model contains a constraint e(i).. sum(j$A(i,j), ...). If A(i,j) is a parameter in the regular model, the compiler will not complain, but if A becomes a parameter that shows up in the first position of a param tuple in the dict set, the GAMS compiler will turn A into a variable and complain that an endogenous variable cannot be used in a $-condition.

The sparsity pattern of a model can be greatly affected by GUSS. In a regular model instance GAMS will only generate and pass on non-zero matrix elements of a constraint e(i).. sum(j, A(i,j)*x(j)) ..., so the sparsity of A determines the sparsity of the generated model instance. GUSS allows to use this constraint with different values for A hence GUSS cannot exclude any of the pairs (i,j) and generate a dense matrix. The user can enforce some sparsity by explicitly restricting the (i,j) pairs: e(i).. sum(ij(i,j), A(i,j)*x(j)) ...

Attention
While GUSS is available for many model types quadratic models require special attention. Linear solvers that have been extended to cover (convex) quadratic models, e.g. Cplex, Gurobi, Mosek, Xpress, do not work properly if the modifying parameter affects the left hand side of linear equations or any modifications of quadratic equations. Bound updates as well as changes of the right hand side of linear constraints are okay (see the example of the quadratic support vector machine below). Unfortunately, detecting if a quadratic model is okay or not for a given solver is at the moment difficult to detect, so use caution with quadratic models in combination with such (linear) solvers.

The actual change of the GAMS language required for the implementation of GUSS is minimal. The only true change is the extension of the SOLVE statement with the term SCENARIO dict. Existing language elements have been used to store symbol mapping information, options, and model result statistics. Some parts of the GUSS presentation look somewhat unnatural, e.g. since dict is a three dimensional set the specification the scenario set using keyword scenario requires a third dummy label ′′. However, this approach gives maximum flexibility for future extension, allows reliable consistency checks at compile and execution time, and allows the user to delay the commitment for significant and permanent syntax changes of a developing method to handle model instances at a GAMS language level.

Applications

Cross Validation in GAMS via GUSS

Cross validation is a statistical/machine learning technique that aims to evaluate the generalizability of a classifier (or other decision) process. It does this by setting aside a portion of the data for testing, and uses the remaining data entries to produce the classifier. The testing data is subsequently used to evaluate how well the classifier works. Cross validation performs this whole process a number of times in order to estimate the true power of the classifier.

Ten-fold cross validation is a special case, where the original data is split into ten pieces, and cross validation is performed using each of these ten pieces as the testing set. Thus, the training process is performed ten times, each of which uses the data obtained by deleting the testing set from the whole dataset. We show below how to carry this out using the Gather-Update-Solve-Scatter (GUSS) facility in GAMS.

A paper with the title "GUSS: Solving Collections of Data Related Models within GAMS" that contains two additional application examples for GUSS is available here.

GUSS formulation in GAMS

The following example compares the two formulations for a feature-selection model under cross-validation using data files a_data.inc and b_data.inc . The actual source code for both of these GAMS formulations is available here.

Original GAMS formulation (without the GAMS/DEA interface):

$title Ten-fold cross validation example
$eolcom !

$setglobal num_folds 10

set a 'set for category 1'         /1*1505/
    b 'set for category 2'         /1*957/
    o 'observations'               /1*14/
    p 'folds to perform'           /1*%num_folds%/
    f 'maximum features to select' /1*10/

* Read in the data from the data files
parameter a_data(a, o) /
$offlisting
$include "a_data.inc"
$onlisting
/;

parameter b_data(b, o) /
$offlisting
$include "b_data.inc"
$onlisting
/;

set a_test(p,a), b_test(p,b) 'testing sets'
    a_trai(a),   b_trai(b)   'training sets';

* Define problem
scalar w_tol /1/
       features /6/;

positive variables a_err(a), sla(a)
                   b_err(b), slb(b);

variables c,
          weight(o),
          gamma;

binary variable y(o);


equations w_def1(o),
          w_def2(o),
          y_def,
          c_def,
          a_def(a),
          b_def(b);

w_def1(o)..
        weight(o) =l= w_tol*y(o);

w_def2(o)..
        weight(o) =g= -w_tol*y(o);

y_def..
        sum(o, y(o)) =e= features;

c_def..
        c =e= sum(a, a_err(a)) + sum(b, b_err(b));

a_def(a)..
        -sum(o, a_data(a, o)*weight(o)) + gamma + 1 =l= a_err(a) + sla(a);

b_def(b)..
         sum(o, b_data(b, o)*weight(o)) - gamma + 1 =l= b_err(b) + slb(b);


model train /all/;

$batinclude gentestset.inc "p,a" "p,b"

set headers 'report' / modelstat, solvestat, objval /;
parameter rep(p,headers);
train.optfile = 0;
option limrow=0, limcol=0, solprint=silent, mip=xpress,
       solvelink=%solveLink.loadLibrary%, optcr=0, optca=0;

$echo loadmipsol=1 > xpress.opt

loop(p,
  a_err.up(a) = inf; a_err.up(a)$a_test(p,a) = 0;
  b_err.up(b) = inf; b_err.fx(b)$b_test(p,b) = 0;
  sla.fx(a) = 0; sla.up(a)$a_test(p,a) = inf;
  slb.fx(b) = 0; slb.up(b)$b_test(p,b) = inf;
  solve train using mip minimizing c;
  train.optfile = 1; ! use mipstart for the second run
  rep(p,'modelstat') = train.modelstat;
  rep(p,'solvestat') = train.solvestat;
  rep(p,'objval') = train.objval;
);
display rep;

Options file for the original formulation: xpress.opt

 loadmipsol=1

The batinclude file gentestset.inc . gives instructions for generating the testing sets. It produces a_test and b_test that detail which equations are left out on solve p.

The actual model is set up to include all the data points in the equations a_def and b_def. To delete the equations that correspond to the test set, we introduce nonnegative slack variables into all the equations. We then set the upper bounds of the slack variables to zero in equations corresponding to the training set, and to infinity in equations corresponding to the testing set. At the same time we fix the error measures a_err and b_err belonging to the testing set by setting their upper bounds to zero. Thus the testing set equations are always satisfiable by choice of the slack variables alone - essentially they are discarded from the model as required. An alternative formulation could "include" the data equations that you need in each scenario, but the update from one scenario to the next in the defining data is much larger.

Cross validation formulated using GUSS: This model essentially mimics what the standard model does, but the implementation of the solver loop behind the scenes is much more efficient, and the consquences are that are clear to see if you execute both model runs. The changes are in the last 40 lines of the GAMS code.

$title Ten-fold cross validation example
$eolcom !

$setglobal num_folds 10

set a 'set for category 1'         /1*1505/
    b 'set for category 2'         /1*957/
    o 'observations'               /1*14/
    p 'folds to perform'           /1*%num_folds%/
    f 'maximum features to select' /1*10/

* Read in the data from the data files
parameter a_data(a, o) /
$offlisting
$include "a_data.inc"
$onlisting
/;

parameter b_data(b, o) /
$offlisting
$include "b_data.inc"
$onlisting
/;

set a_test(p,a), b_test(p,b) 'testing sets'
    a_trai(a),   b_trai(b)   'training sets';

* Define problem
scalar w_tol /1/
       features /6/;

positive variables a_err(a), sla(a)
                   b_err(b), slb(b);

variables c,
          weight(o),
          gamma;

binary variable y(o);


equations w_def1(o),
          w_def2(o),
          y_def,
          c_def,
          a_def(a),
          b_def(b);

w_def1(o)..
        weight(o) =l= w_tol*y(o);

w_def2(o)..
        weight(o) =g= -w_tol*y(o);

y_def..
        sum(o, y(o)) =e= features;

c_def..
        c =e= sum(a, a_err(a)) + sum(b, b_err(b));

a_def(a)..
        -sum(o, a_data(a, o)*weight(o)) + gamma + 1 =l= a_err(a) + sla(a);

b_def(b)..
         sum(o, b_data(b, o)*weight(o)) - gamma + 1 =l= b_err(b) + slb(b);


model train /all/;
train.optfile = 1;

$batinclude gentestset.inc "p,a" "p,b"

parameter wval(p,o), gval(p);

set headers 'report' / modelstat, solvestat, objval /;
parameter
    scenrep(p,headers)
    scopt(*) / SkipBaseCase 1, Optfile 1, LogOption 2 /;

set dict / p.     scenario.''
           scopt. opt.     scenrep
           a_err. upper.   aupper
           b_err. upper.   bupper
           sla.   upper.   afree
           slb.   upper.   bfree
           weight.level.   wval
           gamma. level.   gval /

$echo loadmipsol=1 > xpress.opt

Parameter aupper(p,a), bupper(p,b), afree(p,a), bfree(p,b);

aupper(p,a)$(not a_test(p,a)) = inf;
bupper(p,b)$(not b_test(p,b)) = inf;

afree(p,a)$a_test(p,a) = inf;
bfree(p,b)$b_test(p,b) = inf;

option mip=xpress, optcr=0, optca=0;
solve train using mip minimizing c scenario dict;
display scenrep, gval;

Firstly, parameters aupper, bupper, afree and bfree are used to set the bounds on the error and slack variables in the testing set equations respectively. The setting of the upper bounds are governed by the syntax shown in the controlling set dict. Furthermore, the output of the classifier (w,gamma) for each fold of the cross validation uses the dict set to place results into the parameters wval and gval respectively. Finally, the GUSS options are used to guarantee that the subsequent solves are instructed to process solver options (Optfile 1) which instruct the solver to use the previous solution to start the branch-and-cut process (loadmipsol=1).

The complete data and model files for this example are found in (galaxy zip archive). The data and model for a second instance based on the Wisconsin Diagnostic Breast Cancer Database is downloadable as (wdbc zip archive).

Quadratic Programs

GUSS is not limited to linear programs, but can be used more generally. Simple (indexed) quadratic models can be solved using GUSS. The following example illustrates the use of GUSS for quadratic programs. In this example, a support vector machine is used to determine a linear classifier that separates data into two categories. We use the following model:

\begin{equation} \begin{array}{ll} \text{Min}_{w,g,z} & (1/2)\|w\|2^2 + Ce^{T}z\\ \text{subject to} & D(Aw-g)+z \geq 1 \\ & z \geq 0 \\ \end{array} \end{equation}

Here, \(A\) is a matrix containing the training data (patients by features) and \(D\) is a diagonal matrix with values \(+1\) or \(-1\) (each denoting one of the two classes). \(C\) is a parameter weighting the importance of maximizing the margin between the classes \((2/\|w\|_2)\) versus minimizing the misclassification error \((z)\). The solution \(w\) and \(g\) are used to define a separating hyperplane \(\{x | w^T x = g\}\) to classify (unseen) data points.

As given, the standard linear support vector machine is not a slice model per se. It becomes a slice model under cross-validation training, where it is solved multiple times on different pieces of data. In this case, only the data \(A\) and \(D\) vary between solves, appropriately fitting the definition of a slice model.

The data for this example comes from the Wisconsin Diagnosis Breast Cancer Database, and is available here . The data was converted to the GAMS file wdbc.gms , which defines \(A\) and \(D\). The actual source code for the following GAMS formulation is available here.

The GUSS formulation for quadratic svm:

$title Ten-fold cross validation example using GUSS
$eolcom !

$setglobal num_folds 10

set p /1*%num_folds%/;  ! folds to perform

! Read in data
$include "wdbc.gms"

set test(p,i);          ! testing set

! Define problem
parameter C /1/;
positive variables z(i);
variables obj, w(k), gamma, slack(i);

equations obj_def, sep_def(i);

obj_def..  obj =e= 1/2*sum(k, sqr(w(k))) + C*sum(i, z(i));
sep_def(i)..
           D(i)*(sum(k, A(i,k)*w(k)) - gamma) + z(i) + slack(i) =g= 1;

model train /all/;

! Generate testing sets (to be deleted in each problem)
loop(p,
$batinclude gentestset2.inc "p,i"
);

set headers report / modelstat, solvestat, objval /;
parameter
    scenrep(p,headers)
    scopt / SkipBaseCase 1, LogOption 2 /;

set dict / p.    scenario.''
           scopt.opt.     scenrep
           z.    upper.   iupper
           slack.upper.   ifree /;

Parameter iupper(p,i), ifree(p,i);
iupper(p,i)$(not test(p,i)) = inf;
ifree(p,i)$test(p,i)        = inf;

option qcp=conopt, optcr=0, optca=0;
solve train using qcp minimizing obj scenario dict;
display scenrep;

Because the problem is quadratic, we must use a quadratic program solver. The variable values for weight and gamma could be saved for later testing using the same method as detailed above for the linear case.

The batinclude file gentestset2.inc is very similar to gentestset.inc from the earlier cross-validation examples. In gentestset2.inc, though, only one set is being dealt with rather than two. The complete source GAMS code for this formulation is available in this zip archive.

DEA Modeling in GAMS via GUSS

Data Envelopment Analysis (DEA) models can be solved most efficiently in GAMS using the Gather-Update-Solve-Scatter (GUSS) facility. This is the preferred method since release 23.7 of GAMS and hence the GAMS/DEA solver is no longer available.

A paper with the title "GUSS: Solving Collections of Data Related Models within GAMS" that contains two additional application examples for GUSS is available here.

Introduction

The basic (CCR) DEA model is a collection of models indexed by \(k\) and defined by

\begin{equation} \begin{array}{lll} \text{max}_{u,v} & u^TY_{\ast,k} & (\text{objective slice})\\ \text{subject to} & v^TX_{\ast,k} = 1 & (\text{slice constraint})\\ & u^TY \leq v^TX &(\text{core constraint})\\ & u,v \geq 0& (\text{core constraint})\\ \end{array} \end{equation}

where \(X,Y\) are data matrices.

Without using GUSS in GAMS, a model would be defined and solved in a loop over \(k\), requiring the model to be generated multiple times with different instances for each value of \(k\). GUSS is an alternative (and more efficient) way to define the individual programs and pass them to any underlying GAMS solver. In this way, individual programs are not re-generated, but are instead defined as data modifications of each other. This reduces overall model generation time. Further, previous solutions can be used as starting points in later solves to speed up overall processing time.

Some DEA examples compare the two formulations. The actual source code for both of these formulations is available here.

DEA Examples

Original GAMS formulation (without GUSS): In all these models the model setup and data are given at the top of the file and the code of interest is in the last 10 or so lines. In this setting we loop over the set k and change the data in the objective function and the first constraint of the model explicitly before each solve. We only output a minimal summary of the solution.

$title Data  Envelopment Analysis - DEA

$ontext
Data Envelopment Analysis (DEA) is a technique for measuring the relative
performance of organizational units where presence of multiple inputs and
outputs makes comparison difficult.

            efficiency = weighted sum of output / weighted sum of input

Find weights that maximize the efficiency for one unit while ensuring
that no other units has an efficiency < 1 using these weights. A primal
and dual formulation is presented.


Dyson, Thanassoulis, and Boussofiane, A DEA Tutorial.
Warwick Business School

$offtext

Sets  i     'units' / Depot1*Depot20 /
      j     'inputs and outputs' / stock, wages, issues, receipts, reqs /
      ji(j) 'inputs'             / stock, wages                         /
      jo(j)            'outputs' /               issues, receipts, reqs /;
alias(i,k);


Table data(i,j)
         stock  wages  issues  receipts  reqs
Depot1     3      5      40       55      30
Depot2     2.5    4.5    45       50      40
Depot3     4      6      55       45      30
Depot4     6      7      48       20      60
Depot5     2.3    3.5    28       50      25
Depot6     4      6.5    48       20      65
Depot7     7     10      80       65      57
Depot8     4.4    6.4    25       48      30
Depot9     3      5      45       64      42
Depot10    5      7      70       65      48
Depot11    5      7      45       65      40
Depot12    2      4      45       40      44
Depot13    5      7      65       25      35
Depot14    4      4      38       18      64
Depot15    2      3      20       50      15
Depot16    3      6      38       20      60
Depot17    7     11      68       64      54
Depot18    4      6      25       38      20
Depot19    3      4      45       67      32
Depot20    3      6      57       60      40
;

Parameter slice(j) 'slice of data'
          eff_k(i) 'efficiency report';

Positive variables v(ji) 'input weights'
                   u(jo) 'output weights';

Variable eff 'efficiency';


Equations defe    'efficiency definition - weighted output'
          denom   'weighted input'
          lime(i) 'output / input < 1';

defe..    eff =e= sum(jo, u(jo)*slice(jo));

denom..   sum(ji, v(ji)*slice(ji)) =e= 1;

lime(i).. sum(jo, u(jo)*data(i,jo)) =l= sum(ji, v(ji)*data(i,ji));

model dea /defe, denom, lime /;

set headers / modelstat, solvestat, objval /;
parameter rep(k,headers) 'solution report summary';
option limrow=0, limcol=0, solprint=silent,
       solvelink=%solveLink.loadLibrary%;
loop(k,
  slice(j) = data(k,j);
  solve dea using lp max eff;
  rep(k,'modelstat') = dea.modelstat;
  rep(k,'solvestat') = dea.solvestat;
  rep(k,'objval'   ) = dea.objval;
);
display rep;

The DEA problem formulated using GUSS: In this setting, the solve statement includes an extra keyword scenario that points to a new set called dict. The contents of this set are directives to GUSS that state the scenario index is \(k\), the parameter slice is populated from the parameter data and the values of the variable eff are stored into the parameter eff_k for each scenario solved. More details follow below.

$title Data  Envelopment Analysis - DEA

Sets  i     'units' / Depot1*Depot20 /
      j     'inputs and outputs '/ stock, wages, issues, receipts, reqs /
      ji(j) 'inputs'             / stock, wages                         /
      jo(j)            'outputs' /               issues, receipts, reqs /;
alias(i,k);


Table data(i,j)
         stock  wages  issues  receipts  reqs
Depot1     3      5      40       55      30
Depot2     2.5    4.5    45       50      40
Depot3     4      6      55       45      30
Depot4     6      7      48       20      60
Depot5     2.3    3.5    28       50      25
Depot6     4      6.5    48       20      65
Depot7     7     10      80       65      57
Depot8     4.4    6.4    25       48      30
Depot9     3      5      45       64      42
Depot10    5      7      70       65      48
Depot11    5      7      45       65      40
Depot12    2      4      45       40      44
Depot13    5      7      65       25      35
Depot14    4      4      38       18      64
Depot15    2      3      20       50      15
Depot16    3      6      38       20      60
Depot17    7     11      68       64      54
Depot18    4      6      25       38      20
Depot19    3      4      45       67      32
Depot20    3      6      57       60      40
;

Parameter slice(j) 'slice of data'
          eff_k(i) 'efficiency report';

Positive variables v(ji) 'input weights'
                   u(jo) 'output weights';

Variable eff 'efficiency';


Equations defe    'efficiency definition - weighted output'
          denom   'weighted input'
          lime(i) 'output / input < 1';

defe..    eff =e= sum(jo, u(jo)*slice(jo));

denom..   sum(ji, v(ji)*slice(ji)) =e= 1;

lime(i).. sum(jo, u(jo)*data(i,jo)) =l= sum(ji, v(ji)*data(i,ji));

model dea /defe, denom, lime /;

set headers 'report' / modelstat, solvestat, objval /;
parameter scenrep(k,headers) 'solution report summary'
          scopt / SkipBaseCase 1 /;

set dict / k      .scenario.''
           slice  .param.   data
           eff    .level.   eff_k
           scopt  .opt.     scenrep /;

slice(j) = 0; option lp=cplex;
solve dea using lp max eff scenario dict;
display scenrep,eff_k;

In the GUSS version we indicate the collection of models to be solved using the set dict. The first element of dict determines the set to be used for the scenario (collection) index, in this case \(k\). The second element of dict then details that in each scenario \(k\), the parameter slice is instantiated using a slice of the parameter data. Essentially, this corresponds to the GAMS statement:

slice(j) = data(k,j)

Note the scenario index \(k\) must appear as the first index of the parameter data. The third element of dict allows the modeler to collect information from each solve and store it into a GAMS parameter. Essentially, the third element of dict corresponds to the GAMS statement:

eff_k(k) = eff.l

that gets executed immediately after the solve of scenario \(k\).

More complex scenario models can also be formulated using GUSS, including multiple equations being updated. This is shown by the dual of the basic DEA model, given by

\begin{equation} \begin{array}{lcl} \text{min}_{z,\lambda} & z &(\text{objective})\\ \text{subject to} & X^\ast \lambda \leq zX _{\ast,k} &(\text{slice constraint})\\ & Y^\ast \lambda \geq Y_{\ast,k} &(\text{slice constraint})\\ & \lambda \geq 0 &(\text{core constraint})\\ \end{array} \end{equation}

The next example compares the two formulations for this model. The actual source code for both of these formulations is available here.

Original GAMS formulation (without GUSS):

$title Data Envelopment Analysis - DEA (traditional)

sets  i  'units' / Depot1*Depot20 /
      j     'inputs and outputs' / stock, wages, issues, receipts, reqs /
      ji(j) 'inputs'             / stock, wages                         /
      jo(j)            'outputs' /               issues, receipts, reqs /;
alias(k,i);

Table data(i,j)
         stock  wages  issues  receipts  reqs
Depot1     3      5      40       55      30
Depot2     2.5    4.5    45       50      40
Depot3     4      6      55       45      30
Depot4     6      7      48       20      60
Depot5     2.3    3.5    28       50      25
Depot6     4      6.5    48       20      65
Depot7     7     10      80       65      57
Depot8     4.4    6.4    25       48      30
Depot9     3      5      45       64      42
Depot10    5      7      70       65      48
Depot11    5      7      45       65      40
Depot12    2      4      45       40      44
Depot13    5      7      65       25      35
Depot14    4      4      38       18      64
Depot15    2      3      20       50      15
Depot16    3      6      38       20      60
Depot17    7     11      68       64      54
Depot18    4      6      25       38      20
Depot19    3      4      45       67      32
Depot20    3      6      57       60      40
;

parameter slice(j) 'slice of data'
          eff_k(i) 'efficiency report';

Variables z      'efficiency'
          lam(i) 'dual weights';

positive variables lam;

Equations dii(ji) 'input duals'
          dio(jo) 'output dual';

*  dual model

dii(ji).. sum(i, lam(i)*data(i,ji)) =l= z*slice(ji);

dio(jo).. sum(i, lam(i)*data(i,jo)) =g=     slice(jo);

model deadc dual with CRS / dii, dio /;

parameter rep 'summary report';
option limrow=0, limcol=0, solprint=silent, lp=cplex,
       solvelink=%solveLink.loadLibrary%;

loop(k,
   slice(j) = data(k,j);
   solve deadc using lp minimizing z ;
   rep(k,'modelstat') = deadc.modelstat;
   rep(k,'solvestat') = deadc.modelstat;
   rep(k,'objval') = deadc.objval;
);

display rep;

Dual (CRS) DEA model formulated using GUSS: The key modeling statements occur in the last 10 lines below.

$title Data Envelopment Analysis - DEA (dual,GUSS)

sets  i  'units' / Depot1*Depot20 /
      j     'inputs and outputs' / stock, wages, issues, receipts, reqs /
      ji(j) 'inputs'             / stock, wages                         /
      jo(j)            'outputs' /               issues, receipts, reqs /;
alias(k,i);

Table data(i,j)
         stock  wages  issues  receipts  reqs
Depot1     3      5      40       55      30
Depot2     2.5    4.5    45       50      40
Depot3     4      6      55       45      30
Depot4     6      7      48       20      60
Depot5     2.3    3.5    28       50      25
Depot6     4      6.5    48       20      65
Depot7     7     10      80       65      57
Depot8     4.4    6.4    25       48      30
Depot9     3      5      45       64      42
Depot10    5      7      70       65      48
Depot11    5      7      45       65      40
Depot12    2      4      45       40      44
Depot13    5      7      65       25      35
Depot14    4      4      38       18      64
Depot15    2      3      20       50      15
Depot16    3      6      38       20      60
Depot17    7     11      68       64      54
Depot18    4      6      25       38      20
Depot19    3      4      45       67      32
Depot20    3      6      57       60      40
;

parameter slice(j) 'slice of data'
          eff_k(i) 'efficiency report';

Variables z      'efficiency'
          lam(i) 'dual weights';

positive variables lam;

Equations dii(ji) 'input duals'
          dio(jo) 'output dual';

*  dual model

dii(ji).. sum(i, lam(i)*data(i,ji)) =l= z*slice(ji);

dio(jo).. sum(i, lam(i)*data(i,jo)) =g=     slice(jo);

model deadc dual with CRS / dii, dio /;

set headers 'report' / modelstat, solvestat, objval /;
parameter scenrep(k,headers) 'solution report summary'
          scopt / SkipBaseCase 1 /;

set dict / k.     scenario.''
           scopt. opt.     scenrep
           slice. param.   data
           z.     level.   eff_k   /;

slice(j) = 0; option lp=cplex;
solve deadc using lp min z scenario dict;
display scenrep,eff_k;

Extensions of these models to formulations with weighted outputs or variable returns to scale are easy to formulate with the scenario solver within GAMS. This extended model can be downloaded here.

The DEA model in the model library is similar to the extended model, but does not make use of GUSS.