GAMS Connect

# Concept

GAMS Connect is a framework inspired by the concept of a so-called ETL (extract, transform, load) procedure that allows to integrate data from various data sources. The GAMS Connect framework consists of the Connect database and the Connect agents that operate on the Connect database. Via the available Connect interfaces the user passes instructions to call Connect agents for reading data from various file types into the Connect database, transforming data in the Connect database, and writing data from the Connect database to various file types. Instructions are passed in YAML syntax. Note that in contrast to a typical ETL procedure, read, transform and write operations do not need to be strictly separated.

The GAMS Connect Framework

# Usage

GAMS Connect is available via the GAMS command line parameters ConnectIn and ConnectOut, via embedded code Connect, and as a standalone command line utility gamsconnect.

Instructions processed by the GAMS Connect interfaces need to be passed in YAML syntax as follows:

- <agent name1>:
    <global option1>: <value>
    <global option2>: <value>
    ...
    symbols:
      - <symbol1 option1>: <value>
        <symbol1 option2>: <value>
        ...
      - <symbol2 option1>: <value>
        <symbol2 option2>: <value>
        ...
- <agent name2>:
    ...


The user lists the tasks to be performed successively. All tasks begin at the same indentation level, starting with a - (a dash and a space) followed by the Connect agent name and a : (a colon). Connect agent options are given in a simple <option>: <value> form. Please check the documentation of the individual Connect agents for the available options. Some agents can operate on multiple symbols. For these agents there are two types of options:

1. Global: Global options define general settings, e.g. the file name.
2. Symbol: Symbol options define symbol-specific settings, e.g. the name of the symbol. Symbol options are defined in a sub-list under the keyword symbols.

Some Connect agent options are of type global and symbol simultaneously, i.e. the user can define a global value for the option but may override that value for specific symbols. Note that YAML syntax also supports an abbreviated form for lists and dictionaries, e.g. symbols: [ {<symbol1 option1>: <value>, <symbol1 option2>: <value>}, {<symbol2 option1>: <value>, <symbol2 option2>: <value>} ].
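To illustrate the interplay of global and symbol options, here is a sketch of a task in expanded form. It assumes (as in the PandasExcelReader examples below) that rowDimension may be given both globally and per symbol; the file name and symbol names are placeholders:

```yaml
- PandasExcelReader:
    file: data.xlsx      # global option: applies to all symbols below
    rowDimension: 1      # global default ...
    symbols:
      - name: dat
        range: Sheet1!A1
        rowDimension: 2  # ... overridden for this symbol only
      - name: pro
        range: Sheet2!A1
```

Here the symbol-level rowDimension takes precedence over the global value for symbol dat only.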

Here is an example that uses embedded Connect code to process instructions:

$onecho > distance.csv
i;j;distance in miles
seattle;new-york;2,5
seattle;chicago;1,7
seattle;topeka;1,8
san-diego;new-york;2,5
san-diego;chicago;1,8
san-diego;topeka;1,4
$offecho

$onecho > capacity.csv
i,capacity in cases
seattle,350.0
san-diego,600.0
$offecho

Set i 'Suppliers', j 'Markets';
Parameter d(i<,j<) 'Distance', a(i) 'Capacity';

$onEmbeddedCode Connect:
- CSVReader:
    file: distance.csv
    name: distance
    indexColumns: [1, 2]
    valueColumns: [3]
    fieldSeparator: ';'
    decimalSeparator: ','
- CSVReader:
    file: capacity.csv
    name: capacity
    indexColumns: [1]
    valueColumns: [2]
- GAMSWriter:
    symbols:
      - name: distance
        newName: d
      - name: capacity
        newName: a
$offEmbeddedCode

display i, j, d, a;


In this example, we read the two CSV files distance.csv and capacity.csv using the CSVReader agent and then write directly to GAMS symbols using the GAMSWriter agent.

Note that even though GAMS is case insensitive, GAMS Connect is case sensitive, i.e., YAML instructions are treated case-sensitively. This also applies, e.g., to indices in CSV files. Consider the following example where the index j2 should be substituted by ABC when reading the CSV file y.csv:

$onecho > y.csv
i1,j1,2.5
i1,J2,1.7
i2,j1,1.8
i2,j2,1.4
$offecho

set i,j;
parameter p(i<,j<);

$onEmbeddedCode Connect:
- CSVReader:
    file: y.csv
    name: p
    indexColumns: [1,2]
    valueColumns: [3]
    header: false
    indexSubstitutions: { j2: ABC }
- GAMSWriter:
    writeAll: true
$offEmbeddedCode

display i,j,p;


Since the YAML instructions are treated case-sensitively, the index J2 will not be substituted:

            j1          J2         ABC
i1       2.500       1.700
i2       1.800                   1.400


All instructions provided to the Connect framework are read using UTF-8 encoding (utf-8-sig). This can be customized by adding a comment in the format # coding=<encoding name> or # -*- coding: <encoding name> -*- as the first line of the YAML code. Note that UTF-16 encoding is not supported.
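For example, to have the instructions read with Latin-1 encoding instead, the first line of the YAML code could look as follows (the file and symbol names are placeholders):

```yaml
# -*- coding: latin-1 -*-
- CSVReader:
    file: data.csv
    name: p
    indexColumns: [1]
    valueColumns: [2]
```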

# Connect Agents Summary

Current Connect agents support the following data source formats: CSV, Excel, GDX and SQL. The following Connect agents are available:

| Connect agent | Description |
| --- | --- |
| CSVReader | Allows reading a symbol from a specified CSV file into the Connect database. |
| CSVWriter | Allows writing a symbol in the Connect database to a specified CSV file. |
| DomainWriter | Allows rewriting the domain information of an existing Connect symbol. |
| GAMSReader | Allows reading symbols from the GAMS database into the Connect database. |
| GAMSWriter | Allows writing symbols in the Connect database to the GAMS database. |
| GDXReader | Allows reading symbols from a specified GDX file into the Connect database. |
| GDXWriter | Allows writing symbols in the Connect database to a specified GDX file. |
| Options | Allows setting more general options that can affect the Connect database and other Connect agents. |
| PandasExcelReader | Allows reading symbols from a specified Excel file into the Connect database. |
| PandasExcelWriter | Allows writing symbols in the Connect database to a specified Excel file. |
| PandasSQLReader | Allows reading symbols from a specified SQL database into the Connect database. |
| PandasSQLWriter | Allows writing symbols in the Connect database to a specified SQL database. |
| Projection | Allows index reordering and projection onto a reduced index space of a GAMS symbol. |
| PythonCode | Allows executing arbitrary Python code. |
| RawExcelReader | Allows reading unstructured data from a specified Excel file into the Connect database. |

# Getting Started

We introduce the basic functionality of the GAMS Connect agents with a few examples.

## Simple Connect Example with Spreadsheets

The following example is part of the GAMS Model Library as model cta and shows how to read and write Excel spreadsheets. Here is a code snippet of the first lines:

Set
   i 'rows'
   j 'columns'
   k 'planes';

Parameter
   dat(k<,i<,j<) 'unprotected data table'
   pro(k,i,j)    'information sensitive cells';

* extract data from Excel workbook
$onEmbeddedCode Connect:
- PandasExcelReader:
    file: cox3.xlsx
    symbols:
      - name: dat
        range: Sheet1!A1
        rowDimension: 2
        columnDimension: 1
      - name: pro
        range: Sheet2!A1
        rowDimension: 2
        columnDimension: 1
- GAMSWriter:
    writeAll: True
$offEmbeddedCode

[...]


It starts out with the declaration of sets and parameters. With compile-time embedded Connect code, data for the parameters are read from the Excel file cox3.xlsx using the Connect agent PandasExcelReader. The PandasExcelReader agent allows reading data for multiple symbols that are listed under the keyword symbols, here the parameters dat and pro. For each symbol, the symbol name is given by option name and the Excel range by option range. The option rowDimension defines that the first 2 columns of the data range will be used for the labels. In addition, the option columnDimension defines that the first row of the data range will be used for the labels. As a last step, all symbols from the Connect database are written to the GAMS database using the Connect agent GAMSWriter. The GAMSWriter agent makes the parameters dat and pro available outside the embedded Connect code. Note that the sets i, j and k are defined implicitly through parameter dat.

Finally, after solving the cox3c model with alternative solutions, Connect can be used to export results to Excel:

[...]

loop(l$((obj.l - best)/best <= 0.01),
   ll(l) = yes;
   binrep(s,l) = round(b.l(s));
   binrep('','','Obj',l) = obj.l;
   binrep('','','mSec',l) = cox3c.resUsd*1000;
   binrep('','','nodes',l) = cox3c.nodUsd;
   binrep('Comp','Cells','Adjusted',l) = sum((i,j,k)$(not s(i,j,k)), 1$round(adjn.l(i,j,k) + adjp.l(i,j,k)));
   solve cox3c min obj using mip;
);

embeddedCode Connect:
- GAMSReader:
    symbols:
      - name: binrep
- PandasExcelWriter:
    file: results.xlsx
    symbols:
      - name: binrep
        range: binrep!A1
endEmbeddedCode

This time, we need to use execution-time embedded Connect code. The Connect agent GAMSReader imports the reporting parameter binrep into the Connect database. With the Connect agent PandasExcelWriter we write the parameter into the binrep sheet of the Excel file results.xlsx.

## Simple Connect Example with CSV Files

The following example (a modified version of the trnsport model) shows how to read and write CSV files. The full example is part of DataLib as model connect03. Here is a code snippet of the first lines:

$onEcho > distance.csv
i,new-york,chicago,topeka
seattle,2.5,1.7,1.8
san-diego,2.5,1.8,1.4
$offEcho
$onEcho > capacity.csv
i,capacity
seattle,350
san-diego,600
$offEcho
$onEcho > demand.csv
j,demand
new-york,325
chicago,300
topeka,275
$offEcho

Set i 'canning plants', j 'markets';
Parameter d(i<,j<) 'Distance', a(i) 'Capacity', b(j) 'Demand';

$onEmbeddedCode Connect:
- CSVReader:
    file: distance.csv
    name: d
    indexColumns: 1
    valueColumns: "2:lastCol"
- CSVReader:
    file: capacity.csv
    name: a
    indexColumns: 1
    valueColumns: 2
- CSVReader:
    file: demand.csv
    name: b
    indexColumns: 1
    valueColumns: 2
- GAMSWriter:
    writeAll: True
$offEmbeddedCode

[...]

It starts out with the declaration of sets and parameters. With compile-time embedded Connect code, data for the parameters are read from CSV files using the Connect agent CSVReader. The CSVReader agent, for example, reads the CSV file distance.csv and creates the parameter d in the Connect database. The name of the parameter must be given by the option name. Column number 1 is specified as the first domain set using option indexColumns. The valueColumns option is used to specify the column numbers 2, 3 and 4 containing the values. Per default, the first row of the columns specified via valueColumns will be used as the second domain set. The symbolic constant lastCol can be used if the number of index or value columns is unknown. As a last step, all symbols from the Connect database are written to the GAMS database using the Connect agent GAMSWriter. The GAMSWriter agent makes the parameters d, a and b available outside the embedded Connect code. Note that the sets i and j are defined implicitly through parameter d.

Finally, after solving the transport model, Connect can be used to export results to a CSV file:

[...]

Model transport / all /;
solve transport using lp minimizing z;

EmbeddedCode Connect:
- GAMSReader:
    symbols:
      - name: x
- Projection:
    name: x.l(i,j)
    newName: x_level(i,j)
- CSVWriter:
    file: shipment_quantities.csv
    name: x_level
    unstack: True
endEmbeddedCode

This time, we need to use execution-time embedded Connect code. The Connect agent GAMSReader imports variable x into the Connect database. With the Connect agent CSVWriter we write the variable level to the CSV file shipment_quantities.csv:

i_0,new-york,chicago,topeka
seattle,50.0,300.0,0.0
san-diego,275.0,0.0,275.0

Setting the option unstack to True allows using the last dimension as the header row.

## Simple Connect Example for SQL

The following example (a modified version of the whouse model) shows how to read from and write to a SQL database (sqlite).
The full example is part of DataLib as model connect04. Here is a code snippet of the first lines:

[...]

Set t 'time in quarters';

Parameter
   price(t)  'selling price ($ per unit)'
   istock(t) 'initial stock      (units)';

Scalar
   storecost 'storage cost ($ per quarter per unit)'
   storecap  'stocking capacity of warehouse (units)';

$onEmbeddedCode Connect:
- PandasSQLReader:
    connection: "sqlite:///whouse.db"
    symbols:
      - name: t
        query: "SELECT * FROM timeTable;"
        type: set
      - name: price
        query: "SELECT * FROM priceTable;"
      - name: istock
        query: "SELECT * FROM iniStockTable;"
      - name: storecost
        query: "SELECT * FROM storeCostTable;"
      - name: storecap
        query: "SELECT * FROM storeCapTable;"
- GAMSWriter:
    writeAll: True
$offEmbeddedCode

[...]

It starts out with the declaration of sets and parameters. With compile-time embedded Connect code, data for all the symbols are read from the sqlite database whouse.db using the Connect agent PandasSQLReader by passing the connection URL through the option connection. The PandasSQLReader agent, for example, queries the table priceTable for data and creates the parameter price in the Connect database. The PandasSQLReader agent allows reading data for multiple symbols that are listed under the keyword symbols and are fetched through the same connection. For each symbol the name must be given by the option name. The SQL query statement is passed through the option query. The symbol type can be specified using the option type. By default, every symbol is treated as a GAMS parameter. As a last step, all symbols from the Connect database are written to the GAMS database using the Connect agent GAMSWriter. The GAMSWriter agent makes all read-in symbols available outside the embedded Connect code.

Further, after solving the warehouse model, Connect can be used to export the results to tables in the SQL database:

[...]

Model swp 'simple warehouse problem' / all /;
solve swp minimizing cost using lp;

EmbeddedCode Connect:
- GAMSReader:
    readAll: True
- Projection:
    name: stock.l(t)
    newName: stock_level(t)
- Projection:
    name: sell.l(t)
    newName: sell_level(t)
- Projection:
    name: buy.l(t)
    newName: buy_level(t)
- PandasSQLWriter:
    connection: "sqlite:///whouse.db"
    symbols:
      - name: stock_level
        tableName: stock_level
        ifExists: replace
      - name: sell_level
        tableName: sell_level
        ifExists: replace
      - name: buy_level
        tableName: buy_level
        ifExists: replace
endEmbeddedCode

Here, we need to use execution-time embedded Connect code. The Connect agent GAMSReader imports all the variables into the Connect database. The PandasSQLWriter agent then writes each symbol to the respective table in the SQL database whouse.db. For example, the stock level:

| t_0 | level |
| :--- | :--- |
| q-1 | 100.0 |
| q-2 | 0.0 |
| q-3 | 0.0 |
| q-4 | 0.0 |

The ifExists option allows to either append to an existing table or replace it with the new data. By default, the value for ifExists is set to fail.

## Complex Connect Example with Spreadsheets

The following example (inspired by the model herves) reads a 3-dimensional parameter from a spreadsheet that has one row index (code) at the left side of the table and the other row index (labId) at the right of the table. A column index (cut) is at the top of the table. The column index consists of floating-point numbers. The goal is to read the data into GAMS but modify the labels of some sets: only the first two decimal digits of the elements in cut are significant. Moreover, the labId should be prefixed with an L. A new spreadsheet with the new labels should be written. The layout of the table should remain, with the exception of moving the labId column also to the left. Here is a screenshot of the original table:

Original spreadsheet data of table raw

The following GAMS code uses a separate GAMS program (getdata.gms) to get the raw data from the original spreadsheet. Connect runs inside a compile-time embedded code section and uses the Connect agent RawExcelReader to get the raw Excel data. In some subsequent GAMS code the sets rr and cut[Id] as well as the parameter raw are filled knowing the layout of the table (the code is written in a way that the table can grow). This GAMS program gets executed and instructed to create a GDX file. In a compile-time embedded Connect section the relevant symbols (rr, cutId, and raw) are read from this GDX file. The Projection agent extracts the domain labid from the set rr and some Python code using Connect agent PythonCode makes the label adjustments and sorts the data nicely. The Python code uses the connect.container methods to read from and write to the Connect database.
Finally, the GAMSWriter agent sends the data to GAMS. In the main program, at execution time, an embedded Connect code section exports the labdata parameter in the required form (after reading it from GAMS with the GAMSReader agent). Here is a screenshot of the resulting table:

Table in newly created spreadsheet with new labels and layout

In the remainder of the GAMS code another execution-time embedded Connect code section is used to read the data back from the newly created spreadsheet using Connect agent PandasExcelReader. The set rr is created from parameter labdata using the Projection agent and everything is written back to GAMS with Connect agent GAMSWriter. The original data and the data from the newly created spreadsheet are exported to GDX (using execute_unload) and compared with gdxdiff to verify that the data is identical. The entire code is listed below. This model is part of DataLib as model connect01.

Set code, labId, cut, rr(code<,labId);
parameter labdata(code,labid,cut);

$onEcho > getdata.gms
alias (u,*);
Set s,w,r,c,ws(s,w),vs(s,r,c),vu(s,r,c,u);
Parameter vf(s,r,c);

$onEmbeddedCode Connect:
- RawExcelReader:
    file: labdata.xlsx
- GAMSWriter:
    writeAll: True
$offEmbeddedCode

* Symbols to be filled
alias (*,code,labId,cut);
Parameter raw(code,labId,cut);
Set cutId, rr(code,labId);
Set cX(c,cut) 'column index', rX(r,code,labId) 'row index';
Singleton set cLast(c); Scalar lastPos;
loop(ws(s,'ZAg'),
lastPos = smax(vu(s,r,c,u), c.pos); cLast(c) = c.pos=lastPos;
loop(r$(ord(r)>4), rX(r,code,labId)$= vu(s,r,'C1',code) and vu(s,r,cLast,labId));
loop(c$(ord(c)>1 and not cLast(c)), cX(c,cut)$= vu(s,'R4',c,cut));
loop((rX(r,code,labId),cX(c,cut)),
raw(code,labId,cut) = vf(s,r,c));
loop(cX(c,cut),
cutId(cut) = yes)
);
option rr<rX;
$offEcho

$call.checkErrorLevel gams getdata.gms lo=%gams.lo% gdx=getdata.gdx

$onEmbeddedCode Connect:
- GDXReader:
    file: getdata.gdx
    symbols: [ {name: rr}, {name: raw}, {name: cutId, newName: cut} ]
- Projection:
    name: rr(code,labid)
    newName: labid(labid)
- PythonCode:
    code: |
      labid_records = sorted([ 'L'+t[0] for t in connect.container.data['labid'].records.values ], key=lambda t: int(t[1:]))
      rr_records = sorted([ (t[0], 'L'+t[1]) for t in connect.container.data['rr'].records.values ], key=lambda t: int(t[0]))
      # Trim elements of set cut to two decimal places
      cut_records = sorted([ '{:.2f}'.format(float(t[0])) for t in connect.container.data['cut'].records.values ], key=float)
      labdata_records = [ (t[0], 'L'+t[1], '{:.2f}'.format(float(t[2])), t[-1]) for t in connect.container.data['raw'].records.values ]
      connect.container.addSet('labid_mod', ['*'], records=labid_records)
      connect.container.addSet('rr_mod', ['*']*2, records=rr_records)
      connect.container.addSet('cut_mod', ['*'], records=cut_records)
      connect.container.addParameter('labdata', ['*']*3, records=labdata_records)
- GAMSWriter:
    symbols: [ {name: labid_mod, newName: labid}, {name: rr_mod, newName: rr}, {name: cut_mod, newName: cut}, {name: labdata} ]
$offEmbeddedCode

* Reintroduce 0 (zeros)
labdata(rr,cut) = labdata(rr,cut) + eps;

execute 'rm -f labdatanew.xlsx';
* Write new workbook with good table
EmbeddedCode Connect:
- GAMSReader:
    symbols: [ {name: labdata} ]
- PandasExcelWriter:
    file: labdatanew.xlsx
    symbols:
      - name: labdata
        rowDimension: 2
        range: ZAg!A4
endEmbeddedCode
option clear=rr, clear=labdata;

EmbeddedCode Connect:
- PandasExcelReader:
    file: labdatanew.xlsx
    symbols:
      - name: labdata
        rowDimension: 2
        columnDimension: 1
        range: ZAg!A4
- Projection:
    name: labdata(code,labid,cut)
    newName: rr(code,labid)
    asSet: True
- GAMSWriter:
    writeAll: True
endEmbeddedCode
execute.checkErrorLevel 'gdxdiff labdata.gdx labdatanew.gdx > %system.NullFile%';


## Complex Connect Example with CSV Files

The following example shows how to read and write CSV files in Connect. It starts out by defining some data (stockprice) in a table statement in GAMS. Compile-time embedded Connect code utilizes the GAMSReader agent to bring this data into Connect and exports it as a CSV file with agent CSVWriter. The GDXWriter agent also creates a GDX file with the data, which is then used in a subsequent call to feed gdxdump, which produces the same CSV file as CSVWriter. The text comparison tool diff is used to compare the two CSV files. The CSV file looks as follows:

"date_0","AAPL","GOOG","MMM","MSFT","WMT"
"2012-20-11",12.124061,314.008026,60.966354,21.068886,46.991535
"2112-20-11",12.139372,311.741516,60.731037,20.850344,47.150307
"2212-20-11",12.203673,313.674286,61.467381,20.890808,46.991535
"2312-20-11",12.350039,315.387848,62.401108,21.068886,47.626663
"2712-20-11",12.448025,318.929565,62.461876,21.076981,47.499634
"2812-20-11",12.328911,318.655609,61.604042,20.898905,47.420238
"2912-20-11",12.404848,320.000549,62.332813,21.060795,47.626663
"3012-20-11",12.401172,321.744019,62.044331,21.012224,47.444057


In the remainder of the example this CSV file is read back via the Connect agent CSVReader. The code also utilizes the tool csv2gdx to read the CSV file into a GDX file, and compares the results of both methods. Csv2gdx also creates sets with the index elements as Dim1, Dim2, ... Therefore, Connect utilizes the Projection agent to extract the index sets date and symbol from the parameter stockprice as sets Dim1 and Dim2. The Connect agent GDXWriter creates a GDX file of the Connect database which can then be compared with the GDX file created by csv2gdx. The GDX comparison tool gdxdiff is used to compare the two GDX files. The entire code is listed below. This model is part of DataLib as model connect02.

Set date,symbol;
Table stockprice(date<,symbol<)
                 AAPL        GOOG       MMM      MSFT       WMT
2012-20-11  12.124061  314.008026 60.966354 21.068886 46.991535
2112-20-11  12.139372  311.741516 60.731037 20.850344 47.150307
2212-20-11  12.203673  313.674286 61.467381 20.890808 46.991535
2312-20-11  12.350039  315.387848 62.401108 21.068886 47.626663
2712-20-11  12.448025  318.929565 62.461876 21.076981 47.499634
2812-20-11  12.328911  318.655609 61.604042 20.898905 47.420238
2912-20-11  12.404848  320.000549 62.332813 21.060795 47.626663
3012-20-11  12.401172  321.744019 62.044331 21.012224 47.444057
;

* Use Connect CSVWriter to write GAMS data in CSV format moving the symbol index into the column (unstack: True)
$onEmbeddedCode Connect:
- GAMSReader:
    symbols: [ {name: stockprice} ]
- GDXWriter:
    file: sp_connect.gdx
    writeAll: True
- CSVWriter:
    file: sp_connect.csv
    name: stockprice
    header: True
    unstack: True
    quoting: 2
$offEmbeddedCode

* Use gdxdump to create a CSV file and text compare the Connect and gdxdump CSV files
$call.checkErrorLevel gdxdump sp_connect.gdx output=sp_gdxdump.csv symb=stockprice format=csv columnDimension=Y > %system.NullFile%
$call.checkErrorLevel diff -q sp_connect.csv sp_gdxdump.csv > %system.nullFile%

* Use Connect CSVReader to read the newly created CSV file and deposit the result in a csv2gdx compatible format
$onEmbeddedCode Connect:
- CSVReader:
    file: sp_connect.csv
    name: stockprice
    indexColumns: 1
    valueColumns: "2:lastCol"
- Projection:
    name: stockprice(date,symbol)
    newName: Dim1(date)
    asSet: True
- Projection:
    name: stockprice(date,symbol)
    newName: Dim2(symbol)
    asSet: True
- GDXWriter:
    file: sp_connect.gdx
    writeAll: True
$offEmbeddedCode

* Use csv2gdx to create a GDX file and compare the Connect and csv2gdx GDX files
$call.checkErrorLevel csv2gdx sp_connect.csv output=sp_csv2gdx.gdx id=stockprice index=1 value=2..lastCol useHeader=y > %system.NullFile%
$call.checkErrorLevel gdxdiff sp_connect.gdx sp_csv2gdx.gdx > %system.NullFile%


# Connect Agents

## CSVReader

The CSVReader allows reading a symbol from a specified CSV file into the Connect database. Its implementation is based on the pandas.DataFrame class and its I/O API method read_csv.

| Option | Default | Description |
| --- | --- | --- |
| autoColumn | None | Generate automatic column names. |
| autoRow | None | Generate automatic row labels. |
| decimalSeparator | . (period) | Specify a decimal separator. |
| fieldSeparator | , (comma) | Specify a field separator. |
| file | None | Specify a CSV file path. |
| header | inferred | Indicate if the first row in the CSV file should be used as the column names in the DataFrame. |
| indexColumns | None | Specify columns to use as the row labels. |
| indexSubstitutions | None | Dictionary used for substitutions in the index columns. |
| name | None | Specify a symbol name for the Connect database. |
| names | None | List of column names to use. |
| quoting | 0 | Control field quoting behavior. |
| readCSVArguments | None | Dictionary containing keyword arguments for the pandas.read_csv method. |
| skipRows | None | Specify the rows to skip or the number of rows to skip. |
| stack | inferred | Stacks the column names to index. |
| textColumns | None | Specify columns to get the set element text from. |
| textSubstitutions | None | Dictionary used for substitutions in the text columns. |
| thousandsSeparator | None | Specify a thousands separator. |
| trace | inherited | Specify the trace level for debugging output. |
| valueColumns | None | Specify columns to get the values from. |
| valueSubstitutions | None | Dictionary used for substitutions in the value columns. |

Detailed description of the options:

autoColumn = string (optional)

Generate automatic column names. The autoColumn string is used as the prefix for the column label numbers. This option overrides the use of a header or names. However, if there is a header row, one must skip the row by enabling header or using skipRows.

autoRow = string (optional)

Generate automatic row labels. The autoRow string is used as the prefix for the row label numbers. The generated unique elements will be used in the first index position shifting other elements to the right. Using autoRow can be helpful when there are no labels that can be used as unique elements but also to store entries that would be a duplicate entry without a unique row label.

decimalSeparator = string (default=.)

Specify a decimal separator. [ . (period), , (comma)]

fieldSeparator = string (default=,)

Specify a field separator. [ , (comma), ; (SemiColon), \t (Tab)]

file = string (required)

Specify a CSV file path.

header = boolean (optional)

Indicate if the first row in the CSV file should be used as the column names in the DataFrame. The default behavior is to infer the column names: if no names are passed the behavior is identical to header=True and column names are inferred from the first line of data; if column names are passed explicitly then the behavior is identical to header=False. Explicitly pass header=True to be able to replace existing names. Note that missing column names are filled with Unnamed: n (where n is the nth column (zero-based) in the DataFrame). Hence, reading the CSV file:

,j1,
i1,1,2
i2,3,4
,5,6


results in the following 2-dimensional parameter:

            j1  Unnamed: 2
i1       1.000       2.000
i2       3.000       4.000
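The Unnamed: n naming comes directly from pandas. A minimal pandas-only sketch (independent of GAMS Connect) showing the column names pandas infers for the CSV file above:

```python
import io

import pandas as pd

# same CSV content as above; the first and third header fields are empty
csv_data = ",j1,\ni1,1,2\ni2,3,4\n,5,6\n"

df = pd.read_csv(io.StringIO(csv_data))
print(list(df.columns))  # ['Unnamed: 0', 'j1', 'Unnamed: 2']
```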


indexColumns = list or string (optional)

Specify columns to use as the row labels. The columns can either be given as column positions or column names. Column positions can be represented as an integer, a list of integers or a string. For example: indexColumns: 1, indexColumns: [1, 2, 3, 4, 6] or indexColumns: "1:4, 6". The symbolic constant lastCol can be used with the string representation: "2:lastCol". Note that the usage of lastCol requires header: True or names given. Column names can be represented as a list of strings. For example: indexColumns: ["i1","i2"]. Note that indexColumns and valueColumns/textColumns must either be given as positions or names, not both.
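The column-position strings combine comma-separated positions and inclusive ranges. The following helper is not part of Connect — the name parse_columns is made up for illustration — but it sketches how such a spec expands into a list of positions:

```python
def parse_columns(spec: str, last_col: int) -> list[int]:
    """Expand a spec like "1:4, 6" or "2:lastCol" into column positions."""
    positions = []
    for part in spec.split(","):
        part = part.strip().replace("lastCol", str(last_col))
        if ":" in part:
            lo, hi = part.split(":")
            positions.extend(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            positions.append(int(part))
    return positions

print(parse_columns("1:4, 6", last_col=10))  # [1, 2, 3, 4, 6]
print(parse_columns("2:lastCol", last_col=5))  # [2, 3, 4, 5]
```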

By default the pandas.read_csv method interprets the following indices as NaN: "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null". The default can be changed by specifying the pandas.read_csv arguments keep_default_na and na_values via readCSVArguments. Rows with indices that are interpreted as NaN will be dropped automatically. The indexSubstitutions option allows to remap NaN entries in the index columns.
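A pandas-only sketch (independent of GAMS Connect) of how keep_default_na changes which index strings are read as NaN:

```python
import io

import pandas as pd

csv_data = "i,value\nNA,1.0\ni2,2.0\n"

# default: the index "NA" is interpreted as NaN
df = pd.read_csv(io.StringIO(csv_data))
print(df["i"].isna().tolist())  # [True, False]

# keep_default_na=False disables the built-in NaN strings, so "NA" survives
df = pd.read_csv(io.StringIO(csv_data), keep_default_na=False)
print(df["i"].tolist())  # ['NA', 'i2']
```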

indexSubstitutions = dictionary (optional)

Dictionary used for substitutions in the index columns. Each key in indexSubstitutions is replaced by its corresponding value. This option allows arbitrary replacements in the index columns of the DataFrame including stacked column names. Consider the following CSV file:

i1,j1,2.5
i1,,1.7
i2,j1,1.8
i2,,1.4


Reading this data into a 2-dimensional parameter results in a parameter with NaN entries dropped:

            j1
i1       2.500
i2       1.800


By specifying indexSubstitutions: { .nan: j2 } we can substitute NaN entries by j2:

            j1          j2
i1       2.500       1.700
i2       1.800       1.400


name = string (required)

Specify a symbol name for the Connect database. Note that each symbol in the Connect database must have a unique name.

names = list (optional)

List of column names to use. If the file contains a header row, then you should explicitly pass header=True to override the column names. Duplicates in this list are not allowed.

quoting = integer (default=0)

Control field quoting behavior. Use QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3). QUOTE_NONNUMERIC (2) instructs the reader to convert all non-quoted fields to type float. QUOTE_NONE (3) instructs the reader to perform no special processing of quote characters.
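These integer codes correspond to the constants in Python's csv module. A quick stdlib-only illustration of their values and of the QUOTE_NONNUMERIC float conversion:

```python
import csv
import io

# the four quoting codes used by the quoting option
print(csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE)  # 0 1 2 3

# with QUOTE_NONNUMERIC the reader converts unquoted fields to float
reader = csv.reader(io.StringIO('"i1",2.5\n'), quoting=csv.QUOTE_NONNUMERIC)
row = next(reader)
print(row)  # ['i1', 2.5]
```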

readCSVArguments = dictionary (optional)

Dictionary containing keyword arguments for the pandas.read_csv method. Not all arguments of that method are exposed through the YAML interface of the CSVReader agent. By specifying readCSVArguments, it is possible to pass arguments directly to the pandas.read_csv method that is used by the CSVReader agent.

skipRows = list or integer (optional)

Specify the rows to skip (list) or the number of rows to skip (integer). For example: skipRows: [1, 3] or skipRows: 5.

stack = boolean (optional)

Stacks the column names to index. Defaults to True if there is more than one value/text column, otherwise False. Note that missing column names are filled with Unnamed: n (where n is the nth column (zero based) in the DataFrame).
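Moving column names into the index is conceptually the same as the pandas stack operation. A pandas-only sketch (independent of GAMS Connect):

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("i,j1,j2\na,1,2\nb,3,4\n"), index_col=0)

# stack() moves the column names into an additional index level
s = df.stack()
print(s[("a", "j2")])  # 2
```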

textColumns = list or string (optional)

Specify columns to get the set element text from. The columns can be given as column positions or column names. Column positions can be represented as an integer, a list of integers or a string. For example: textColumns: 1, textColumns: [1, 2, 3, 4, 6] or textColumns: "1:4, 6". The symbolic constant lastCol can be used with the string representation: "2:lastCol". Note that the usage of lastCol requires header: True or names given. Column names can be represented as a list of strings. For example: textColumns: ["i1","i2"]. Note that textColumns and indexColumns must either be given as positions or names, not both.

By default the pandas.read_csv method interprets the following text as NaN: "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null". The default can be changed by specifying the pandas.read_csv arguments keep_default_na and na_values via readCSVArguments. Rows with texts that are interpreted as NaN will be dropped automatically. The textSubstitutions option allows to remap NaN entries in the text columns.

textSubstitutions = dictionary (optional)

Dictionary used for substitutions in the text columns. Each key in textSubstitutions is replaced by its corresponding value. While it is possible to make arbitrary replacements this is especially useful for controlling sparse/dense reading. The default reading behavior is sparse since rows with text that is interpreted as NaN are dropped automatically. Consider the following CSV file:

i1,text1
i2,
i3,text3


thousandsSeparator = string (optional)

Specify a thousands separator.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

valueColumns = list or string (optional)

Specify columns to get the values from. The columns can be given as column positions or column names. Column positions can be represented as an integer, a list of integers or a string. For example: valueColumns: 1, valueColumns: [1, 2, 3, 4, 6] or valueColumns: "1:4, 6". The symbolic constant lastCol can be used with the string representation: "2:lastCol". Note that the usage of lastCol requires header: True or names given. Column names can be represented as a list of strings. For example: valueColumns: ["i1","i2"]. Note that valueColumns and indexColumns must either be given as positions or names, not both.

By default the pandas.read_csv method interprets the following values as NaN: "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null". The default can be changed by specifying the pandas.read_csv arguments keep_default_na and na_values via readCSVArguments. Rows with values that are interpreted as NaN will be dropped automatically. Changing the default of values that are interpreted as NaN is useful if, e.g., "NA" values should not be dropped but interpreted as the GAMS special value NA. Moreover, the valueSubstitutions option allows remapping NaN entries in the value columns.
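
The different forms of valueColumns and indexColumns can be sketched as follows; the file name data.csv, the symbol names, and the column names are hypothetical:

```yaml
# positions: an integer, a list of integers, or a string range
- CSVReader:
    file: data.csv                  # hypothetical file name
    name: p1
    header: True
    indexColumns: [1, 2]
    valueColumns: "3:lastCol"       # lastCol requires header: True or names given
# names: lists of strings (positions and names must not be mixed)
- CSVReader:
    file: data.csv
    name: p2
    header: True
    indexColumns: ["city", "year"]  # hypothetical column names
    valueColumns: ["demand"]
```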

valueSubstitutions = dictionary (optional)

Dictionary used for substitutions in the value columns. Each key in valueSubstitutions is replaced by its corresponding value. While it is possible to make arbitrary replacements this is especially useful for controlling sparse/dense reading. All NaN entries are removed automatically by default which results in a sparse reading behavior. Consider the following CSV file:

i1,j1,
i1,j2,1.7
i2,j1,
i2,j2,1.4


Reading this data into a 2-dimensional parameter results in a sparse parameter with all NaN entries removed:

            j2

i1       1.700
i2       1.400


By specifying valueSubstitutions: { .nan: eps } we get a dense representation where all NaN entries are replaced by GAMS special value EPS:

            j1          j2

i1         EPS       1.700
i2         EPS       1.400


Besides eps, the other GAMS special values inf, -inf, na, and undef can be used by specifying their string representation. See the GAMS Transfer documentation for more information.

Reading the CSV file shown under the textSubstitutions option into a 1-dimensional set results in a sparse set in which all NaN entries (those that do not have any set element text) are removed:

'i1' 'text1',
'i3' 'text3'


By specifying textSubstitutions: { .nan: '' } we get a dense representation:

'i1' 'text1',
'i2',
'i3' 'text3'


It is also possible to use textSubstitutions in order to interpret the set element text. Let's assume we have the following CSV file:

,j1,j2,j3
i1,Y,Y,Y
i2,Y,Y,N
i3,0,Y,Y


Reading this data into a 2-dimensional set results in a dense set:

'i1'.'j1' Y,
'i1'.'j2' Y,
'i1'.'j3' Y,
'i2'.'j1' Y,
'i2'.'j2' Y,
'i2'.'j3' N,
'i3'.'j1' 0,
'i3'.'j2' Y,
'i3'.'j3' Y


By specifying textSubstitutions: { 'N': .nan, '0': .nan } we replace all occurrences of N and 0 by NaN, which is dropped automatically:

'i1'.'j1' Y,
'i1'.'j2' Y,
'i1'.'j3' Y,
'i2'.'j1' Y,
'i2'.'j2' Y,
'i3'.'j2' Y,
'i3'.'j3' Y
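
Putting this together, the Y/N table above could be read into a two-dimensional set as sketched below. The file name ynmatrix.csv is hypothetical, and the availability of a type option for the CSVReader (selecting between par and set, analogous to the PandasExcelReader) is assumed:

```yaml
- CSVReader:
    file: ynmatrix.csv           # hypothetical file containing the Y/N table above
    name: s
    type: set                    # assumed option; par would read a parameter
    header: True
    indexColumns: [1]
    valueColumns: "2:lastCol"
    textSubstitutions: { 'N': .nan, '0': .nan }  # drop N and 0 cells
```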


## CSVWriter

The CSVWriter allows writing a symbol in the Connect database to a specified CSV file.

| Option | Default | Description |
| --- | --- | --- |
| decimalSeparator | . (period) | Specify a decimal separator. |
| file | None | Specify a CSV file path. |
| fieldSeparator | , (comma) | Specify a field separator. |
| header | True | Indicate if the header will be written. |
| name | None | Specify the name of the symbol in the Connect database. |
| quoting | 0 | Control field quoting behavior. |
| setHeader | None | Specify a string that will be used as the header. |
| skipElementText | False | Indicate if the set element text will be skipped. |
| toCSVArguments | None | Dictionary containing keyword arguments for the pandas.to_csv method. |
| trace | inherited | Specify the trace level for debugging output. |
| unstack | False | Indicate if the last dimension will be used as a header row. |

Detailed description of the options:

decimalSeparator = string (default=.)

Specify a decimal separator. [ . (period), , (comma)]

file = string (required)

Specify a CSV file path.

fieldSeparator = string (default=,)

Specify a field separator. [ , (comma), ; (SemiColon), \t (Tab)]

header = boolean (default=True)

Indicate if the header will be written.

name = string (required)

Specify the name of the symbol in the Connect database.

quoting = integer (default=0)

Control field quoting behavior. Use QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3). QUOTE_MINIMAL (0) instructs the writer to only quote those fields which contain special characters such as fieldSeparator. QUOTE_ALL (1) instructs the writer to quote all fields. QUOTE_NONNUMERIC (2) instructs the writer to quote all non-numeric fields. QUOTE_NONE (3) instructs the writer to never quote fields.

setHeader = string (optional)

Specify a string that will be used as the header. If an empty header is desired, the string can be empty.

skipElementText = boolean (default=False)

Indicate if the set element text will be skipped. If False, the set element text will be written in the last column of the CSV file.

toCSVArguments = dictionary (optional)

Dictionary containing keyword arguments for the pandas.to_csv method. Not all arguments of that method are exposed through the YAML interface of the CSVWriter agent. By specifying toCSVArguments, it is possible to pass arguments directly to the pandas.to_csv method that is used by the CSVWriter agent.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

unstack = boolean (default=False)

Indicate if the last dimension will be used as a header row.
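
A sketch combining several of these options; the file path and symbol name are hypothetical:

```yaml
- CSVWriter:
    file: output.csv       # hypothetical output path
    name: p
    unstack: True          # last dimension becomes the header row
    fieldSeparator: ";"
    decimalSeparator: ","
    quoting: 2             # QUOTE_NONNUMERIC: quote all non-numeric fields
```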

## DomainWriter

The DomainWriter agent allows rewriting domain information for existing Connect symbols and helps with handling domain violations.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| dropDomainViolations | global/symbol | False | Indicate how to deal with domain violations. |
| name | symbol | None | Specify the name of the symbol in the Connect database. |
| trace | global | inherited | Specify the trace level for debugging output. |
| writeAll | global | False | Indicate if all symbols in the Connect database will be treated with the global dropDomainViolations setting. |

Detailed description of the options:

dropDomainViolations = boolean (global)/string (symbol) (default=False)

The Connect symbols might have domain violations. This agent allows dropping these domain violations so that a write to GAMS or GDX works properly. Setting the global dropDomainViolations: True together with writeAll: True drops all domain violations from all Connect symbols. Domain violations can also be dropped in the symbols section, where the dropDomainViolations attribute needs a string value. Possible string values are before, after, "True", and "False". If the attribute has not been set for a symbol, it is inherited from the global attribute. The value before means that domain violations are dropped before a new domain is applied, see attribute name. The value after means that domain violations are dropped after a new domain is applied. The value "True" means that domain violations are dropped both before and after a new domain is applied, and "False" means that domain violations are not dropped.

name = string (required)

Specify a symbol name with index space for the Connect database. name requires the format symname(i1,i2,...,iN). The list of indices needs to coincide with the names of the actual GAMS domain sets for a regular domain. A relaxed domain is set if the index is quoted. For example name: x(i,'j') means that for the first index a regular domain with domain set i is established, while for the second index the universal domain * is used and a relaxed domain name j is set.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

writeAll = boolean (default=False)

Indicate if all symbols in the Connect database will be treated according to global attribute dropDomainViolations.
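
A sketch of a DomainWriter call that applies a regular domain i and a relaxed domain j and drops domain violations beforehand; the symbol name x is hypothetical:

```yaml
- DomainWriter:
    symbols:
      - name: x(i,'j')                # regular domain i; relaxed domain name j on *
        dropDomainViolations: before  # drop violations before the new domain is applied
```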

## GAMSReader

The GAMSReader allows reading symbols from the GAMS database into the Connect database. Without GAMS context (e.g. when running the gamsconnect script from the command line) this agent is not available and its execution will result in an exception.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| name | symbol | None | Specify the name of the symbol in the GAMS database. |
| newName | symbol | None | Specify a new name for the symbol in the Connect database. |
| readAll | global | False | Indicate if all symbols in the GAMS database will be read into the Connect database. |
| trace | global | inherited | Specify the trace level for debugging output. |

Detailed description of the options:

name = string (required)

Specify the name of the symbol in the GAMS database.

newName = string (optional)

Specify a new name for the symbol in the Connect database. Each symbol in the Connect database must have a unique name.

readAll = boolean (default=False)

Indicate if all symbols in the GAMS database will be read into the Connect database. If True, read all symbols into the Connect database and ignore symbol options.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.
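
A minimal GAMSReader call could look as follows; the symbol names i and p are hypothetical:

```yaml
- GAMSReader:
    symbols:
      - name: i                # read set i under its own name
      - name: p
        newName: pConnect      # read p under a new name in the Connect database
```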

## GAMSWriter

The GAMSWriter allows writing symbols in the Connect database to the GAMS database. Without GAMS context (e.g. when running the gamsconnect script from the command line) and when invoked via the connectOut command line option, this agent is not available and its execution will result in an exception.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| domainCheckType | global/symbol | default | Specify if domain checking is applied or if records that would cause a domain violation are filtered. |
| duplicateRecords | global/symbol | all | Specify how to deal with duplicate records. |
| mergeType | global/symbol | default | Specify if data in a GAMS symbol is merged or replaced. |
| name | symbol | None | Specify the name of the symbol in the Connect database. |
| newName | symbol | None | Specify a new name for the symbol in the GAMS database. |
| trace | global | inherited | Specify the trace level for debugging output. |
| writeAll | global | False | Indicate if all symbols in the Connect database will be written to the GAMS database. |

Detailed description of the options:

domainCheckType = string (default=default)

Specify if Domain Checking is applied (checked) or if records that would cause a domain violation are filtered (filtered). If left at default it depends on the setting of $on/offFiltered if GAMS does a filtered load or checks the domains during compile time. During execution time default is the same as filtered.

duplicateRecords = string (default=all)

The Connect container can hold multiple records even for the same indexes. This is only a problem when exchanging the data with GAMS (and GDX). The attribute determines how the agent deals with duplicate records. With the default of all the GAMSWriter will fail in case duplicate records exist. With first the first record will be written to GAMS, with last the last record. With none none of the duplicate records will be written to GAMS. Note that the agent currently deals with duplicate records in a case sensitive way.

mergeType = string (default=default)

Specify if data in a GAMS symbol is merged (merge) or replaced (replace). If left at default it depends on the setting of $on/offMulti[R] if GAMS does a merge, replace, or triggers an error during compile time. During execution time default is the same as merge.

name = string (required)

Specify the name of the symbol in the Connect database.

newName = string (optional)

Specify a new name for the symbol in the GAMS database. Note, each symbol in the GAMS database must have a unique name.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

writeAll = boolean (default=False)

Indicate if all symbols in the Connect database will be written to the GAMS database. If True, write all symbols to the GAMS database and ignore symbol options.
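
A sketch of a GAMSWriter call combining these options; the symbol names are hypothetical:

```yaml
- GAMSWriter:
    domainCheckType: filtered    # global: filter records that cause domain violations
    symbols:
      - name: pConnect
        newName: p
        duplicateRecords: first  # keep the first of any duplicate records
        mergeType: replace
```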

## GDXReader

The GDXReader allows reading symbols from a specified GDX file into the Connect database.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| file | global | None | Specify a GDX file path. |
| name | symbol | None | Specify the name of the symbol in the GDX file. |
| newName | symbol | None | Specify a new name for the symbol in the Connect database. |
| readAll | global | False | Indicate if all symbols in the GDX file will be read into the Connect database. |
| trace | global | inherited | Specify the trace level for debugging output. |

Detailed description of the options:

file = string (required)

Specify a GDX file path.

name = string (required)

Specify the name of the symbol in the GDX file.

newName = string (optional)

Specify a new name for the symbol in the Connect database. Each symbol in the Connect database must have a unique name.

readAll = boolean (default=False)

Indicate if all symbols in the GDX file will be read into the Connect database. If True, read all symbols into the Connect database and ignore symbol options.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

## GDXWriter

The GDXWriter allows writing symbols in the Connect database to a specified GDX file.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| duplicateRecords | global/symbol | all | Specify how to deal with duplicate records. |
| file | global | None | Specify a GDX file path. |
| name | symbol | None | Specify the name of the symbol in the Connect database. |
| newName | symbol | None | Specify a new name for the symbol in the GDX file. |
| trace | global | inherited | Specify the trace level for debugging output. |
| writeAll | global | False | Indicate if all symbols in the Connect database will be written to the GDX file. |

Detailed description of the options:

duplicateRecords = string (default=all)

The Connect container can hold multiple records even for the same indexes. This is only a problem when exchanging the data with GDX (and GAMS). The attribute determines how the agent deals with duplicate records. With the default of all the GDXWriter will fail in case duplicate records exist. With first the first record will be written to GDX, with last the last record written to GDX. With none none of the duplicate records will be written to GDX. Note that the agent currently deals with duplicate records in a case sensitive way.

file = string (required)

Specify a GDX file path.

name = string (required)

Specify the name of the symbol in the Connect database.

newName = string (optional)

Specify a new name for the symbol in the GDX file. Note, each symbol in the GDX file must have a unique name.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

writeAll = boolean (default=False)

Indicate if all symbols in the Connect database will be written to the GDX file. If True, write all symbols to the GDX file and ignore symbol options.
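
A minimal GDX round trip could be sketched as follows; the file names input.gdx and output.gdx are hypothetical:

```yaml
- GDXReader:
    file: input.gdx      # hypothetical input file
    readAll: True        # read all symbols, ignoring symbol options
- GDXWriter:
    file: output.gdx     # hypothetical output file
    writeAll: True       # write all Connect symbols to the GDX file
```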

## Options

The Options agent allows setting more general options that can affect the Connect database and other Connect agents. More specifically, the value of an option set via the Options agent can be inherited as a default value by Connect agents that utilize that option.

| Option | Default | Description |
| --- | --- | --- |
| caseSensitiveLookup | True | Specify how to lookup symbols specified by attribute name and others in the Connect database. |
| debug | 0 | If set to >0 a full trace is written to stdout in case of an exception. |
| trace | 0 | Specify the trace level for debugging output. |

Detailed description of the options:

caseSensitiveLookup = boolean (default=True)

Specify how to look up symbols in the Connect database. Unlike GAMS, Connect uses case sensitive symbol names, hence one needs to pay attention to symbol name casing when the symbol name is specified in an agent description, e.g. via the name attribute. If the caseSensitiveLookup attribute is set to False, Connect will also find symbols whose names match in a case-insensitive way. If multiple symbols with the same lower case name exist in the Connect database, the symbol that matches exactly is found. If none of the symbols match exactly, one of the symbols is found, but it is unspecified which one. For example, assume symbols ABC and abc exist in the Connect database. If name: ABC is given, symbol ABC will be used, but if name: aBc is given, either abc or ABC is returned. Multiple symbols with the same lower case name should be avoided since they cause problems, e.g. when writing to GAMS or GDX with attribute writeAll: True.

debug = integer (default=0)

A Connect error triggers an exception. By default only the exception message is provided to the user. In case the debug attribute is set to an integer larger than 0, a full trace with the calling stack is printed to stdout.

trace = integer (default=0)

Specify the trace level for debugging output. A trace level of 0 (default) means no debugging output. For trace > 0 the Connect database will write some scalar debugging output to the log. The debugging output of Connect agents depends on their implementation of trace, please refer to the corresponding documentation.
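
A sketch of how Options settings are inherited by subsequent agents; the GDX file name is hypothetical:

```yaml
- Options:
    trace: 2                    # inherited by agents that do not set trace themselves
    debug: 1                    # print a full stack trace in case of an exception
    caseSensitiveLookup: False  # allow case-insensitive symbol lookup
- GDXReader:
    file: input.gdx             # hypothetical; this agent now runs with trace level 2
    readAll: True
```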

## PandasExcelReader

The PandasExcelReader agent allows reading symbols from an Excel file into the Connect database. Its implementation is based on the pandas.DataFrame class and its I/O API method read_excel. The PandasExcelReader primarily aims to read spreadsheets that have been written by its counterpart, the PandasExcelWriter.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| columnDimension | global/symbol | 1 | Column dimension of the symbol. |
| drop | symbol | None | Specify a string for dropping each row containing it in one of its labels. |
| excelFileArguments | global | None | Dictionary containing keyword arguments for the pandas.ExcelFile constructor. |
| file | global | None | Specify an Excel file path. |
| multiColumnBlankRow | global/symbol | True | Indicator for existence of blank row after the column indexes (for columnDimension>1). |
| name | symbol | None | Specify the name of the symbol in the Connect database. |
| range | symbol | None | Specify the Excel range of a symbol. |
| readExcelArguments | symbol | None | Dictionary containing keyword arguments for the pandas.read_excel method. |
| rowDimension | global/symbol | 1 | Row dimension of the symbol. |
| trace | global | inherited | Specify the trace level for debugging output. |
| type | global/symbol | par | Control the symbol type. |
| valueSubstitutions | symbol | None | Dictionary used for mapping in the value column of the DataFrame. |

Detailed description of the options:

columnDimension = integer (default=1)

Column dimension: the number of rows in the data range that will be used to define the labels for columns. The first columnDimension rows of the data range will be used for labels.

drop = string (optional)

Specify a string for dropping each row containing it in one of its labels. The specified string is interpreted as a regular expression.

excelFileArguments = dictionary (optional)

Dictionary containing keyword arguments for the pandas.ExcelFile constructor.

file = string (required)

Specify an Excel file path.

multiColumnBlankRow = boolean (default=True)

For symbols where more than one dimension is in the columns, i.e. columnDimension>1 the PandasExcelReader expects a blank row before the data starts. This is also the shape the PandasExcelWriter writes:

Blank row between column headers and data

If multiColumnBlankRow is set to False, the PandasExcelReader expects, for tables with columnDimension>1, that this blank row is missing. This works properly with the exception of the following corner case:

Blank row between column headers and data missing and first data row is entirely blank

together with the following Connect instructions

- PandasExcelReader:
    file: myfile.xlsx
    symbols:
      - name: s
        rowDimension: 1
        columnDimension: 2
        range: B2:E6
        type: set
        multiColumnBlankRow: False
        valueSubstitutions: { .nan: '' } # read dense


one would expect the Connect database to contain a set with the elements k1*k3.(i1.j1,i2.j2,i3.j3), but Pandas interprets row 4 (because it is entirely blank) as the row with the index name. Hence the data starts in row 5 and the Connect database is missing the k1 records: k2*k3.(i1.j1,i2.j2,i3.j3). In such a case one either needs the blank row between column indexes and data or has to ensure that this row is not entirely empty (which can even be done outside the specified Excel range).

name = string (required)

Specify a symbol name for the Connect database. Note that each symbol in the Connect database must have a unique name.

range = string (required)

Specify the Excel range of a symbol using the format sheet!range. range can be either a single cell, also known as open range (north-west corner like B2), or a full range (north-west and south-east corner like B2:D4). For symbols with columnDimension=0 and/or rowDimension=0, the ending row and/or ending column of the open range can be deduced and is used to restrict the data area.

readExcelArguments = dictionary (optional)

Dictionary containing keyword arguments for the pandas.read_excel method. Not all arguments of that method are exposed through the YAML interface of the PandasExcelReader agent. By specifying readExcelArguments, it is possible to pass arguments directly to the pandas.read_excel method that is used by the PandasExcelReader agent.

rowDimension = integer (default=1)

Row dimension: the number of columns in the data range that will be used to define the labels for the rows. The first rowDimension columns of the data range will be used for the labels.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

type = string (default=par)

Control the symbol type. Supported symbol types are par for parameters and set for sets.

valueSubstitutions = dictionary (optional)

Dictionary used for mapping in the value column of the DataFrame. Each key in valueSubstitutions is replaced by its corresponding value. The replacement is only performed on the value column of the DataFrame which is the numerical value in case of a GAMS parameter and the set element text in case of a GAMS set. While it is possible to make arbitrary replacements this is especially useful for controlling sparse/dense reading. All NaN entries are removed automatically by default which results in a sparse reading behavior. Let's assume we have the following spreadsheet:

Two dimensional data containing NaN entries

Reading this data into a 2-dimensional parameter results in a sparse parameter in which all NaN entries are removed:

'i1'.'j1' 2.5,
'i1'.'j2' 1.7,
'i2'.'j2' 1.8,
'i2'.'j3' 1.4


By specifying valueSubstitutions: { .nan: eps } we get a dense representation in which all NaN entries are replaced by GAMS special value EPS:

'i1'.'j1' 2.5,
'i1'.'j2' 1.7,
'i1'.'j3' Eps,
'i2'.'j1' Eps,
'i2'.'j2' 1.8,
'i2'.'j3' 1.4


Besides eps, the other GAMS special values inf, -inf, na, and undef can be used by specifying their string representation. See the GAMS Transfer documentation for more information.

Let's assume we have data representing a GAMS set:

Data representing a GAMS set

Reading this data into a 1-dimensional set results in a sparse set in which all NaN entries (those that do not have any set element text) are removed:

'i1' 'text 1',
'i3' 'text 3'


By specifying valueSubstitutions: { .nan: '' } we get a dense representation:

'i1' 'text 1',
'i2',
'i3' 'text 3'


It is also possible to use valueSubstitutions in order to interpret the set element text. Let's assume we have the following Excel data:

Data representing a GAMS set

Reading this data into a 2-dimensional set results in a dense set:

'i1'.'j1' No,
'i1'.'j2' Y,
'i1'.'j3' Y,
'i2'.'j1' Y,
'i2'.'j2' Y,
'i2'.'j3' Y,
'i3'.'j1' Y,
'i3'.'j2' Y,
'i3'.'j3' N


By specifying valueSubstitutions: { 'N': .nan, 'No': .nan } we replace all occurrences of N and No by NaN, which is dropped automatically. Note that No has to be quoted in order to not be interpreted as False by the YAML parser:

'i1'.'j2' Y,
'i1'.'j3' Y,
'i2'.'j1' Y,
'i2'.'j2' Y,
'i2'.'j3' Y,
'i3'.'j1' Y,
'i3'.'j2' Y


As mentioned at the start of this section, the PandasExcelReader works best with tables written by the PandasExcelWriter with a full range (north-west and south-east corner) specification. Nevertheless, the PandasExcelReader can also process tables not precisely in the format and shape given by PandasExcelWriter and also works with an open range (north-west corner only) specification. While the PandasExcelReader shares some functionality with the tool gdxxrw there are also significant differences and this section explains some of the perhaps unexpected behavior of this Connect agent.

• Symbols with rowDimension=0 and/or columnDimension=0 have an artificial index in the 0-dimensional index position, and the range specification needs to include this artificial index. For example, if one wants to read a scalar, there are two artificial indexes rval and cval and the north-west corner starts in the artificial index row and column, i.e. the range is B2 or B2:C3 in the following example:

Artificial indexes rval and cval when reading a scalar

The names of the artificial indexes are irrelevant; they can even be blank. But as a consequence of the artificial indexes, the PandasExcelReader cannot read a scalar that is located in row 1 or column A. Similarly, symbols with dim>0 but either rowDimension or columnDimension equal to 0 need an artificial index:

Artificial index value when reading a symbol with rowDimension=0
Artificial index value when reading a symbol with columnDimension=0

Again, the name of the artificial index is irrelevant, it can even be blank. In both examples, the range starts in north-west corner B2. For symbols with rowDimension>0 and columnDimension>0 there is no artificial index.

• Tables with more than one index in the columns can be read best if the column headers and the data are separated by a blank line (this is the way PandasExcelWriter writes such tables). The PandasExcelReader attribute multiColumnBlankRow allows some control. See the option descriptions for details.
• Blank data is read as nan (not a number) and such records are dropped before the data is written to the Connect database. The valueSubstitutions attribute of PandasExcelReader allows to remap nan to other values. Blank index positions are treated very differently. The behavior depends on multiple factors.

If the PandasExcelReader encounters a blank index at the beginning of the rows it fills the index with nan. With a blank row index in the middle of the table (i.e. if there was a valid label before in this column), the PandasExcelReader repeats the previous index of this column. In case of columnDimension=1 blank column indexes are filled with Unnamed: n (where n is the column position (zero based) in the sheet). Hence the following table

Table with blank row and column index cells (columnDimension=1)

is transformed into the following pandas.DataFrame (use PandasExcelReader attribute trace>2 to print intermediate data frames):

               j1  Unnamed: 4    j3
NaN NaN  NaN         6.0   NaN
i2  i2   2.0         7.0  12.0
i2  i3   3.0         NaN  13.0
i2  i3   NaN         NaN   NaN
i5  i3   5.0        10.0  15.0
i6  i6   6.0        11.0  16.0

which arrives in the Connect database, after dropping the nan values and indexes, as follows (displayed as a GAMS parameter):

                    j1  Unnamed: 4          j3
i2.i2           2           7          12
i2.i3           3                      13
i5.i3           5          10          15
i6.i6           6          11          16

In case of columnDimension>1 blank column indexes at the beginning of the column index are filled with Unnamed: n_level_k (where n is the column position (zero based) in the sheet and k is the column dimension level). With a blank column index in the middle of the index columns (i.e. if there was a valid label before in this row), the PandasExcelReader repeats the previous index but might add a suffix .1, .2, ... to disambiguate the column names. Hence the following table

Table with blank row and column index cells (columnDimension=2)

is transformed into the following pandas.DataFrame (use PandasExcelReader attribute trace>2 to print intermediate data frames):

        Unnamed: 3_level_0   k1    k1    k1    k4
        Unnamed: 3_level_1   j1    j2  j2.1    j4
NaN NaN                NaN  NaN   6.0   NaN  17.0
i2  i2                 NaN  2.0   7.0  12.0  18.0
i2  i3                 NaN  3.0   NaN  13.0  19.0
i2  i3                 NaN  NaN   NaN   NaN   NaN
i5  i3                 NaN  5.0  10.0  15.0  20.0
i6  i6                 NaN  6.0  11.0  16.0  21.0

which arrives in the Connect database, after dropping the nan values and indexes, as follows (displayed as a GAMS parameter):

                 k1.j1       k1.j2     k1.j2.1       k4.j4
i2.i2           2           7          12          18
i2.i3           3                      13          19
i5.i3           5          10          15          20
i6.i6           6          11          16          21

The PandasExcelReader attribute drop helps to get rid of unwanted indexes (e.g. drop: Unnamed or drop: "\."). Removing row indexes that result from the continuation of previous indexes is significantly harder. Hence such rows should be entirely empty (then the row is dropped because it has only nan values). It is best to avoid empty index rows and columns altogether.
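
A sketch of using the drop attribute to remove such auto-generated labels; the file name, symbol name, and range are hypothetical:

```yaml
- PandasExcelReader:
    file: myfile.xlsx        # hypothetical file name
    symbols:
      - name: p
        range: Sheet1!B2     # hypothetical open range
        drop: Unnamed        # drop each row containing "Unnamed" in one of its labels
```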

• When one specifies an open range (north-west corner only) the PandasExcelReader will read from this north-west corner all the way to the end of the sheet (press Ctrl-End in Excel to locate the cursor into the last cell of a sheet). It does not stop at blank row or column indexes as the tool gdxxrw does. This aggravates the situation with empty index cells. Hence the following table

Table with a non-blank cell at the end of the sheet

will result in the following pandas.DataFrame (use PandasExcelReader attribute trace>2 to print intermediate data frames):

          j1   j2   j3  Unnamed: 5  Unnamed: 6  Unnamed: 7  Unnamed: 8
i1  1.0  2.0  NaN         NaN         NaN         NaN         NaN
i2  NaN  1.0  NaN         NaN         NaN         NaN         NaN
i3  NaN  NaN  1.0         NaN         NaN         NaN         NaN
i3  NaN  NaN  NaN         NaN         NaN         NaN         NaN
i3  NaN  NaN  NaN         NaN         NaN         NaN         NaN
i3  NaN  NaN  NaN         NaN         NaN         NaN         NaN
i3  NaN  NaN  NaN         NaN         NaN         NaN       999.0

which arrives in the Connect database, after dropping the nan values and indexes, as follows (displayed as a GAMS parameter):

                 j1          j2          j3  Unnamed: 8
i1           1           2
i2                       1
i3                                   1         999

## PandasExcelWriter

The PandasExcelWriter agent allows to write symbols from the Connect database to an Excel file. Its implementation is based on the pandas.DataFrame class and its I/O API method to_excel. If the Excel workbook exists, the PandasExcelWriter will attempt to append to the existing workbook.

Attention
Please be aware of the following limitation when appending to an Excel file with formulas using the PandasExcelWriter: Whereas Excel stores both formulas and the corresponding values, the Pandas I/O API methods read_excel and to_excel read/store either formulas or values, not both. As a consequence, when appending to an Excel file with formulas using the PandasExcelWriter, all cells with formulas will no longer have values, and a subsequent read by the PandasExcelReader results in NaN for those cells. One way to avoid this is to have a "read-only" input Excel file and write to a separate output Excel file. On Windows, both Excel files can be merged at the end (e.g. using a VB script).
Option Type Default Description
excelWriterArguments global None Dictionary containing keyword arguments for the pandas.ExcelWriter constructor.
file global None Specify an Excel file path.
name symbol None Specify the name of the symbol in the Connect database.
range symbol None Specify the Excel range of a symbol.
rowDimension global/symbol None Row dimension of the symbol.
toExcelArguments symbol None Dictionary containing keyword arguments for the pandas.to_excel method.
trace global inherited Specify the trace level for debugging output.
valueSubstitutions symbol None Dictionary used for mapping in the value column of the DataFrame.

Detailed description of the options:

excelWriterArguments = dictionary (optional)

Dictionary containing keyword arguments for the pandas.ExcelWriter constructor.

file = string (required)

Specify an Excel file path.

name = string (required)

Specify a symbol name for the Connect database.

range = string (required)

Specify the Excel range of a symbol using the format sheet!range. range can be either a single cell (north-west corner like B2) or a full range (north-west and south-east corner like B2:D4). For writing purposes, the south-east corner is ignored. Please note, that PandasExcelWriter always writes an index in the row and column. So even for a scalar or an indexed symbol with rowDimension=0 or dim-rowDimension=0 there will be some index information in the 0 index. For example, the following Connect script

parameter x0 / 3.14 /, x1 / i1 1, i2 2, i3 3 /;
$onEmbeddedCode Connect:
- GAMSReader:
    readAll: True
- PandasExcelWriter:
    file: x.xlsx
    symbols:
      - name: x0
        range: Sheet1!a1
      - name: x1
        rowDimension: 1
        range: Sheet1!d1
      - name: x1
        rowDimension: 0
        range: Sheet1!g1
$offEmbeddedCode


creates the following Excel output:

Value index for 0 index dimension
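The sheet!range format can be sketched with a small, self-contained Python helper (parse_range is an invented name for illustration, not part of the agent's API):

```python
# Hypothetical helper splitting a Connect-style Excel range such as
# "Sheet1!B2" or "Sheet1!B2:D4" into sheet name, north-west corner and
# (optional) south-east corner. For writing, the agent ignores the
# south-east corner, so only the first two parts matter.
def parse_range(spec):
    sheet, _, cells = spec.partition("!")
    nw, _, se = cells.partition(":")
    return sheet, nw, se or None

print(parse_range("Sheet1!B2"))    # ('Sheet1', 'B2', None)
print(parse_range("Sheet1!B2:D4")) # ('Sheet1', 'B2', 'D4')
```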

rowDimension = integer (optional)

Row dimension: The first rowDimension index positions of the symbol to be written will be written to the rows. The remaining dim-rowDimension index positions will be written into the column headers.

toExcelArguments = dictionary (optional)

Dictionary containing keyword arguments for the pandas.DataFrame.to_excel method. Not all arguments of that method are exposed through the YAML interface of the PandasExcelWriter agent. By specifying toExcelArguments, it is possible to pass arguments directly to the pandas.to_excel method that is used by the PandasExcelWriter agent.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

valueSubstitutions = dictionary (optional)

Dictionary used for mapping in the value column of the DataFrame. Each key in valueSubstitutions is replaced by its corresponding value. The replacement is only performed on the value column of the DataFrame which is the numerical value in case of a GAMS parameter, variable or equation and the set element text in case of a GAMS set.
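As a plain-Python illustration of the mapping idea (the dictionary and helper below are invented for this sketch, not the agent's implementation), each value found as a key is replaced while all other values pass through unchanged:

```python
# Sketch of valueSubstitutions: keys found in the value column are
# replaced by their mapped values; everything else is left untouched.
def substitute(values, mapping):
    return [mapping.get(v, v) for v in values]

subs = {"EPS": 0, 0: None}          # illustrative mapping
print(substitute(["EPS", 3.14, 0, 7], subs))  # [0, 3.14, None, 7]
```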

## PandasSQLReader

The PandasSQLReader agent allows to read symbols from a specified database management system into the Connect database. Its implementation is based on the pandas.DataFrame class and its I/O API method read_sql. The method takes an SQL query and an SQLAlchemy connectable URL.

Note
The connectivity to MS-Access databases is available on Windows only and requires a 64-bit MS-Access ODBC driver. See connection for more information.
Option Type Default Description
connection global None Connection string to the database.
dTypeMap global/symbol None Dictionary used to specify the dtype of columns.
indexSubstitutions symbol None Dictionary used for substitutions in the index columns.
name symbol None Specify the name of the symbol in the Connect database.
query symbol None Specify the SQL query.
readSQLArguments symbol None Dictionary containing keyword arguments for the pandas.read_sql method.
sqlEngineArguments global None Dictionary containing keyword arguments for the sqlalchemy.create_engine constructor.
trace global inherited Specify the trace level for debugging output.
type global/symbol par Control the symbol type.
valueColumns symbol inferred Specify columns to get the values from.
valueSubstitutions symbol None Dictionary used for mapping in the value column of the DataFrame.

Detailed description of the options:

connection = string (required)

Allows to specify the connection string to the database and user credentials in order to access the database. Specify the absolute path to the database or an SQLAlchemy connectable URL. Here are some examples:

Microsoft Access:

connection: "access+pyodbc://?odbc_connect=DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=<absolute//path//to//datafile.accdb(.mdb)>;"

Note
The connectivity to MS-Access databases is available on Windows only and requires a 64-bit MS-Access ODBC driver. If MS-Access is not installed, or only a 32-bit version is installed, download and install a 64-bit MS-Access ODBC driver as a redistributable package from Microsoft (e.g. MS-Access 2013 Runtime_x64).

SQL Server:

connection: "mssql://?odbc_connect=DRIVER={SQL Server Native Client 11.0};SERVER=<computerName>\\SQLEXPRESS;Trusted_Connection=yes;"


SQLite:

connection: "sqlite:///<absolute//path//to//foo.db>"


Postgres:

connection: "postgresql://scott:tiger@localhost/mydatabase"


dTypeMap = dict (optional)

Allows to specify the dtype of columns in a dictionary as key: value pairs, i.e. <column>: <dtype>.

indexSubstitutions = dict (optional)

Dictionary used for substitutions in the index columns. Each key in indexSubstitutions is replaced by its corresponding value. This option allows arbitrary replacements in the index columns.

name = string (required)

Name of the symbol in the Connect database. The name must be unique for each symbol.

query = string (required)

Specify the SQL query which will fetch the desired table from the database system.

readSQLArguments = dict (optional)

Dictionary containing keyword arguments for the pandas.read_sql method. Not all arguments of that method are exposed through the YAML interface of the PandasSQLReader agent. By specifying readSQLArguments, it is possible to pass arguments directly to the pandas.read_sql method that is used by the PandasSQLReader agent.

sqlEngineArguments = dict (optional)

Dictionary containing keyword arguments for the sqlalchemy.create_engine constructor.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate arrays and data frames will be written abbreviated to the log. For trace > 3 the intermediate arrays and data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

type = string (default=par)

Control the symbol type. Supported symbol types are par for GAMS parameters and set for GAMS sets.

valueColumns = list or string (optional)

Specify columns to get the values from. The value columns contain numerical values in case of a GAMS parameter and set element text in case of a GAMS set. The columns are given as column names represented as a list of strings. For example: valueColumns: ["i1", "i2"]. If there is more than one value column specified, the column names are stacked to the index automatically. As string one can specify the symbolic constant lastCol (i.e. valueColumns: "lastCol") or an empty string (i.e. valueColumns: ""). When lastCol is passed the last column will be treated as a value column and all other columns will be treated as index columns. When an empty string is passed all columns will be treated as index columns. Specifying an empty string is only valid for symbol type set since symbol type par requires at least one value column. The default for symbol type par is lastCol and the default for symbol type set is an empty string.
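The stacking of several value columns into an extra index position can be sketched in plain Python (the header names and the stack helper are invented for this illustration):

```python
# Sketch: with index columns i,j and value columns 2000,2001, each value
# column name becomes an additional index label for its values.
def stack(header, rows, n_index):
    out = []
    for row in rows:
        idx, vals = row[:n_index], row[n_index:]
        for col, v in zip(header[n_index:], vals):
            out.append(idx + (col, v))
    return out

header = ["i", "j", "2000", "2001"]
rows = [("a", "x", 1, 2), ("b", "y", 3, 4)]
print(stack(header, rows, 2))
# [('a', 'x', '2000', 1), ('a', 'x', '2001', 2),
#  ('b', 'y', '2000', 3), ('b', 'y', '2001', 4)]
```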

valueSubstitutions = dict (optional)

Dictionary used for mapping in the value column of the DataFrame. Each key in valueSubstitutions is replaced by its corresponding value. The replacement is only performed on the value column of the DataFrame which is the numerical value in case of a GAMS parameter and the set element text in case of a GAMS set. While it is possible to make arbitrary replacements this is especially useful for controlling sparse/dense reading.

## PandasSQLWriter

The PandasSQLWriter agent allows to write symbols from the Connect database to a specified database management system. Its implementation is based on the pandas.DataFrame class and its I/O API method to_sql.

Note
The connectivity to MS-Access databases is available on Windows only and requires a 64-bit MS-Access ODBC driver. See connection for more information.
Option Type Default Description
connection global None Connection string to the database.
ifExists symbol fail Specify the behavior when a table with the same name exists in the database/schema.
name symbol None Specify the name of the symbol in the Connect database.
schemaName global/symbol None Specify the schema name.
sqlEngineArguments global None Dictionary containing keyword arguments for the sqlalchemy.create_engine constructor.
tableName symbol None Specify the SQL table/relation in the provided database/schema.
toSQLArguments symbol None Dictionary containing keyword arguments for the pandas.to_sql method.
trace global inherited Specify the trace level for debugging output.
unstack symbol False Indicate if the last index column will be used as a header row.
valueSubstitutions symbol None Dictionary used for mapping in the value column of the DataFrame.

Detailed description of the options:

connection = string (required)

Allows to specify the connection string to the database and user credentials in order to access the database. Specify the absolute path to the database or specify SQLalchemy connectable url.

Note
The connectivity to MS-Access databases is available on Windows only and requires a 64-bit MS-Access ODBC driver. If MS-Access is not installed, or only a 32-bit version is installed, download and install a 64-bit MS-Access ODBC driver as a redistributable package from Microsoft (e.g. MS-Access 2013 Runtime_x64).

ifExists = string (default=fail)

Specify the behavior when a table with the same name exists in the database/schema. Valid values are fail, replace and append.

name = string (required)

Specify the name of the symbol in the Connect database.

schemaName = string (optional)

Specify the schema name for writing the table to the correct location. In Postgres, by default, tables are written to the public schema that is present in every database.

sqlEngineArguments = dict (optional)

Dictionary containing keyword arguments for the sqlalchemy.create_engine constructor.

tableName = string (required)

Name of the SQL table/relation in the provided database/schema.

toSQLArguments = dict (optional)

Dictionary containing keyword arguments for the pandas.to_sql method. Not all arguments of that method are exposed through the YAML interface of the PandasSQLWriter agent. By specifying toSQLArguments, it is possible to pass arguments directly to the pandas.to_sql method that is used by the PandasSQLWriter agent.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate arrays and data frames will be written abbreviated to the log. For trace > 3 the intermediate arrays and data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

unstack = boolean (default=False)

Indicate if the last index column will be used as a header row.
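The effect of unstack can be sketched in plain Python (the records and the helper are invented for this illustration): the last index column supplies the header row of a wide table.

```python
# Sketch of unstack: the last index column becomes a header row,
# turning long-format records into a wide table.
def unstack(records):
    cols = sorted({r[-2] for r in records})
    table = {}
    for *idx, col, val in records:
        table.setdefault(tuple(idx), {})[col] = val
    return cols, {k: [v.get(c) for c in cols] for k, v in table.items()}

cols, wide = unstack([("i1", "j1", 1), ("i1", "j2", 2), ("i2", "j1", 3)])
print(cols)  # ['j1', 'j2']
print(wide)  # {('i1',): [1, 2], ('i2',): [3, None]}
```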

valueSubstitutions = dict (optional)

Dictionary used for mapping in the value column of the DataFrame. Each key in valueSubstitutions is replaced by its corresponding value. The replacement is only performed on the value column of the DataFrame which is the numerical value in case of a GAMS parameter, variable or equation and the set element text in case of a GAMS set.

## Projection

The Projection agent allows index reordering and projection onto a reduced index space of a GAMS symbol. For variables and equations a suffix (.l, .m, .lo, .up, or .scale) can be extracted and written to a parameter. Otherwise, the type of the source symbol determines the type of the new symbol, unless asSet is set to True. Variables and equations can be turned into parameters with an extra index containing the labels level, marginal, lower, upper, and scale if the attribute asParameter is set to True. Moreover, if name is a list of scalar symbols of the same type (parameters, variables, or equations), they can be stored in a one-dimensional symbol of the same type with the index label being the name of the scalar symbol.

Option Default Description
aggregationMethod first Specify the aggregation method for the projection.
asParameter False Indicate that variable or equation symbols will be turned into a parameter with an extra index that contains the suffix.
asSet False Indicate that the new symbol is a set independent of the type of the source symbol.
name None Specify a symbol name with index space and potentially suffix for the Connect database.
newName None Specify a new name with index space for the symbol in the Connect database.
text None Element text for resulting sets.
trace inherited Specify the trace level for debugging output.

Detailed description of the options:

aggregationMethod = string (default=first)

Specify the method to aggregate when at least one index position is projected out. The default is first, meaning that the first record will be stored in the new symbol. For sets, variables, and equations (without a suffix specified) only first and last are meaningful. For parameters, variables, and equations with suffix many other aggregation methods are available and meaningful: mad (mean absolute deviation), max, mean, median, min, prod, sem (unbiased standard error of the mean), sum, std (standard deviation), nunique (number of distinct elements), first, last. The projection agent is based on pandas DataFrames and more detailed explanations of the aggregation method can be found at the pandas website.
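What "first" means when an index position is projected out can be sketched in plain Python (data and helper names are invented for this illustration):

```python
# Sketch of "first" aggregation: records are grouped by the index
# positions that are kept, and only the first record per group survives.
def project_first(records, keep):
    out = {}
    for rec in records:
        key = tuple(rec[p] for p in keep)
        out.setdefault(key, rec[-1])  # first value per group wins
    return out

records = [("i1", "j1", 10), ("i1", "j2", 20), ("i2", "j1", 30)]
print(project_first(records, keep=(0,)))  # {('i1',): 10, ('i2',): 30}
```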

asParameter = boolean (default=False)

If the symbol specified by name is a variable or equation and asParameter is set to True, the new symbol (after potential aggregationMethod is applied) will be a parameter with an additional index (at the end of the index list) that contains the suffix labels level, marginal, lower, upper, and scale and the corresponding suffix value.

asSet = boolean (default=False)

Usually the type of the source symbol and the use of a suffix with variables and equations determine the type of the target symbol. With asSet set to True the target symbol will be a set.

name = string or list of strings (required)

One either specifies a single symbol name with index space and potentially a suffix for the Connect database, or a list of names of scalar symbols. In the former case, name requires the format symname[.suffix](i1,i2,...,iN). The suffix is only allowed on variable and equation symbols and needs to be one of l, m, lo, up, or scale. The list of indices does not need to coincide with the names of the actual GAMS domain sets. This index list together with the index list specified for newName is solely intended to establish the index order in the symbol specified by newName. In the latter case (a list of symbol names) the symbols need to be scalar symbols of the same type (parameter, variable, or equation) and a new one-dimensional symbol (of the same type) is created (using newName) that holds the symbol names as labels.

newName = string (required)

Specify a new name with index space for the projected or reordered symbol in the Connect database. Note that each symbol in the Connect database must have a unique name. newName is given as symname(i1,i2,...,iN). The list of indices does not need to coincide with the names of the actual GAMS domain sets. This index list together with the index list specified for name is solely intended to establish the index order. Hence, the names in the index list need to be unique and only names that are part of the index list specified for name can be used. For example: name: p(I,j,k) and newName: q(k,i).
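The reordering established by name and newName can be sketched in plain Python (all names below are illustrative): a position map is derived from the two index lists and applied to every record key.

```python
# Sketch for name: p(i,j,k) and newName: q(k,i). Index j is projected
# out here; duplicate keys arising from that would then be resolved by
# aggregationMethod.
src_index = ["i", "j", "k"]
new_index = ["k", "i"]
pos = [src_index.index(n) for n in new_index]  # [2, 0]

records = {("i1", "j1", "k1"): 3.0, ("i2", "j1", "k2"): 5.0}
reordered = {tuple(key[p] for p in pos): v for key, v in records.items()}
print(reordered)  # {('k1', 'i1'): 3.0, ('k2', 'i2'): 5.0}
```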

text = string (default=None)

Control the handling of element text if the resulting symbol is a set. If set to "", the text will be dropped. When left at the default (None) and the projected symbol is a set, the element text of the original set will be used; for other symbol types, the text will be dropped. If text is a string, this string will be assigned to all elements. The string can contain placeholders {i1} that will be replaced with the content of the matching index position. For example, text: "{i2} - {i1}: {element_text}", where {i1} and {i2} should be index space names in the symbol name with index space (attribute name). {element_text} refers to the original element text (set) or the string representation of a numerical value (parameter or variable/equation with a given suffix) of the source symbol.
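The place-holder substitution behaves like Python string formatting, which a short sketch can show (the record dictionary is invented for this illustration):

```python
# Sketch of the text option's place-holder handling: {i1}, {i2}, ...
# refer to index positions and {element_text} to the original element
# text of the source symbol.
template = "{i2} - {i1}: {element_text}"
record = {"i1": "a", "i2": "b", "element_text": "old text"}
print(template.format(**record))  # b - a: old text
```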

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate arrays and data frames will be written abbreviated to the log. For trace > 3 the intermediate arrays and data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

## PythonCode

The PythonCode agent allows to execute arbitrary Python code. From within the code, it is possible to access the GAMS database via gams.db (if the PythonCode agent is running in a GAMS context) and the Connect database via connect.container. The GAMS database is an instance of GamsDatabase, whereas the Connect database is a GAMS Transfer Container object. Furthermore there is a predefined instructions list that can be filled with tasks that are automatically executed.

Option Default Description
code None Python code to be executed.

Detailed description of the options:

code = string (required)

Python code to be executed. The YAML syntax offers the pipe character (|) for specifying multi-line strings:

- PythonCode:
code: |
print("Print from Python")
# insert more Python code here


It is possible to generate instructions by appending tasks to the Python instructions list. A task is specified by using Python data structures that match the schema of a specific Connect agent. At the end of the Python code, all tasks in the instructions list are automatically generated and executed. The following example shows how to fill the instructions list with three PandasExcelWriter tasks that write different parameters (p0, p1, p2) into separate Excel workbooks (data_p0.xlsx, data_p1.xlsx, data_p2.xlsx).

- GAMSReader:
- PythonCode:
code: |
symbols = [ 'p0', 'p1', 'p2' ]
for s in symbols:
instructions.append(
{
'PandasExcelWriter':
{
'file': 'data_{}.xlsx'.format(s),
'symbols': [{'name': s, 'rowDimension': connect.container.data[s].dimension, 'range': s+'!A1'}]
}
})


Using connect.container allows to access the Connect database directly in the Python code. The connect.container is a GAMS Transfer Container object and data within the container is stored as Pandas DataFrames. Please refer to the documentation of GAMS Transfer to learn more about GAMS Transfer and its functionalities. The following example shows how to access and modify Connect container data and manually add a new symbol with the modified data to the Connect container:

Set i /1, 2, 3/
i_mod;

$onembeddedCode Connect:
- GAMSReader:
    readAll: True
- PythonCode:
    code: |
      i_mod_records = [ 'i'+n for n in connect.container.data["i"].records.iloc[:,0] ]
      connect.container.addSet("i_mod", ["*"], records=i_mod_records)
- GAMSWriter:
    symbols:
      - name: i_mod
$offembeddedCode

display i, i_mod;


The first line takes the data of set i and adds an 'i' at the beginning of each uel in the first column of the dataframe. The last line writes the modified dataframe as set i_mod to the Connect database.

Here is another example modifying Connect container data and adding a new symbol with the modified data:

Parameter p;
$onEcho > p_raw.csv
i,j,2000,2001
i1,j1,1,2
i2,j2,3,4
i3,j3,5,6
$offEcho
$onEmbeddedCode Connect:
- CSVReader:
    file: p_raw.csv
    name: p_raw
    header: True
    indexColumns: "1,2"
    valueColumns: "2:lastCol"
- PythonCode:
    code: |
      p_records = [ [r[0] + '_' + r[1]] + list(r) for i,r in connect.container.data["p_raw"].records.iterrows() ]
      connect.container.addParameter("p", ["*"]*4, records=p_records)
- GAMSWriter:
    symbols:
      - name: p
$offEmbeddedCode

display p;


In this example, we take the data of parameter p_raw and insert a column of the concatenated row dimensions into the first column of the dataframe. The modified dataframe is then added to the Connect container as records for the new symbol p. Here is a display of GAMS parameter p:

INDEX 1 = i1_j1

2000        2001

i1.j1       1.000       2.000

INDEX 1 = i2_j2

2000        2001

i2.j2       3.000       4.000

INDEX 1 = i3_j3

2000        2001

i3.j3       5.000       6.000


## RawExcelReader

The RawExcelReader agent allows reading unstructured data from a specified Excel file into the Connect database. This reader works similarly to the xlsdump tool. It reads the entire spreadsheet and represents its content in a couple of GAMS sets:

• s /s1, s2,.../ (workbook sheets)
• w / Sheet1, Sheet2, ... / (workbook sheets by name)
• ws(s,w) / s1.Sheet1, s2.Sheet2, ... / (workbook map)
• r / r1, r2, ... / (rows)
• c / c1, c2, ... / (columns)
• vs(s,r,c) / s1.r1.c2 "cell text", ... / (cells with explanatory text)
• vu(s,r,c,*) / s1.r1.c1."cell text" "cell text", ... / (cells with potential GAMS label)

and a parameter vf(s,r,c) / s1.r2.c2 3.14, ... / (cells with numerical values). Cells with a date will be stored in their string representation in vu and as a Julian date in vf. Cells with a string value will be stored in vs. If the string length exceeds the maximum length allowed for element text, it will be truncated. Excel offers many other cell value types. RawExcelReader will try to represent the cell value as a number and, if this succeeds, stores the number in vf. Strings matching the GAMS special value names INF, EPS, NA, and UNDEF will also be converted to their numerical counterparts. It will also try to represent the cell value as a string and stores this as a label in the fourth position of vu. GAMS labels have a length limitation and hence RawExcelReader automatically shortens the label to fit this limit. RawExcelReader provides a unique label (ending in ~n, where n is an integer, for strings exceeding the label length limit) for each string in the workbook. The full string (if it fits) will be available as the element text of the vu record.
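The ~n shortening scheme can be sketched in plain Python (the limit of 10 is artificial, since GAMS labels allow far more characters, and the helper name is invented):

```python
# Sketch of the "~n" uniquification: a string exceeding the label limit
# is shortened and suffixed with ~n, where n counts shortened strings.
def shorten_labels(strings, limit=10):
    out, n = [], 0
    for s in strings:
        if len(s) > limit:
            n += 1
            tail = "~" + str(n)
            s = s[:limit - len(tail)] + tail
        out.append(s)
    return out

print(shorten_labels(["short", "averyverylongstring", "anotherverylongone"]))
# ['short', 'averyver~1', 'anotherv~2']
```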

Option Default Description
cName c Symbol name for columns.
columnLabel C Label for columns.
file None Specify an Excel file path.
mergedCells False Control the handling of empty cells that are part of a merged Excel range.
rName r Symbol name for rows.
rowLabel R Label for rows.
sheetLabel S Label for workbook sheets.
sName s Symbol name for workbook sheets.
trace inherited Specify the trace level for debugging output.
vfName vf Symbol name for cells with a numerical value.
vsName vs Symbol name for cells with an explanatory text.
vuName vu Symbol name for cells with a potential GAMS label.
wName w Symbol name for workbook sheets by name.
wsName ws Symbol name for workbook map.

Detailed description of the options:

cName = string (default=c)

Control the name of the set of columns.

columnLabel = string (default=C)

Control the labels for the set of columns (c).

file = string (required)

Specify an Excel file path.

mergedCells = boolean (default=False)

Control the handling of empty cells that are part of a merged Excel range. If False, the cells are left empty. If True, the merged value is used in all cells. Note that setting this option to True has an impact on performance since the Excel workbook has to be opened in a non-read-only mode that results in non-constant memory consumption (no lazy loading).

rName = string (default=r)

Control the name of the set of rows.

rowLabel = string (default=R)

Control the labels for the set of rows (r).

sheetLabel = string (default=S)

Control the labels for the set of workbook sheets (s).

sName = string (default=s)

Control the name of the set of workbook sheets.

trace = integer (optional)

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the cell values and their processing will be written entirely to the log (potentially large output). If trace has not been set, the trace value, set by the Options agent, will be used.

vfName = string (default=vf)

Control the name of the parameter for cells with a numerical value.

vsName = string (default=vs)

Control the name of the set for cells with an explanatory text.

vuName = string (default=vu)

Control the name of the set for cells with a potential GAMS label.

wName = string (default=w)

Control the name of the set of workbook sheets by name.

wsName = string (default=ws)

Control the name of the set of the workbook map.

# Text Substitutions in YAML File

In many cases one would like to parameterize the text in the Connect instruction file. For example, some of the Connect agents require a file name. Instead of hard-coding the file name into the YAML instructions, text substitutions allow a placeholder to be used that is substituted before the instructions are passed to Connect. The placeholder in the YAML file uses the syntax %SOMETEXT%, similar to GAMS compile-time variables. For example:

- CSVReader:
file: %MYFILENAME%
name: distance
indexColumns: [1, 2]
valueColumns: [3]


Depending on how Connect runs, the substitution is done in various ways. The section Substitutions in Embedded Connect Code describes the substitution mechanisms for embedded Connect code. When Connect is initiated via the command line parameters connectIn or connectOut, the user-defined parameters specified by double-dash command line parameters and the given GAMS command line parameters (e.g. %gams.input%) will be substituted in the YAML file. The list of parameters available for substitution is printed to the GAMS log at the beginning of the job in the section GAMS Parameters defined.
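The substitution step itself is a simple text replacement, which can be sketched in a few lines of Python (the helper is invented and not the actual Connect driver code):

```python
import re

# Sketch of the %SOMETEXT% substitution: every %KEY% occurrence is
# replaced from a mapping before the YAML text is parsed. Unknown keys
# are left untouched in this illustration.
def substitute(text, mapping):
    return re.sub(r"%([^%\s]+)%",
                  lambda m: str(mapping.get(m.group(1), m.group(0))),
                  text)

yaml_text = "file: %MYFILENAME%\nname: distance"
print(substitute(yaml_text, {"MYFILENAME": "distance.csv"}))
# file: distance.csv
# name: distance
```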

When Connect is initiated via the shell command gamsconnect all substitutions need to be specified on the command line:

gamsconnect myci.yaml key1=val1 key2=val2 ...


key can be just MYFILENAME or be composed like gams.Input or system.dirSep.
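Collecting such key=value arguments into a substitution mapping could be sketched like this (an invented helper, not the actual gamsconnect driver):

```python
# Sketch: split each command line argument at the first "=" and build a
# dictionary of substitutions; composed keys like "gams.scrdir" are just
# ordinary dictionary keys.
def parse_substitutions(args):
    subs = {}
    for a in args:
        key, sep, val = a.partition("=")
        if not sep:
            raise ValueError("expected key=value, got " + repr(a))
        subs[key] = val
    return subs

print(parse_substitutions(["MYFILENAME=myfile.csv", "gams.scrdir=/tmp/"]))
# {'MYFILENAME': 'myfile.csv', 'gams.scrdir': '/tmp/'}
```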

# Use Connect Agents in Custom Python Code

Instead of passing instructions via one of the Connect interfaces, users can execute tasks directly in their Python code by creating an instance of ConnectDatabase and calling the method .exec_task(task). The task argument is expected to be a Python dictionary of the form:

{
'<agent name>':
{
'<global option1>': <value>,
'<global option2>': <value>,
... ,
'symbols':
[
{
'<symbol1 option1>': <value>,
'<symbol1 option2>': <value>,
...
},
{
'<symbol2 option1>': <value>,
'<symbol2 option2>': <value>,
...
},
...
]
}
}


Users can either construct the Python dictionary themselves or let YAML create the dictionary from a YAML script. The following example creates an instance of ConnectDatabase and executes two tasks: First, the CSV file stockprice.csv is read into the Connect database and second, the symbol stockprice is written to the GAMS database. In this example, the tasks are directly specified as Python dictionaries.

Set dates, stocks;
Parameter stockprice(dates<,stocks<);

$onEcho > stockprice.csv
date;symbol;price
2016/01/04;AAPL;105,35
2016/01/04;AXP;67,59
2016/01/04;BA;140,50
$offEcho

$onEmbeddedCode Python:
from gams_connect.connectdatabase import ConnectDatabase
cdb = ConnectDatabase(gams._system_directory, gams)
cdb.exec_task({'CSVReader': {'file': 'stockprice.csv', 'name': 'stockprice',
                             'indexColumns': [1, 2], 'valueColumns': [3],
                             'fieldSeparator': ';', 'decimalSeparator': ','}})
cdb.exec_task({'GAMSWriter': {'symbols': [{'name': 'stockprice'}]}})
$offEmbeddedCode

display stockprice;


We can also construct the Python dictionaries by using YAML:

Set dates, stocks;
Parameter stockprice(dates<,stocks<);

$onEcho > stockprice.csv
date;symbol;price
2016/01/04;AAPL;105,35
2016/01/04;AXP;67,59
2016/01/04;BA;140,50
$offEcho

$onEmbeddedCode Python:
import yaml
from gams_connect.connectdatabase import ConnectDatabase
cdb = ConnectDatabase(gams._system_directory, gams)
inst = yaml.safe_load('''
- CSVReader:
    file: stockprice.csv
    name: stockprice
    indexColumns: [1, 2]
    valueColumns: [3]
    fieldSeparator: ';'
    decimalSeparator: ','
- GAMSWriter:
    symbols:
      - name: stockprice
''')
for task in inst:
    cdb.exec_task(task)
$offEmbeddedCode

display stockprice;


Here YAML creates a list of dictionaries (i.e. a list of tasks) from the given YAML script.

# Command Line Utility gamsconnect

The GAMS system directory contains the utility gamsconnect to run Connect instructions directly from the command line. On Windows the utility has the callable extension .cmd. This script wraps the Python script connectdriver.py by calling the Python interpreter that ships with GAMS. Like the other Connect drivers, gamsconnect operates on a YAML instruction file. The agents GAMSReader and GAMSWriter are not available from gamsconnect and will trigger an exception. Substitutions can be passed to gamsconnect via command line arguments as key=value, e.g. filename=myfile.csv or even gams.scrdir=/tmp/. gamsconnect is called like this:

gamsconnect <YAMLFile> [key1=value1 [key2=value2 [key3=value3 [...]]]]
