Validating Data

transfer requires that the records for all symbols exist in a standard format (Standard Data Formats) in order for them to be understood by the Container. It is certainly possible for the data to end up in a state that is inconsistent with the standard format (especially when symbol attributes are set directly). transfer includes the .isValid() method to determine whether a symbol is structurally valid; this method returns a bool. This method does not guarantee that a symbol will be successfully written to either GDX or GMD, because other data errors (duplicate records, long UEL names, or domain violations) could exist that are not tested in .isValid().

For example, we create two valid sets and then check them with .isValid() to be sure.

Note: It is possible to run .isValid() on both the Container and the symbol object. On the Container, .isValid() also returns a bool, which is False if there are any invalid symbols in the Container object.

Example (valid data)

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["seattle", "san-diego", "washington_dc"])
j = gt.Set(m, "j", i, records=["san-diego", "washington_dc"])

In [1]: i.isValid()
Out[1]: True
 
In [2]: j.isValid()
Out[2]: True
 
In [3]: m.isValid()
Out[3]: True

The .isValid() method checks:

If the symbol belongs to a Container
If all domain set symbols exist in the Container
If all domain set symbols objects are valid
If any domain set is also a singleton set (not allowed in GAMS)
If records are a DataFrame (or None)
If the records DataFrame has the right number of columns (based on symbol dimension)
If the symbol is a scalar, then ensure there is only one record (row) in the DataFrame
If records column headings are unique
If any symbol attribute columns are missing or out of order
If all domain columns are type category
If all domain categories are type str
If all data columns are type float
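
A few of the DataFrame-level checks in the list above can be approximated with plain pandas. The following is only an illustrative sketch: the helper looks_structurally_valid is hypothetical, it covers a subset of the checks, and it is not how transfer implements .isValid().

```python
import pandas as pd


def looks_structurally_valid(records, dimension, value_columns=("value",)):
    """Hypothetical helper: mimic a few of the checks listed above."""
    if records is None:
        return True  # a symbol without records is allowed
    if not isinstance(records, pd.DataFrame):
        return False
    # right number of columns: one per dimension plus the data columns
    if len(records.columns) != dimension + len(value_columns):
        return False
    # column headings must be unique
    if not records.columns.is_unique:
        return False
    # domain columns must be categoricals whose categories are str
    for col in records.columns[:dimension]:
        dtype = records[col].dtype
        if not isinstance(dtype, pd.CategoricalDtype):
            return False
        if not all(isinstance(c, str) for c in dtype.categories):
            return False
    # data columns must be present and of type float
    for col in value_columns:
        if col not in records.columns:
            return False
        if not pd.api.types.is_float_dtype(records[col]):
            return False
    return True


good = pd.DataFrame({"i": pd.Categorical(["i1", "i2"]), "value": [1.0, 2.0]})
bad = pd.DataFrame({"i": ["i1", "i2"], "value": [1, 2]})  # non-categorical and int

print(looks_structurally_valid(good, dimension=1))  # True
print(looks_structurally_valid(bad, dimension=1))   # False
```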

Custom Column Headings

The names of the domain columns are flexible, but transfer requires unique column names. Users are encouraged to change the column headings of the underlying dataframe by using the domain_labels property. Using this property will ensure that unique column names are generated by adding a _<dimension> tag to the end of any user supplied column names. The following examples show this behavior.

Attention: All * domains are recast as uni. This allows users to access the column data with both the Pandas bracket and/or dot notation (i.e., df["uni"] or df.uni).

Column heading behavior at symbol instantiation

The setRecords method (which is called internally at symbol instantiation) will set default domain_labels if they were not provided by the user. The only way for a user to provide domain labels with setRecords is by passing in a Pandas DataFrame object. The _<dimension> tag will be added to domain labels in order to make all domain names unique; this tag is added to all dimensions if any subset of the domain names is non-unique.

import gams.transfer as gt
import pandas as pd
 
m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
 
# define a symbol with unique domain names
a = gt.Parameter(
    m, "a", [i, "*"], records=[("i1", "u1", 1), ("i2", "u2", 1), ("i3", "u3", 1)]
)
 
# define a symbol with NON-unique domain names
b = gt.Parameter(
    m, "b", [i, i], records=[("i1", "i1", 1), ("i2", "i2", 1), ("i3", "i3", 1)]
)
 
# define a symbol from a dataframe that already has column names
df = pd.DataFrame(
    [("i1", "i1", 1), ("i2", "i2", 1), ("i3", "i3", 1)],
    columns=["from", "to", "distance"],
)
c = gt.Parameter(m, "c", [i, i], records=df)
 
 
In [1]: m.isValid()
Out[1]: True
 
In [2]: i.records
Out[2]:
  uni element_text
0  i1
1  i2
2  i3
 
In [3]: a.records
Out[3]:
    i uni  value
0  i1  u1    1.0
1  i2  u2    1.0
2  i3  u3    1.0
 
In [4]: b.records
Out[4]:
  i_0 i_1  value
0  i1  i1    1.0
1  i2  i2    1.0
2  i3  i3    1.0
 
In [5]: c.records
Out[5]:
  from  to  value
0   i1  i1    1.0
1   i2  i2    1.0
2   i3  i3    1.0
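
The labeling rule seen in the output above (a * domain is shown as uni, and every label gets a _<dimension> tag as soon as any label repeats) can be sketched in a few lines of plain Python. The function default_domain_labels is a hypothetical stand-in written for illustration; it is not part of transfer.

```python
def default_domain_labels(domain_names):
    """Hypothetical helper mirroring the labeling rule described above."""
    # "*" (universe) domains are shown as "uni"
    labels = ["uni" if name == "*" else name for name in domain_names]
    # if any label repeats, tag every label with its dimension position
    if len(set(labels)) != len(labels):
        labels = [f"{name}_{pos}" for pos, name in enumerate(labels)]
    return labels


print(default_domain_labels(["i", "*"]))       # ['i', 'uni']
print(default_domain_labels(["i", "i"]))       # ['i_0', 'i_1']
print(default_domain_labels(["i", "j", "i"]))  # ['i_0', 'j_1', 'i_2']
```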

Customizing column headings

Many users may want to output the GAMS DataFrame directly to another format (CSV, etc.) and may wish to create customized DataFrame column headings for readability. Users can do this by directly setting the domain_labels property, as seen in the following example.

Attention: Users are encouraged to use the <symbol>.domain_labels property instead of setting the <DataFrame>.columns directly. This avoids the possibility of out-of-sync symbol validity. The domain_labels property does not store anything; calling this property simply returns the exact domain labels from the DataFrame.

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
 
# define a symbol with unique domain names
a = gt.Parameter(
    m, "a", [i, "*"], records=[("i1", "u1", 1), ("i2", "u2", 1), ("i3", "u3", 1)]
)
 
# customize the headings to allow more friendly output (to csv, etc.)
a.domain_labels = ["start", "destination"]
 
In [1]: m.isValid()
Out[1]: True
 
In [2]: a.records
Out[2]:
  start destination  value
0    i1          u1    1.0
1    i2          u2    1.0
2    i3          u3    1.0

Converting Records

All data in transfer will be stored as a Pandas DataFrame – however, it is desirable to have easy access to data without the additional infrastructure that comes with the DataFrame object. We include to* methods (available for all symbol types) that will return other data structures. The following examples show the behavior of toValue, toList, and toDict (previous examples showed examples of toDense and toSparseCoo).

Examples of toList

import gams.transfer as gt
import pandas as pd
 
m = gt.Container()
i = gt.Set(m, "i", records=["i0", "i1", "i2"])
ii = gt.Set(
    m, "ii", ["*", "*"], records=[(f"i{i}", f"i{i}", f"element_text for i{i}") for i in range(3)]
)
s = gt.Set(m, "s", i, is_singleton=True, records="i1")
u = gt.UniverseAlias(m, "u")
ip = gt.Alias(m, "ip", i)
a0 = gt.Parameter(m, "a0", records=1)
a1 = gt.Parameter(m, "a1", i, records=[("i1", 1), ("i2", 2)])
a2 = gt.Parameter(m, "a2", [i, i], records=[("i1", "i1", 1), ("i2", "i1", 2)])
v0 = gt.Variable(m, "v0", "free", records=1)
v1 = gt.Variable(
    m,
    "v1",
    "free",
    i,
    records=pd.DataFrame([("i1", 1), ("i2", 2)], columns=["i", "level"]),
)
v2 = gt.Variable(
    m,
    "v2",
    "free",
    [i, i],
    records=(
        pd.DataFrame(
            [("i1", "i1", 1), ("i2", "i1", 2)],
            columns=["i", "i", "level"],
        )
    ),
)
 
In [1]: i.toList()
Out[1]: ['i0', 'i1', 'i2']
 
In [2]: ii.toList()
Out[2]: [('i0', 'i0'), ('i1', 'i1'), ('i2', 'i2')]
 
In [3]: ii.toList(include_element_text=True)
Out[3]:
[('i0', 'i0', 'element_text for i0'),
 ('i1', 'i1', 'element_text for i1'),
 ('i2', 'i2', 'element_text for i2')]
 
In [4]: s.toList()
Out[4]: ['i1']
 
In [5]: u.toList()
Out[5]: ['i0', 'i1', 'i2']
 
In [6]: ip.toList()
Out[6]: ['i0', 'i1', 'i2']
 
In [7]: a0.toList()
Out[7]: [1.0]
 
In [8]: a1.toList()
Out[8]: [('i1', 1.0), ('i2', 2.0)]
 
In [9]: a2.toList()
Out[9]: [('i1', 'i1', 1.0), ('i2', 'i1', 2.0)]
 
In [10]: v0.toList() # default is to include only the "level"
Out[10]: [1.0]
 
In [11]: v0.toList("marginal")
Out[11]: [0.0]
 
In [12]: v1.toList() # default is to include only the "level"
Out[12]: [('i1', 1.0), ('i2', 2.0)]
 
In [13]: v1.toList(["level", "marginal"])
Out[13]: [('i1', 1.0, 0.0), ('i2', 2.0, 0.0)]

Examples of toValue

In [1]: a0.toValue()
Out[1]: 1.0
 
In [2]: a1.toValue()
Out[2]: TypeError: Cannot extract value data for non-scalar symbols (symbol dimension is 1)
 
In [3]: v0.toValue() # default is to only include the "level"
Out[3]: 1.0
 
In [4]: v0.toValue("marginal")
Out[4]: 0.0

Examples of toDict

In [1]: i.toDict()
Out[1]: AttributeError: 'Set' object has no attribute 'toDict'
 
In [2]: a0.toDict()
Out[2]: TypeError: Symbol `a0` is a scalar and cannot be converted into a dict.
 
In [3]: a1.toDict()
Out[3]: {'i1': 1.0, 'i2': 2.0}
 
In [4]: a2.toDict()
Out[4]: {('i1', 'i1'): 1.0, ('i2', 'i1'): 2.0}
 
In [5]: v1.toDict() # default is to only include the "level"
Out[5]: {'i1': 1.0, 'i2': 2.0}
 
In [6]: v1.toDict(["level","marginal"])
Out[6]: {'i1': {'level': 1.0, 'marginal': 0.0}, 'i2': {'level': 2.0, 'marginal': 0.0}}
 
In [7]: v1.toDict(orient="columns") # this format is useful for recreating Pandas DataFrames
Out[7]: {'i': {0: 'i1', 1: 'i2'}, 'level': {0: 1.0, 1: 2.0}}
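
For readers who want to see what these conversions amount to, the shapes returned above can be reproduced from a records-like DataFrame with plain pandas. This is a rough sketch on a hand-built stand-in for a1.records; it only illustrates the resulting data structures and makes no claim about how the to* methods are implemented.

```python
import pandas as pd

# stand-in for a1.records from the example above
records = pd.DataFrame({"i": ["i1", "i2"], "value": [1.0, 2.0]})

# list of tuples, similar in shape to a1.toList()
as_list = list(records.itertuples(index=False, name=None))

# dict keyed by the domain label, similar in shape to a1.toDict()
as_dict = dict(zip(records["i"], records["value"]))

# column-oriented dict, similar in shape to toDict(orient="columns")
as_columns = records.to_dict(orient="dict")

print(as_list)
print(as_dict)
print(as_columns)
```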

Comparing Symbols

Sparse GAMS data is inherently unordered. The concept of order in GAMS is governed by the order of the UELs in the universe set, not the order of the records. This differs from the sparse data structures that we use in transfer (Pandas DataFrames) because each record (i.e., DataFrame row) has an index (typically 0..n) and can be sorted by this index. Said a slightly different way, two GDX files will be equivalent if their universe order is the same and the records are the same; however, when creating the GDX file, it is of no consequence what order the records are written in. Therefore, in order to calculate an equality between two symbols in transfer we must perform a merge operation on the symbol domain labels, an operation that could be computationally expensive for large symbols.

Attention: The nature of symbol equality in transfer means that a potentially expensive merge operation is performed. We do not recommend that the equals method be used inside loops or when speed is critical. It is, however, very useful for data debugging.
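
To make the merge idea concrete, here is a minimal pandas sketch of an order-independent comparison. The helper same_records and the sample frames are hypothetical and only illustrate the general technique (an outer merge on the domain labels); the actual equals method performs additional checks.

```python
import pandas as pd


def same_records(left, right, domain_labels):
    """Hypothetical helper: compare two record tables regardless of row order."""
    merged = left.merge(
        right,
        on=domain_labels,
        how="outer",
        suffixes=("_l", "_r"),
        indicator=True,
    )
    # every row must be present on both sides ...
    if not (merged["_merge"] == "both").all():
        return False
    # ... and carry the same value
    return bool((merged["value_l"] == merged["value_r"]).all())


x = pd.DataFrame({"i": ["i1", "i2", "i3"], "value": [1.0, 2.0, 3.0]})
y = pd.DataFrame({"i": ["i3", "i1", "i2"], "value": [3.0, 1.0, 2.0]})  # shuffled
z = pd.DataFrame({"i": ["i1", "i2", "i4"], "value": [1.0, 2.0, 3.0]})  # other label

print(same_records(x, y, ["i"]))  # True
print(same_records(x, z, ["i"]))  # False
```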

A quick example shows the syntax of equals:

m = gt.Container()
i = gt.Set(m, "i", records=[f"i{i}" for i in range(5)], description="set i")
j = gt.Set(m, "j", records=[f"i{i}" for i in range(5)], description="set j")

In [1]: i.equals(j)

Out[1]: False

By default, equals takes the strictest view of symbol "equality" – everything must be the same. In this case, the symbol names and descriptions differ between the two sets i and j. We can relax the view of equality with a combination of argument flags. Comparing the two symbols again, but ignoring the meta data (i.e., ignoring the symbol name, description and type (if a Variable or Equation)):

In [1]: i.equals(j, check_meta_data=False)

Out[1]: True

It is also possible to ignore the set element text in equals:

m = gt.Container()
i = gt.Set(m, "i", records=[(f"i{i}", "arlington") for i in range(5)])
j = gt.Set(m, "j", records=[f"i{i}" for i in range(5)])

In [1]: i.records
Out[1]:
  uni   element_text
0    i0    arlington
1    i1    arlington
2    i2    arlington
3    i3    arlington
4    i4    arlington
 
In [2]: j.records
Out[2]:
  uni   element_text
0    i0
1    i1
2    i2
3    i3
4    i4
 
In [3]: i.equals(j, check_meta_data=False, check_element_text=False)
Out[3]: True

The check_uels argument will ensure that the symbol "universe" is the same (in order and content) between two symbols, as illustrated in the following example:

m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
ip = gt.Set(m, "ip", records=["i3", "i2", "i1"])

Clearly, the two sets i and ip have the same records, but the UEL order is different. If check_uels=True the resulting symbols will not be considered equal – turning this flag off results in equality.

In [1]: i.getUELs()
Out[1]: ['i1', 'i2', 'i3']
 
In [2]: ip.getUELs()
Out[2]: ['i3', 'i2', 'i1']
 
In [3]: i.equals(ip, check_meta_data=False)
Out[3]: False
 
In [4]: i.equals(ip, check_meta_data=False, check_uels=False)
Out[4]: True

Numerical comparisons are enabled for Parameters, Variables and Equations, and equality can be flexibly defined through the equals method arguments. Again, the strictest view of equality is the default behavior of equals: no numerical tolerances are applied (some limitations exist; see numpy.isclose for more details).

m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
a = gt.Parameter(m, "a", i, records=[("i1", 1), ("i2", 2), ("i3", 3)])
ap = gt.Parameter(m, "ap", i, records=[("i1", 1 + 1e-9), ("i2", 2), ("i3", 3)])

In [1]: a.equals(ap, check_meta_data=False)
Out[1]: False
 
In [2]: a.equals(ap, check_meta_data=False, atol=1e-8)
Out[2]: True

Attention: The numerical comparison is handled by numpy.isclose, more details can be found in the Numpy documentation.
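
The effect of the absolute tolerance can be seen by calling numpy.isclose directly on the two values that differ in the example. Setting rtol=0.0 and atol=0.0 here is an assumption used to model a comparison with no tolerances; it is not a statement about the exact defaults that equals passes on.

```python
import numpy as np

a_val, ap_val = 1.0, 1.0 + 1e-9

# no tolerance at all: the 1e-9 difference is detected
strict = bool(np.isclose(a_val, ap_val, rtol=0.0, atol=0.0))

# an absolute tolerance of 1e-8 absorbs the 1e-9 difference
relaxed = bool(np.isclose(a_val, ap_val, rtol=0.0, atol=1e-8))

print(strict)   # False
print(relaxed)  # True
```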

In the case of variables and equations, it is possible for the user to confine the numerical comparison to certain attributes (level, marginal, lower, upper and scale) by specifying the columns argument, as the following example illustrates:

m = gt.Container()
a = gt.Variable(m, "a", "free", records=100)
ap = gt.Variable(m, "ap", "free", records=101)

In [1]: a.records
Out[1]:
   level  marginal  lower  upper  scale
0  100.0       0.0   -inf    inf    1.0
 
In [2]: ap.records
Out[2]:
   level  marginal  lower  upper  scale
0  101.0       0.0   -inf    inf    1.0
 
In [3]: a.equals(ap, check_meta_data=False)
Out[3]: False
 
In [4]: a.equals(ap, check_meta_data=False, columns="level")
Out[4]: False
 
In [5]: a.equals(ap, check_meta_data=False, columns="marginal")
Out[5]: True

Domain Forwarding

GAMS includes the ability to define sets directly from data using the implicit set notation (see: Implicit Set Definition (or: Domain Defining Symbol Declarations)). This notation has an analogue in transfer called domain_forwarding.

Note: It is possible to recursively update a subset tree in transfer.

Domain forwarding is available as an argument to all symbol object constructors; the user would simply need to pass domain_forwarding=True.

In this example we have raw data in the dist DataFrame and we want to send the domain information into the i and j sets. We take care to pass the set objects as the domain for parameter c.

import gams.transfer as gt
import pandas as pd
 
m = gt.Container()
i = gt.Set(m, "i")
j = gt.Set(m, "j")
 
dist = pd.DataFrame(
    [
        ("seattle", "new-york", 2.5),
        ("seattle", "chicago", 1.7),
        ("seattle", "topeka", 1.8),
        ("san-diego", "new-york", 2.5),
        ("san-diego", "chicago", 1.8),
        ("san-diego", "topeka", 1.4),
    ],
    columns=["from", "to", "thousand_miles"],
)
 
c = gt.Parameter(m, "c", [i, j], records=dist, domain_forwarding=True)

In [1]: i.records
Out[1]:
       uni   element_text
0    seattle
1  san-diego
 
In [2]: j.records
Out[2]:
      uni   element_text
0  new-york
1   chicago
2    topeka
 
In [3]: c.records
Out[3]:
         i         j    value
0    seattle  new-york    2.5
1    seattle   chicago    1.7
2    seattle    topeka    1.8
3  san-diego  new-york    2.5
4  san-diego   chicago    1.8
5  san-diego    topeka    1.4

Note: The element order in the sets i and j mirrors that in the raw data.

In this example we show that domain forwarding will also work recursively to update the entire set lineage – the domain forwarding occurs at the creation of every symbol object. The correct order of elements in set i is [z, a, b, c] because the records from j are forwarded first, and then the records from k are propagated through (back to i).

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i")
j = gt.Set(m, "j", i, records=["z"], domain_forwarding=True)
k = gt.Set(m, "k", j, records=["a", "b", "c"], domain_forwarding=True)

In [1]: i.records
Out[1]:
  uni   element_text
0     z
1     a
2     b
3     c
 
In [2]: j.records
Out[2]:
  i   element_text
0   z
1   a
2   b
3   c
 
In [3]: k.records
Out[3]:
  j   element_text
0   a
1   b
2   c
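
The order [z, a, b, c] can be traced with a small plain-Python model of the example above, in which each set is just a list of labels. The function forward_domain is a hypothetical illustration of the propagation order and does not reflect transfer's internals.

```python
def forward_domain(records, domain_chain):
    """Hypothetical helper: push new labels up a chain of parent sets.

    `domain_chain` lists the parent sets from nearest to most distant,
    each as a plain list of labels.
    """
    for parent in domain_chain:
        for label in records:
            if label not in parent:
                parent.append(label)  # appending keeps first-seen order


i, j, k = [], [], []

# models: j defined over i with records ["z"] and domain forwarding on
j.extend(["z"])
forward_domain(j, [i])

# models: k defined over j with records ["a", "b", "c"] and domain forwarding on
k.extend(["a", "b", "c"])
forward_domain(k, [j, i])

print(i)  # ['z', 'a', 'b', 'c']
print(j)  # ['z', 'a', 'b', 'c']
print(k)  # ['a', 'b', 'c']
```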

It is also possible to forward to specific domain sets by passing a list of bool to the domain_forwarding property, as seen in the following example:

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i")
j = gt.Set(m, "j")
k = gt.Set(m, "k")
 
ijk = gt.Parameter(
    m,
    "ijk",
    [i, j, k],
    records=[("i", "j", "k", 1)],
    domain_forwarding=[True, False, True],
)
 
In [1]: i.records
Out[1]:
  uni element_text
0   i
 
In [2]: j.records is None
Out[2]: True
 
In [3]: k.records
Out[3]:
  uni element_text
0   k

Domain Violations

Domain violations occur when domain labels appear in symbol data but do not appear in the parent set over which the symbol is defined. Attempting to execute a GAMS model when there are domain violations will lead to compilation errors. Domain violations are found dynamically with the <Symbol>.findDomainViolations() method.

Note: the findDomainViolations method can be computationally expensive – UELs in GAMS are case preserving (just like symbol names); additionally, GAMS ignores all trailing white space in UELs (leading white space is considered significant). As a result, transfer must lowercase all UELs and then strip any trailing white space before doing the set comparison to locate (and create) any DomainViolation objects. findDomainViolations should not be used in a loop (nor should any of its related methods: hasDomainViolations, countDomainViolations, getDomainViolations, or dropDomainViolations).

In the following example we intentionally create data with domain violations in the a parameter:

m = gt.Container()
i = gt.Set(m, "i", records=["a", "b", "c"])
a = gt.Parameter(m, "a", i, records=[("aa", 1), ("c", 2)])

In [1]: a.findDomainViolations()
Out[1]:
  i    value
0  aa    1.0
 
In [2]: a.hasDomainViolations()
Out[2]: True
 
In [3]: a.countDomainViolations()
Out[3]: 1
 
In [4]: a.getDomainViolations()
Out[4]: [<DomainViolation at 0x7fb6b83d9630>]

Dynamically locating domain violations allows transfer to return a view of the underlying pandas dataframe with the problematic domain labels still intact – at this point the user is free to correct issues in the UELs with any of the *UELs methods or by simply dropping any domain violations from the dataframe completely (the dropDomainViolations method is a convenience function for this operation).
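
The comparison rule described in the note above (lowercase every label, strip trailing white space, keep leading white space) can be sketched in plain Python as follows. The helpers normalize and find_violations are hypothetical and operate on simple tuples rather than on transfer symbols.

```python
def normalize(label):
    """Lowercase and strip trailing white space (leading white space is kept)."""
    return label.lower().rstrip()


def find_violations(records, parent_labels):
    """Hypothetical helper: return records whose label is not in the parent set."""
    allowed = {normalize(label) for label in parent_labels}
    return [rec for rec in records if normalize(rec[0]) not in allowed]


parent = ["a", "b", "c"]
records = [("aa", 1.0), ("C ", 2.0), (" c", 3.0)]

# "C " matches "c" after normalization; " c" does not, because of the leading space
print(find_violations(records, parent))  # [('aa', 1.0), (' c', 3.0)]
```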

Attention: It is not possible to create a GDX file if symbols have domain violations. Unused UELs will not result in domain violations.

Attempting to write this container to a GDX file will result in an exception.

m = gt.Container()
i = gt.Set(m, "i", records=["a", "b", "c"])
a = gt.Parameter(m, "a", i, records=[("aa", 1), ("c", 2)])
m.write("out.gdx")

Exception: Encountered data errors with symbol `a`. Possible causes are from duplicate records and/or domain violations.
 
Use 'hasDuplicateRecords', 'findDuplicateRecords', 'dropDuplicateRecords', and/or 'countDuplicateRecords' to find/resolve duplicate records.
Use 'hasDomainViolations', 'findDomainViolations', 'dropDomainViolations', and/or 'countDomainViolations' to find/resolve domain violations.
 
GDX file was not created successfully.

Duplicate Records

Duplicate records can easily appear in large datasets – locating and fixing these records is straightforward with transfer. transfer includes find*, has*, count* and drop* methods for duplicate records, just as it has for domain violations.

Note: the findDuplicateRecords method can be computationally expensive – UELs in GAMS are case preserving (just like symbol names); additionally, GAMS ignores all trailing white space in UELs (leading white space is considered significant). As a result, transfer must lowercase all UELs and then strip any trailing white space before doing the set comparison to locate duplicate records. findDuplicateRecords should not be used in a loop (nor should any of its related methods: hasDuplicateRecords, countDuplicateRecords, or dropDuplicateRecords).

Dynamically locating duplicate records allows transfer to return a view of the underlying pandas dataframe with the problematic domain labels still intact – at this point the user is free to correct issues in the UELs with any of the *UELs methods or by simply dropping any duplicate records from the dataframe completely (the dropDuplicateRecords method is a convenience function for this operation).

m = gt.Container()
a = gt.Parameter(
    m,
    "a",
    ["*"],
    records=[("i" + str(i), float(i)) for i in range(4)]
    + [("j" + str(i), i) for i in range(4)]
    + [("I" + str(i), i) for i in range(4)],
)

Note: The user can decide which duplicate records they would like to keep with keep="first" (default), keep="last", or keep=False (which returns all duplicate records).

In [1]: a.records
Out[1]:
   uni  value
0   i0    0.0
1   i1    1.0
2   i2    2.0
3   i3    3.0
4   j0    0.0
5   j1    1.0
6   j2    2.0
7   j3    3.0
8   I0    0.0
9   I1    1.0
10  I2    2.0
11  I3    3.0

In [2]: a.findDuplicateRecords()
Out[2]:
   uni  value
8   I0    0.0
9   I1    1.0
10  I2    2.0
11  I3    3.0

In [3]: a.findDuplicateRecords(keep="last")
Out[3]:
  uni  value
0  i0    0.0
1  i1    1.0
2  i2    2.0
3  i3    3.0

In [4]: a.findDuplicateRecords(keep=False)
Out[4]:
   uni  value
0   i0    0.0
1   i1    1.0
2   i2    2.0
3   i3    3.0
8   I0    0.0
9   I1    1.0
10  I2    2.0
11  I3    3.0
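
The three keep options behave like the keep argument of pandas' duplicated method applied to a case-insensitive key. The sketch below uses a small hand-built DataFrame and is an assumption about the general technique only, not a description of transfer's implementation.

```python
import pandas as pd

records = pd.DataFrame(
    {
        "uni": ["i0", "i1", "j0", "I0", "I1"],
        "value": [0.0, 1.0, 0.0, 0.0, 1.0],
    }
)

# compare labels case-insensitively and without trailing white space
key = records["uni"].str.lower().str.rstrip()

dup_first = records[key.duplicated(keep="first")]["uni"].tolist()
dup_last = records[key.duplicated(keep="last")]["uni"].tolist()
dup_all = records[key.duplicated(keep=False)]["uni"].tolist()

print(dup_first)  # ['I0', 'I1']
print(dup_last)   # ['i0', 'i1']
print(dup_all)    # ['i0', 'i1', 'I0', 'I1']
```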

Attention: It is not possible to create a GDX file if symbols have duplicate records.

Attempting to write this container to a GDX file will result in an exception.

m = gt.Container()
a = gt.Parameter(
    m,
    "a",
    ["*"],
    records=[("i" + str(i), float(i)) for i in range(4)]
    + [("j" + str(i), i) for i in range(4)]
    + [("I" + str(i), i) for i in range(4)],
)
m.write("out.gdx")

Exception: Encountered data errors with symbol `a`. Possible causes are from duplicate records and/or domain violations.
 
Use 'hasDuplicateRecords', 'findDuplicateRecords', 'dropDuplicateRecords', and/or 'countDuplicateRecords' to find/resolve duplicate records.
Use 'hasDomainViolations', 'findDomainViolations', 'dropDomainViolations', and/or 'countDomainViolations' to find/resolve domain violations.
 
GDX file was not created successfully.

Pivoting Data

It might be convenient to pivot data into a multi-dimensional data structure rather than maintaining the flat structure in records. A convenience method called pivot is provided for all symbol classes and will return a pivoted pandas.DataFrame. Pivoting is only available for symbols with more than one dimension.

Example #1 - Pivot a 2D Set

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=[f"i{i}" for i in range(5)])
j = gt.Set(m, "j", records=[f"j{i}" for i in range(5)])
ij = gt.Set(m, "ij", [i, j])
ij.generateRecords(density=0.25, seed=123)
 
In [1]: ij.pivot()
Out[1]:
       j0     j1     j3     j4
i0   True   True  False  False
i1   True  False  False  False
i2  False  False   True   True
i4  False   True  False  False

Example #2 - Pivot a 3D Set

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=[f"i{i}" for i in range(5)])
j = gt.Set(m, "j", records=[f"j{i}" for i in range(5)])
iji = gt.Set(m, "iji", [i, j, i])
iji.generateRecords(density=0.25, seed=123)
 
In [1]: iji.pivot()
Out[1]:
          i0     i1     i2     i3     i4
i0 j0  False   True   True  False  False
   j1   True  False  False  False  False
   j3  False  False  False   True  False
   j4  False  False   True  False  False
i1 j0   True   True  False   True  False
   j1   True   True  False  False   True
   j2  False   True  False  False  False
   j4  False  False  False   True  False
i2 j0   True  False  False  False  False
   j1  False  False   True  False   True
   j3   True  False  False  False  False
i3 j2  False   True   True  False   True
   j3  False   True  False  False  False
   j4   True  False   True   True   True
i4 j0  False   True  False   True  False
   j1  False  False  False   True  False
   j3  False  False  False   True  False
   j4  False  False  False   True   True
 
In [2]: iji.pivot(fill_value="")
Out[2]:
         i0    i1    i2    i3    i4
i0 j0        True  True
   j1  True
   j3                    True
   j4              True
i1 j0  True  True        True
   j1  True  True              True
   j2        True
   j4                    True
i2 j0  True
   j1              True        True
   j3  True
i3 j2        True  True        True
   j3        True
   j4  True        True  True  True
i4 j0        True        True
   j1                    True
   j3                    True
   j4                    True  True

Note: When pivoting symbols with >2 dimensions, the first [0..(dimension-1)] dimensions will be set to the index and the last dimension will be pivoted into the columns. This behavior can be customized with the index and columns arguments.
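
The default layout described in the note (leading dimensions in the index, last dimension in the columns) corresponds roughly to the following plain pandas operation. The DataFrame and its labels are made up for illustration, and set_index/unstack is used here as a stand-in; it is not necessarily what pivot does internally.

```python
import pandas as pd

records = pd.DataFrame(
    [("i0", "j0", "k0", 1.0), ("i0", "j1", "k1", 2.0), ("i1", "j0", "k1", 3.0)],
    columns=["i", "j", "k", "value"],
)

# the first two dimensions stay in the row index,
# the last dimension ("k") is moved into the columns
pivoted = records.set_index(["i", "j", "k"])["value"].unstack("k", fill_value=0.0)

print(pivoted)
```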

Example #3 - Pivot a 3D Parameter w/ a fill_value

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=[f"i{i}" for i in range(5)])
j = gt.Set(m, "j", records=[f"j{i}" for i in range(5)])
iji = gt.Parameter(m, "iji", [i, j, i])
iji.generateRecords(density=0.05, seed=123)
 
In [1]: iji.pivot(fill_value="NONE")
Out[1]:
             i1        i2        i3        i4
i0 j1  0.682352      NONE      NONE      NONE
   j2  0.053821      NONE   0.22036      NONE
i1 j1      NONE      NONE      NONE  0.184372
i2 j0      NONE  0.175906      NONE      NONE
i3 j4      NONE      NONE  0.812095      NONE
 
In [2]: iji.pivot(fill_value=0)
Out[2]:
             i1        i2        i3        i4
i0 j1  0.682352  0.000000  0.000000  0.000000
   j2  0.053821  0.000000  0.220360  0.000000
i1 j1  0.000000  0.000000  0.000000  0.184372
i2 j0  0.000000  0.175906  0.000000  0.000000
i3 j4  0.000000  0.000000  0.812095  0.000000
 
In [3]: iji.pivot(fill_value=gt.SpecialValues.EPS)
Out[3]:
             i1        i2        i3        i4
i0 j1  0.682352 -0.000000 -0.000000 -0.000000
   j2  0.053821 -0.000000  0.220360 -0.000000
i1 j1 -0.000000 -0.000000 -0.000000  0.184372
i2 j0 -0.000000  0.175906 -0.000000 -0.000000
i3 j4 -0.000000 -0.000000  0.812095 -0.000000

Example #4 - Pivot (only the marginal values) of a 3D Variable

import gams.transfer as gt
import numpy as np
 
# NOTE: custom functions should expose a 'seed' argument
def marginal_values(seed, size):
    rng = np.random.default_rng(seed)
    return rng.normal(5, 1.2, size=size)
 
m = gt.Container()
i = gt.Set(m, "i", records=[f"i{i}" for i in range(5)])
j = gt.Set(m, "j", records=[f"j{i}" for i in range(5)])
iji = gt.Variable(m, "iji", "free", [i, j, i])
iji.generateRecords(density=0.05, func={"marginal": marginal_values}, seed=123)
 
In [1]: iji.records
Out[1]:
  i_0 j_1 i_2  level  marginal  lower  upper  scale
0  i0  j1  i1    0.0  3.813054   -inf    inf    1.0
1  i0  j2  i1    0.0  4.558656   -inf    inf    1.0
2  i0  j2  i3    0.0  6.545510   -inf    inf    1.0
3  i1  j1  i4    0.0  5.232769   -inf    inf    1.0
4  i2  j0  i2    0.0  6.104277   -inf    inf    1.0
5  i3  j4  i3    0.0  5.692525   -inf    inf    1.0
 
In [2]: iji.pivot(value="marginal")
Out[2]:
             i1        i3        i4        i2
i0 j1  3.813054  0.000000  0.000000  0.000000
   j2  4.558656  6.545510  0.000000  0.000000
i1 j1  0.000000  0.000000  5.232769  0.000000
i2 j0  0.000000  0.000000  0.000000  6.104277
i3 j4  0.000000  5.692525  0.000000  0.000000

Describing Data

The methods describeSets, describeParameters, describeVariables, and describeEquations allow the user to get a summary view of key data statistics. The returned DataFrame aggregates the output for a number of other methods (depending on symbol type). A description of each Container method is provided in the following subsections:

describeSets

Argument	Type	Description	Required	Default
`symbols`	`list`, `str`, NoneType	A list of sets in the `Container` to include in the output. describeSets will include aliases if they are explicitly passed by the user.	No	`None` (if `None` specified, will assume all sets – not aliases)

Returns: pandas.DataFrame

The following table includes a short description of the column headings in the return.

Property / Statistic	Description
`name`	name of the symbol
`is_singleton`	`bool` if the set/alias is a singleton set (or an alias of a singleton set)
`alias_with`	[OPTIONAL, shown if the user passes an alias name as part of `symbols`] name of the parent set (for aliases only), None otherwise
`domain`	domain labels for the symbol
`domain_type`	`none`, `relaxed` or `regular` depending on the symbol state
`dimension`	dimension
`number_records`	number of records in the symbol
`sparsity`	`1 - number_records/cardinality`
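
As a worked illustration of the sparsity formula, the sketch below assumes that cardinality is the product of the record counts of the domain sets, and that no value is reported when there are no domain sets to measure against (as in the None entries of the examples). Both points are assumptions, and the function sparsity is hypothetical.

```python
from math import prod


def sparsity(number_records, domain_sizes):
    """Hypothetical helper: 1 - number_records / cardinality."""
    if not domain_sizes:
        return None  # no domain sets to measure against
    cardinality = prod(domain_sizes)  # assumed: product of domain set sizes
    return 1 - number_records / cardinality


print(sparsity(6, [2, 3]))  # 0.0  (fully dense 2 x 3 symbol)
print(sparsity(5, [5, 4]))  # 0.75 (5 of 20 possible records)
print(sparsity(1, []))      # None
```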

Example #1

import gams.transfer as gt
 
m = gt.Container("trnsport.gdx")

In [1]: m.describeSets()
Out[1]:
  name  is_singleton domain domain_type  dimension  number_records sparsity
0    i         False    [*]        none          1               2     None
1    j         False    [*]        none          1               3     None

Example #2 – with aliases

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["i" + str(i) for i in range(1, 10)])
j = gt.Set(m, "j", records=["j" + str(i) for i in range(1, 10)])
 
ip = gt.Alias(m, "ip", i)
jp = gt.Alias(m, "jp", j)

In [1]: m.describeSets()
Out[1]:
  name  is_singleton domain domain_type  dimension  number_records sparsity
0    i         False    [*]        none          1               9     None
1    j         False    [*]        none          1               9     None
 
In [2]: m.describeSets(m.listSets() + m.listAliases())
Out[2]:
  name  is_singleton  is_alias alias_with domain domain_type  dimension  number_records sparsity
0    i         False     False       None    [*]        none          1               9     None
1   ip         False      True          i    [*]        none          1               9     None
2    j         False     False       None    [*]        none          1               9     None
3   jp         False      True          j    [*]        none          1               9     None

describeParameters

Argument	Type	Description	Required	Default
`symbols`	`list`, `str`, NoneType	A list of parameters in the `Container` to include in the output	No	`None` (if `None` specified, will assume all parameters)

Returns: pandas.DataFrame

The following table includes a short description of the column headings in the return.

Property / Statistic	Description
`name`	name of the symbol
`domain`	domain labels for the symbol
`domain_type`	`none`, `relaxed` or `regular` depending on the symbol state
`dimension`	dimension
`number_records`	number of records in the symbol
`min`	min value in data
`mean`	mean value in data
`max`	max value in data
`where_min`	domain of min value (if multiple, returns only first occurrence)
`where_max`	domain of max value (if multiple, returns only first occurrence)
`sparsity`	`1 - number_records/cardinality`

Example

import gams.transfer as gt
 
m = gt.Container("trnsport.gdx")

In [1]: m.describeParameters()
Out[1]:
  name  domain domain_type  dimension  number_records      min     mean      max            where_min            where_max sparsity
0    a     [i]     regular          1               2  350.000  475.000  600.000            [seattle]          [san-diego]      0.0
1    b     [j]     regular          1               3  275.000  300.000  325.000             [topeka]           [new-york]      0.0
2    c  [i, j]     regular          2               6    0.126    0.176    0.225  [san-diego, topeka]  [seattle, new-york]      0.0
3    d  [i, j]     regular          2               6    1.400    1.950    2.500  [san-diego, topeka]  [seattle, new-york]      0.0
4    f      []        none          0               1   90.000   90.000   90.000                 None                 None     None

describeVariables

Argument	Type	Description	Required	Default
`symbols`	`list`, `str`, NoneType	A list of variables in the `Container` to include in the output	No	`None` (if `None` specified, will assume all variables)

Returns: pandas.DataFrame

The following table includes a short description of the column headings in the return.

Property / Statistic	Description
`name`	name of the symbol
`type`	type of variable (i.e., `binary`, `integer`, `positive`, `negative`, `free`, `sos1`, `sos2`, `semicont`, `semiint`)
`domain`	domain labels for the symbol
`domain_type`	`none`, `relaxed` or `regular` depending on the symbol state
`dimension`	dimension
`number_records`	number of records in the symbol
`sparsity`	`1 - number_records/cardinality`
`min_level`	min value in the `level`
`mean_level`	mean value in the `level`
`max_level`	max value in the `level`
`where_max_abs_level`	domain of max(abs(`level`)) in data

Example

import gams.transfer as gt
 
m = gt.Container("trnsport.gdx")

In [1]: m.describeVariables()
Out[1]:
  name      type  domain domain_type  dimension  number_records sparsity  min_level  mean_level  max_level where_max_abs_level
0    x  positive  [i, j]     regular          2               6      0.0      0.000     150.000    300.000  [seattle, chicago]
1    z      free      []        none          0               1     None    153.675     153.675    153.675                None
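
The level statistics are ordinary summaries of the `level` column. As a rough illustration, the sketch below computes them in plain Python. The level values are illustrative, chosen to agree with the summary row for `x` above, and are not read from the GDX file.

```python
# Illustrative level values for x(i,j), chosen to agree with the summary row above
levels = {
    ("seattle", "new-york"): 50.0,
    ("seattle", "chicago"): 300.0,
    ("seattle", "topeka"): 0.0,
    ("san-diego", "new-york"): 275.0,
    ("san-diego", "chicago"): 0.0,
    ("san-diego", "topeka"): 275.0,
}

min_level = min(levels.values())
mean_level = sum(levels.values()) / len(levels)
max_level = max(levels.values())
# max() returns the first key with the largest absolute level
where_max_abs_level = list(max(levels, key=lambda k: abs(levels[k])))
```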

describeEquations

Argument	Type	Description	Required	Default
`symbols`	`list`, `str`, NoneType	A list of equations in the `Container` to include in the output	No	`None` (if `None` specified, will assume all equations)

Returns: pandas.DataFrame

The following table includes a short description of the column headings in the return.

Property / Statistic	Description
`name`	name of the symbol
`type`	type of equation (e.g., `eq`, `geq`, `leq`)
`domain`	domain labels for the symbol
`domain_type`	`none`, `relaxed` or `regular` depending on the symbol state
`dimension`	dimension
`number_records`	number of records in the symbol
`sparsity`	`1 - number_records/cardinality`
`min_level`	min value in the `level`
`mean_level`	mean value in the `level`
`max_level`	max value in the `level`
`where_max_abs_level`	domain of max(abs(`level`)) in data

Example

import gams.transfer as gt
 
m = gt.Container("trnsport.gdx")

In [1]: m.describeEquations()
Out[1]:
     name type domain domain_type  dimension  number_records sparsity  min_level  mean_level  max_level where_max_abs_level
0    cost   eq     []        none          0               1     None       -0.0         0.0       -0.0                None
1  demand  geq    [j]     regular          1               3      0.0      275.0       300.0      325.0          [new-york]
2  supply  leq    [i]     regular          1               2      0.0      350.0       450.0      550.0         [san-diego]

describeAliases

Argument	Type	Description	Required	Default
`symbols`	`list`, `str`, NoneType	A list of alias (only) symbols in the `Container` to include in the output	No	`None` (if `None` specified, will assume all aliases – not sets)

Returns: pandas.DataFrame

The following table includes a short description of the column headings in the return. All data is referenced from the parent set that the alias is created from.

Property / Statistic	Description
`name`	name of the symbol
`alias_with`	name of the parent set (for alias only), None otherwise
`is_singleton`	`bool` if the set/alias is a singleton set (or an alias of a singleton set)
`domain`	domain labels for the symbol
`domain_type`	`none`, `relaxed` or `regular` depending on the symbol state
`dimension`	dimension
`number_records`	number of records in the symbol
`sparsity`	`1 - number_records/cardinality`

Example

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["i" + str(i) for i in range(5)])
j = gt.Set(m, "j", records=["j" + str(j) for j in range(10)])
 
ip = gt.Alias(m, "ip", i)
ipp = gt.Alias(m, "ipp", ip)
jp = gt.Alias(m, "jp", j)

In [1]: m.describeAliases()
Out[1]:
  name alias_with  is_singleton domain domain_type  dimension  number_records sparsity
0   ip          i         False    [*]        none          1               5     None
1  ipp          i         False    [*]        none          1               5     None
2   jp          j         False    [*]        none          1              10     None
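
Note that `ipp` was created from the alias `ip`, yet `alias_with` reports the parent set `i`. One way to picture this is as following alias links until a real set is reached. The dictionary and helper below are hypothetical bookkeeping for illustration, not transfer internals.

```python
# alias name -> the symbol it was created from (hypothetical bookkeeping)
created_from = {"ip": "i", "ipp": "ip", "jp": "j"}


def parent_set(name):
    """Follow alias links until a name that is not itself an alias is reached."""
    while name in created_from:
        name = created_from[name]
    return name


alias_with = {alias: parent_set(alias) for alias in created_from}
```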

Matrix Generation

transfer stores data in a "flat" format, that is, one record entry per DataFrame row. However, it is often necessary to convert this data format into a matrix format – transfer enables users to do this with relative ease using the toDense and the toSparseCoo symbol methods. The toDense method will return a dense N-dimensional numpy array with each dimension corresponding to the GAMS symbol dimension; it is possible to output an array up to 20 dimensions (a GAMS limit). The toSparseCoo method will return the data in a sparse scipy COOrdinate format, which can then be efficiently converted into other sparse matrix formats.

Attention: Neither the toDense nor the toSparseCoo method transforms the underlying DataFrame in any way; they only return the transformed data.

Note: toSparseCoo will only convert 2-dimensional data to the scipy COOrdinate format. A user interested in sparse data for an N-dimensional symbol will need to decide how to reshape the dense array in order to generate the 2D sparse format.

Attention: In order to use the toSparseCoo method the user will need to install the scipy package. Scipy is not provided with GMSPython.

Both the toDense and toSparseCoo methods leverage the indexing that comes along with using categorical data types to store domain information. This means that linking symbols together (by passing symbol objects as domain information) impacts the size of the matrix. This is best demonstrated by a few examples.

Example (1D data w/o domain linking (i.e., a relaxed domain))

import gams.transfer as gt
 
m = gt.Container()
a = gt.Parameter(m, "a", "i", records=[("a", 1), ("c", 3)])

In [1]: a.records
Out[1]:
  i    value
0   a    1.0
1   c    3.0
 
In [2]: a.toDense()
Out[2]: array([1., 3.])
 
In [3]: a.toSparseCoo()
Out[3]:
<1x2 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in COOrdinate format>

Note that the parameter a is not linked to another symbol, so when converting to a matrix, the indexing is referenced to the data structure in a.records. Defining a sparse parameter a over a set i allows us to extract information from the i domain and construct a very different dense matrix, as the following example shows:

Example (1D data w/ domain linking (i.e., a regular domain))

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["a", "b", "c", "d"])
a = gt.Parameter(m, "a", i, records=[("a", 1), ("c", 3)])

In [1]: i.records
Out[1]:
  uni   element_text
0     a
1     b
2     c
3     d
 
In [2]: a.records
Out[2]:
  i    value
0   a    1.0
1   c    3.0
 
 
In [3]: a.toDense()
Out[3]: array([1., 0., 3., 0.])
 
In [4]: a.toSparseCoo()
Out[4]:
<1x4 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in COOrdinate format>

Example (2D data w/ domain linking)

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["a", "b", "c", "d"])
a = gt.Parameter(m, "a", [i, i], records=[("a", "a", 1), ("c", "c", 3)])

In [1]: i.records
Out[1]:
  uni   element_text
0     a
1     b
2     c
3     d
 
In [2]: a.records
Out[2]:
  i_0 i_1  value
0   a   a    1.0
1   c   c    3.0
 
 
In [3]: a.toDense()
Out[3]:
array([[1., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 3., 0.],
       [0., 0., 0., 0.]])
 
In [4]: a.toSparseCoo()
Out[4]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in COOrdinate format>
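
The effect of domain linking on matrix size can be sketched without transfer at all. The helpers below (`to_dense_2d` and `to_coo_triplets`, both hypothetical) map each label to an integer position and place the values accordingly. The relaxed-domain case is an assumed 2D analogue of the 1D relaxed example above.

```python
def to_dense_2d(records, row_labels, col_labels):
    """Build a dense list-of-lists; cells without a record default to 0.0."""
    row_index = {label: n for n, label in enumerate(row_labels)}
    col_index = {label: n for n, label in enumerate(col_labels)}
    dense = [[0.0] * len(col_labels) for _ in row_labels]
    for row, col, value in records:
        dense[row_index[row]][col_index[col]] = float(value)
    return dense


def to_coo_triplets(records, row_labels, col_labels):
    """Return (rows, cols, data) lists, the ingredients of a COO matrix."""
    row_index = {label: n for n, label in enumerate(row_labels)}
    col_index = {label: n for n, label in enumerate(col_labels)}
    rows = [row_index[r] for r, _, _ in records]
    cols = [col_index[c] for _, c, _ in records]
    data = [float(v) for _, _, v in records]
    return rows, cols, data


records = [("a", "a", 1), ("c", "c", 3)]

# regular domain: labels come from the linked set i
linked = ["a", "b", "c", "d"]
dense_linked = to_dense_2d(records, linked, linked)

# relaxed domain: labels come only from the records themselves
seen = ["a", "c"]
dense_relaxed = to_dense_2d(records, seen, seen)

coo = to_coo_triplets(records, linked, linked)
```

With scipy installed, the triplets could be passed on as `scipy.sparse.coo_matrix((data, (rows, cols)), shape=(4, 4))`.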

The Universe Set

A Unique Element (UEL) is an (i,s) pair where i (or index) is an identification number for a (string) label s. GAMS uses UELs to efficiently store domain entries of a record by storing the UEL ID i of a domain entry instead of the actual string s. This avoids storing the same string multiple times. The concept of UELs also exists in Python/Pandas and is called a "categorical series". transfer leverages these types in order to efficiently store strings and enable domain checking within the Python environment.

Each domain column in a DataFrame can be assigned a unique categorical type; the effect is that each symbol maintains its own list of UELs per dimension. It is possible to convert a categorical column to its ID number representation by using the categorical accessor x.records[<domain_column_label>].cat.codes. This type of data manipulation is not necessary within transfer, but could be handy when debugging data.

Pandas offers the possibility to create categorical column types that are ordered or not; transfer relies exclusively on ordered categorical data types (in order for a symbol to be valid it must have only ordered categories). By using ordered categories, transfer will order the UEL such that elements appear in the order in which they appeared in the data (which is how GAMS defines the UEL). transfer allows the user to reorder the UELs with the uel_priority argument in the .write() method.
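
The relationship between labels, categories and codes can be seen with plain Pandas. This is a minimal sketch that does not use transfer; the labels are illustrative. Categories are passed explicitly so that they keep the order of first appearance, since Pandas would otherwise sort them.

```python
import pandas as pd

labels = ["seattle", "san-diego", "seattle", "topeka"]

# keep categories in order of first appearance (pandas would sort them by default)
categories = list(dict.fromkeys(labels))
column = pd.Series(pd.Categorical(labels, categories=categories, ordered=True))

codes = column.cat.codes.tolist()      # integer IDs, one per row
uels = column.cat.categories.tolist()  # the label each ID stands for
```

Here `codes` plays the role of the UEL index i and `uels` holds the corresponding labels s.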

transfer does not actually keep track of the UEL separately from other symbols in the Container; it is created internally by the .write() method and is based on the order in which data is added to the container. The user can access the current state of the UEL with the .getUELs() container method. For example, we set a two dimensional set:

import gams.transfer as gt
 
m = gt.Container()
j = gt.Set(m, "j", ["*", "*"], records=[("i" + str(n), "j" + str(n)) for n in range(2)])

In [1]: j.records
Out[1]:
  uni_0 uni_1 element_text
0    i0    j0
1    i1    j1
 
In [2]: m.getUELs()
Out[2]: ['i0', 'i1', 'j0', 'j1']
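
The output above lists all labels of the first dimension before those of the second. A sketch that reproduces this ordering is shown below, assuming the universe is assembled symbol by symbol and dimension by dimension. The function `universe` and the second symbol are hypothetical and only illustrate the idea.

```python
def universe(symbols):
    """Collect labels symbol by symbol and dimension by dimension, keeping first occurrences."""
    ordered = {}
    for per_dimension_uels in symbols:
        for dimension_uels in per_dimension_uels:
            for label in dimension_uels:
                ordered.setdefault(label, None)  # dict keeps insertion order
    return list(ordered)


# set j from the example: dimension 0 holds i0, i1 and dimension 1 holds j0, j1
j_uels = [["i0", "i1"], ["j0", "j1"]]
# a hypothetical second symbol that reuses i1 and introduces k0
other_uels = [["i1", "k0"]]

result = universe([j_uels, other_uels])
```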

Pandas also includes a number of methods that allow categories to be renamed, appended, etc. These methods may be useful for advanced users, but most users will probably find that modifying the original data structures and resetting the symbol records provides a simpler solution. The design of transfer should enable the user to quickly move data back and forth, without worrying about the deeper mechanics of categorical data.

Customize the Universe Set

The concept of a universe set is fundamental to GAMS and has consequences in many areas of GAMS programming including the order of loop execution. For example:

set final_model_year / 2030 /;
set t "all model years" / 2022*2030 /;

singleton set my(t) "model solve year";


loop(t,
  my(t) = yes;
  display my;
  );

The loop will execute model solve year 2030 first because the UEL 2030 was defined in the set final_model_year before it was used again in the definition of set t. This could lead to some surprising behavior if model time periods are linked together. To combat this behavior, many GAMS users would create a dummy set (perhaps the first line of their model file) that contained all the UELs that had a significant order. transfer allows for full control (renaming as well as ordering) over the universe set through the *UELs methods, briefly described here:

Quick summary table of UELs functions

Method	Brief Description
`addUELs`	Adds UELs to a symbol dimension(s). This function does not have a container level implementation.
`capitalizeUELs`	Capitalize all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations
`casefoldUELs`	Casefold all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations
`getUELs`	Gets the UELs over either a symbol dimension, the entire symbol or the entire container. Unused UELs do not show up in symbol data but will show up in the GAMS UEL list.
`ljustUELs`	Left justify all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations
`lowerUELs`	Lowercase all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations
`lstripUELs`	Left strip whitespace from all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations
`removeUELs`	Removes UELs from a symbol dimension, the entire symbol, the entire container (or just a subset of symbols). If a used UEL is removed the DataFrame record will show a `NaN`.
`renameUELs`	Renames UELs in a symbol dimension, the entire symbol, the entire container (or just a subset of symbols). Very handy for harmonizing UEL labeling of data that might have originated from different sources.
`reorderUELs`	Reorders UELs in a symbol dimension(s). This function does not have a container level implementation.
`rjustUELs`	Right justify all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations
`rstripUELs`	Right strip whitespace from all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations
`setUELs`	Sets UELs for a symbol dimension(s). Equivalent results could be obtained with a combination of `renameUELs` and `reorderUELs`, but this one call may have some performance advantage.
`stripUELs`	Strip whitespace from all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations
`titleUELs`	Title (capitalize all individual words) in all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations
`upperUELs`	Uppercase all UELs in the symbol or a subset of specified `dimensions`, can be chained with other `*UELs` string operations

These tools are extremely useful when data is arriving at a model from a variety of data sources. We will describe each of these functions in detail and provide examples in the following sections.
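
To make the loop-order example from the start of this section concrete, the following sketch imitates how labels enter the universe set the first time they are seen. It is a simplified model written in Python, not GAMS or transfer code; `declare` and `uel_order` are hypothetical names.

```python
uel_order = []  # stands in for the GAMS universe set


def declare(labels):
    """Register labels in the universe the first time they are seen."""
    for label in labels:
        if label not in uel_order:
            uel_order.append(label)
    return set(labels)


final_model_year = declare(["2030"])
t = declare([str(year) for year in range(2022, 2031)])

# a loop over t visits its members in universe order, not in declaration order
loop_order = [label for label in uel_order if label in t]

# the "dummy set" workaround: register the ordered labels first
uel_order.clear()
declare([str(year) for year in range(2022, 2031)])
declare(["2030"])
fixed_order = [label for label in uel_order if label in t]
```

In the first case `loop_order` starts with 2030; after the workaround `fixed_order` runs from 2022 to 2030.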

Attention: GAMS is insensitive to trailing whitespace; the *UELs methods will automatically trim any trailing whitespace when creating the new UELs.

getUELs Examples

getUELs is a method of all GAMS symbol classes as well as the Container class. This allows the user to retrieve (ordered) UELs from the entire container or just a specific symbol dimension. For example:

m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
j = gt.Set(m, "j", i, records=["j1", "j2", "j3"])
a = gt.Parameter(m, "a", [i, j], records=[(f"i{i}", f"j{i}", i) for i in range(4)])

In [1]: i.getUELs()
Out[1]: ['i1', 'i2', 'i3']
 
In [2]: m.getUELs()
Out[2]: ['i1', 'i2', 'i3', 'j1', 'j2', 'j3', 'i0', 'j0']
 
In [3]: m.getUELs("j")
Out[3]: ['j1', 'j2', 'j3']

addUELs Examples

addUELs is a method of all GAMS symbol classes. This method allows the user to add new UEL labels to a specific dimension of a symbol – the user can add UELs that do not exist in the symbol records. For example:

m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
j = gt.Set(m, "j", i, records=["j1", "j2", "j3"])
a = gt.Parameter(m, "a", [i, j], records=[(f"i{i}", f"j{i}", i) for i in range(1,4)])
 
i.addUELs("ham")
a.addUELs("and", 0)
a.addUELs("cheese", 1)

In [1]: i.getUELs()
Out[1]: ['i1', 'i2', 'i3', 'ham']
 
 
In [2]: a.getUELs()
Out[2]: ['i1', 'i2', 'i3', 'and', 'j1', 'j2', 'j3', 'cheese']

In this example we have added three new (unused) UELs: ham, and, cheese. These three UELs will now appear in the GAMS universe set (accessible with m.getUELs()). The addition of unused UELs does not impact the validity of the symbols (i.e., unused UELs will not trigger domain violations).

removeUELs Examples

removeUELs is a method of all GAMS symbol classes as well as the Container class. As a result, this method allows the user to clean up unwanted or simply unused UELs in a symbol dimension(s), over several symbols, or over the entire container. The previous example added three unused UELs (ham, and, cheese), but now we want to remove these UELs in order to clean up the GAMS universe set. We can accomplish this several ways:

m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
j = gt.Set(m, "j", i, records=["j1", "j2", "j3"])
a = gt.Parameter(m, "a", [i, j], records=[(f"i{i}", f"j{i}", i) for i in range(1,4)])
 
i.addUELs("ham")
a.addUELs("and", 0)
a.addUELs("cheese", 1)
 
# remove symbol UELs explicitly by dimension
i.removeUELs("ham", 0)
a.removeUELs("and", 0)
a.removeUELs("cheese", 1)
 
# remove symbol UELs for the entire symbol
i.removeUELs("ham")
a.removeUELs(["and", "cheese"])
 
# remove ONLY unused UELs from each symbol, independently
i.removeUELs()
a.removeUELs()
 
# remove ONLY unused UELs from the entire container (all symbols)
m.removeUELs()

In all cases the resulting universe set will be:

In [1]: m.getUELs()

Out[1]: ['i1', 'i2', 'i3', 'j1', 'j2', 'j3']

If a user removes a UEL that appears in data, that data will be lost permanently. The domain label will be transformed into a NaN as seen in this example:

m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
j = gt.Set(m, "j", i, records=["j1", "j2", "j3"])
a = gt.Parameter(m, "a", [i, j], records=[(f"i{i}", f"j{i}", i) for i in range(1,4)])
 
m.removeUELs("i1")

In [1]: i.records
Out[1]:
  uni   element_text
0   NaN
1    i2
2    i3
 
In [2]: a.records
Out[2]:
   i   j    value
0  NaN  j1    1.0
1   i2  j2    2.0
2   i3  j3    3.0

Attention: A container cannot be written if there are NaN entries in any of the domain columns (in any symbol) – an Exception is raised if there are missing domain labels.
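
The NaN behavior comes from the underlying Pandas categorical. The sketch below uses plain Pandas (not transfer) to show the effect of removing a category that is still in use, together with a simple check for missing domain labels that could be run before writing. The variable names are illustrative.

```python
import pandas as pd

domain = pd.Series(
    pd.Categorical(["i1", "i2", "i3"], categories=["i1", "i2", "i3"], ordered=True)
)

# dropping a category that is still in use turns the affected rows into NaN
after_removal = domain.cat.remove_categories(["i1"])

# a pre-write check: list the row positions whose domain label is missing
missing_rows = after_removal[after_removal.isna()].index.tolist()
```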

renameUELs Examples

renameUELs is a method of all GAMS symbol classes as well as the Container class. This method allows the user to rename UELs in a symbol dimension(s), over several symbols, or over the entire container. This particular method is very handy when attempting to harmonize labeling schemes between data structures that originated from different sources. For example:

m = gt.Container()
a = gt.Parameter(
    m,
    "a",
    ["*", "*"],
    records=[("WI", "IL", 10), ("IL", "IN", 12.5), ("WI", "IN", 8.7)],
    description="shipment quantities",
)
b = gt.Parameter(
    m,
    "b",
    ["*"],
    records=[("wisconsin", 1.2), ("illinois", 1.7), ("indiana", 1.2)],
    description="multipliers",
)

...results in the following records:

In [1]: a.records
Out[1]:
  uni_0 uni_1  value
0    WI    IL   10.0
1    IL    IN   12.5
2    WI    IN    8.7
 
In [2]: b.records
Out[2]:
       uni    value
0  wisconsin    1.2
1   illinois    1.7
2    indiana    1.2

However, two different data sources were used to generate the parameters a and b – one data source used the uppercase postal abbreviation of the state name and the other source used a lowercase full state name as the unique identifier. With the following syntax the user would be able to harmonize to a mixed case postal code labeling scheme (without losing any of the original UEL ordering).

m.renameUELs(
    {
        "WI": "Wi",
        "IL": "Il",
        "IN": "In",
        "wisconsin": "Wi",
        "illinois": "Il",
        "indiana": "In",
    }
)

...results in the following records (and the universe set):

In [1]: a.records
Out[1]:
  uni_0 uni_1  value
0    Wi    Il   10.0
1    Il    In   12.5
2    Wi    In    8.7
 
In [2]: b.records
Out[2]:
  uni    value
0    Wi    1.2
1    Il    1.7
2    In    1.2

The universe set will now be:

In [1]: m.getUELs()

Out[1]: ['Wi', 'Il', 'In']

It is possible that some data needs to be cleaned and multiple UELs need to be mapped to a single label (within a single dimension). This is not allowed under default behavior because transfer assumes that the provided UELs are truly unique (logically and lexicographically) – however, it might be necessary to recreate the underlying categorical object to combine n (previously unique) UELs into one to establish the necessary logical set links. For example:

m = gt.Container()
a = gt.Parameter(
    m,
    "a",
    ["*", "*"],
    records=[("WISCONSIN", "iowa", 10), ("WI", "illinois", 12)],
)
 
 
In [1]: a.records
Out[1]:
       uni_0     uni_1  value
0  WISCONSIN      iowa   10.0
1         WI  illinois   12.0

The records are unique for a, but logically, there might be a need to rename WI to WISCONSIN.

In [1]: a.renameUELs({"WI": "WISCONSIN"})

Out[1]: Exception: Could not rename UELs (categories) in `a` dimension `0`. Reason: Categorical categories must be unique

In order to achieve the desired behavior it is necessary to pass allow_merge=True to renameUELs:

In [1]: a.renameUELs({"WI": "WISCONSIN"}, allow_merge=True)
 
In [2]: a.records
Out[2]:
       uni_0     uni_1  value
0  WISCONSIN      iowa   10.0
1  WISCONSIN  illinois   12.0
 
In [3]: a.getUELs()
Out[3]: ['WISCONSIN', 'iowa', 'illinois']
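
The error message above originates in Pandas, which refuses to create a categorical with duplicate categories. The following sketch uses plain Pandas to show the failure and one possible way to merge labels by rebuilding the categorical. It illustrates the idea only and is not necessarily how allow_merge=True is implemented.

```python
import pandas as pd

mapping = {"WI": "WISCONSIN"}
column = pd.Series(
    pd.Categorical(["WISCONSIN", "WI"], categories=["WISCONSIN", "WI"], ordered=True)
)

# a plain rename would produce two categories with the same name, which pandas rejects
try:
    column.cat.rename_categories(mapping)
    rename_failed = False
except ValueError:
    rename_failed = True

# merging instead: relabel the values, then rebuild the categorical from scratch
new_values = [mapping.get(label, label) for label in column.tolist()]
new_categories = list(dict.fromkeys(mapping.get(c, c) for c in column.cat.categories))
merged = pd.Series(pd.Categorical(new_values, categories=new_categories, ordered=True))
```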

reorderUELs Examples

reorderUELs is a method of all GAMS symbol classes. This method allows the user to reorder UELs of a specific symbol dimension – reorderUELs will not allow any new UELs to be created, nor can any be removed. For example:

m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
j = gt.Set(m, "j", i, records=["j1", "j2", "j3"])
a = gt.Parameter(m, "a", [i, j], records=[(f"i{i}", f"j{i}", i) for i in range(1,4)])

In [1]: i.getUELs()
Out[1]: ['i1', 'i2', 'i3']
 
In [2]: m.getUELs()
Out[2]: ['i1', 'i2', 'i3', 'j1', 'j2', 'j3']

But perhaps we want to reorder the UELs i1, i2, i3 to i3, i2, i1.

In [1]: i.reorderUELs(['i3', 'i2', 'i1'])
In [2]: i.getUELs()
Out[2]: ['i3', 'i2', 'i1']
 
In [3]: i.records
Out[3]:
  uni   element_text
0    i1
1    i2
2    i3

Note: This example does not change the indexing scheme of the Pandas DataFrame at all, it only changes the underlying integer numbering scheme for the categories. We can see this by looking at the Pandas codes:

In [1]: i.records["uni"].cat.codes
Out[1]:
0    2
1    1
2    0
dtype: int8

setUELs Examples

setUELs is a method of all GAMS symbol classes. This method allows the user to create new UELs, rename UELs, and reorder UELs all in one method. For example:

m = gt.Container()

i = gt.Set(m, "i", records=["i1", "i2", "i3"])

A user could accomplish a UEL reorder operation with setUELs:

In [1]: i.setUELs(["i3", "i2", "i1"])
 
In [2]: i.getUELs()
Out[2]: ['i3', 'i2', 'i1']
 
In [3]: i.records
Out[3]:
  uni   element_text
0    i1
1    i2
2    i3

A user could accomplish a UEL reorder + add UELs operation with setUELs:

In [1]: i.setUELs(["i3", "i2", "i1", "j1", "j2"])
 
In [2]: i.getUELs()
Out[2]: ['i3', 'i2', 'i1', 'j1', 'j2']
 
In [3]: i.records
Out[3]:
  uni   element_text
0    i1
1    i2
2    i3
 
In [4]: i.records["uni"].cat.codes
Out[4]:
0    2
1    1
2    0
dtype: int8

A user could accomplish a UEL reorder + add + rename with setUELs:

In [1]: i.setUELs(["j3", "j2", "j1", "ham", "cheese"], rename=True)
 
In [2]: i.getUELs()
Out[2]: ['j3', 'j2', 'j1', 'ham', 'cheese']
 
In [3]: i.records
Out[3]:
  uni   element_text
0    j3
1    j2
2    j1
 
In [4]: i.records["uni"].cat.codes
Out[4]:
0    0
1    1
2    2
dtype: int8

Note: This example does not change the indexing scheme of the Pandas DataFrame at all, but the rename=True flag means that the records will get updated just as if a renameUELs call had been made.

If a user wanted to set new UELs on top of this data, without renaming, they would need to be careful to include the current UELs in the UELs being set. It is possible to lose these labels if they are not included (which will prevent the data from being written to GDX/GMD).

m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2", "i3"])
i.setUELs(["j1", "i2", "j3", "ham", "cheese"])

In [1]: i.getUELs()
Out[1]: ['j1', 'i2', 'j3', 'ham', 'cheese']
 
In [2]: i.records
Out[2]:
  uni   element_text
0   NaN
1    i2
2   NaN
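
A simple guard against losing labels is to compare the labels currently in use with the list about to be set. The helper below is a hypothetical pre-check in plain Python; in practice the labels in use could be taken from the symbol records.

```python
def labels_that_would_be_lost(used_labels, new_uels):
    """Return the labels that appear in records but are absent from new_uels."""
    new = set(new_uels)
    return [label for label in dict.fromkeys(used_labels) if label not in new]


used = ["i1", "i2", "i3"]  # labels currently in the records of set i

lost = labels_that_would_be_lost(used, ["j1", "i2", "j3", "ham", "cheese"])
safe = labels_that_would_be_lost(used, ["i3", "i2", "i1", "j1", "j2"])
```

An empty result means the new UEL list covers every label in use, so no NaN entries would be created.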

String Manipulation on UELs

It is easy to perform common string manipulations on UELs at the dimension, symbol and container levels with a series of convenience functions: lowerUELs, upperUELs, lstripUELs, rstripUELs, stripUELs, capitalizeUELs, casefoldUELs, titleUELs, ljustUELs, rjustUELs. These methods are wrappers around Python's built-in string methods and are designed to efficiently perform bulk UEL transformations on your GAMS data.

The following example shows operations on the entire container:

m = gt.Container()
i = gt.Set(m, "i", ["*", "*"], records=[(f"i{i}", f"j{i}") for i in range(3)])
k = gt.Set(m, "k", records=[(f"aaa{i}") for i in range(3)])

In [1]: m.getUELs()
Out[1]: ['i0', 'i1', 'i2', 'j0', 'j1', 'j2', 'aaa0', 'aaa1', 'aaa2']
 
 
In [2]: m.upperUELs()
Out[2]: <GAMS Transfer Container (0x7f8110719e10)>
 
In [3]: m.getUELs()
Out[3]: ['I0', 'I1', 'I2', 'J0', 'J1', 'J2', 'AAA0', 'AAA1', 'AAA2']
 
In [4]: m.lowerUELs("i")
Out[4]: <GAMS Transfer Container (0x7f8110719e10)>
 
In [5]: m.getUELs()
Out[5]: ['i0', 'i1', 'i2', 'j0', 'j1', 'j2', 'AAA0', 'AAA1', 'AAA2']
 
In [6]: m.upperUELs().rjustUELs(4, "_")
Out[6]: <GAMS Transfer Container (0x7f8110719e10)>
 
In [7]: m.getUELs()
Out[7]: ['__I0', '__I1', '__I2', '__J0', '__J1', '__J2', 'AAA0', 'AAA1', 'AAA2']

Note: The ljustUELs and rjustUELs methods require the user to specify the final string length and the fill_character used to pad the string to achieve the final length.

Similar operations can be performed at the dimension and symbol levels as can be seen in the following examples:

In [1]: i.upperUELs(0)
Out[1]: <Set `i` (0x7f8121661930)>
 
In [2]: i.getUELs()
Out[2]: ['I0', 'I1', 'I2', 'j0', 'j1', 'j2']
 
In [3]: i.casefoldUELs()
Out[3]: <Set `i` (0x7f8121661930)>
 
In [4]: i.getUELs()
Out[4]: ['i0', 'i1', 'i2', 'j0', 'j1', 'j2']

Note: Symbol dimension is indexed from zero (per Python convention).
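
Since these methods wrap Python's string methods, the per-label effect can be previewed on an ordinary list of strings. The labels below are illustrative.

```python
uels = ["i0", "i1", "j0", "aaa0"]

# upperUELs followed by rjustUELs(4, "_") corresponds to these str calls
chained = [label.upper().rjust(4, "_") for label in uels]

# titleUELs and stripUELs correspond to str.title and str.strip
titled = "new york".title()
stripped = "  ham ".strip()
```

Labels that already have the requested length, such as `AAA0`, are left unpadded by `rjust`, which matches the container example above.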

Reordering Symbols

Writing a Container requires the symbols to be sorted such that, for example, a Set used as the domain of another symbol appears before that symbol. The Container will try to establish a valid ordering when writing the data. Ordering problems can arise if the user is adding and removing many symbols (and perhaps recreating symbols with the same name) – users should attempt to add each symbol to a Container only once, and care must be taken when creating symbol names. The method reorderSymbols attempts to fix symbol ordering problems. The following example shows how this can occur:

Example Symbol reordering

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["i" + str(i) for i in range(5)])
j = gt.Set(m, "j", i, records=["i" + str(i) for i in range(3)])

In [1]: m.data

Out[1]: {'i': <Set `i` (0x7f7e98907e50)>, 'j': <Set `j` (0x7f7e987fb580)>}

# now we remove the set i and recreate the data
m.removeSymbols("i")
i = gt.Set(m, "i", records=["i" + str(i) for i in range(5)])

The symbols are now out of order in .data and must be reordered:

In [1]: m.data

Out[1]: {'j': <Set `j` (0x7f7e987fb580)>, 'i': <Set `i` (0x7f7e9885a140)>}

# calling reorderSymbols() will order the dictionary properly, but the domain reference in j is now broken
m.reorderSymbols()
 
# fix the domain reference in the set j
j.domain = i

In [1]: m.isValid()

Out[1]: True
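
The ordering requirement (domain sets before the symbols that use them) is a topological ordering problem. The sketch below shows the idea with Python's standard library. The dictionary describes a hypothetical container, and this is not a description of how reorderSymbols is implemented.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+

# symbol name -> names of the sets used as its domain (hypothetical container contents)
domains = {"j": ["i"], "i": [], "a": ["i", "j"]}

# static_order() yields every symbol after all of the symbols it depends on
valid_order = list(TopologicalSorter(domains).static_order())
```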

Rename Symbols

It is possible to rename a symbol even after it has been added to a Container. There are two methods that can be used to achieve the desired outcome:

using the container method renameSymbol
directly changing the name symbol property

We create a Container with two sets:

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["seattle", "san-diego"])
j = gt.Set(m, "j", records=["new-york", "chicago", "topeka"])

Example #1 - Change the name of a symbol with the container method

In [1]: m.renameSymbol("i","h")
 
In [2]: m.data
Out[2]: {'h': <Set `h` (0x7f7e988582e0)>, 'j': <Set `j` (0x7f7e801240d0)>}

Example #2 - Change the name of a symbol with the .name attribute

In [1]: i.name = "h"
 
In [2]: m.data
Out[2]: {'h': <Set `h` (0x7f7e98907520)>, 'j': <Set `j` (0x7f7ea84bb0d0)>}

Note: The renamed symbols maintain the original symbol order, which prevents unnecessary reordering operations later in the workflow.

Removing Symbols

Removing symbols from a container is easy when using the removeSymbols container method; this method accepts either a str or a list of str.

Attention: Once a symbol has been removed, it is possible to have hanging references as domain links in other symbols. The user will need to repair these other symbols with the proper domain links in order to avoid validity errors.

GAMS Special Values

The GAMS system contains five special values: UNDEF (undefined), NA (not available), EPS (epsilon), +INF (positive infinity), -INF (negative infinity). These special values must be mapped to their Python equivalents. transfer uses the following convention to generate the 1:1 mapping:

+INF is mapped to float("inf")
-INF is mapped to float("-inf")
EPS is mapped to -0.0 (mathematically identical to zero)
NA is mapped to a special NaN
UNDEF is mapped to float("nan")

transfer syntax is designed to quickly get data into a form that is usable in further analyses or visualization; this mapping also highlights the preference for data that is of type float, which offers performance benefits within Pandas/NumPy. The user does not need to remember these constants as they are provided within the class SpecialValues as SpecialValues.POSINF, SpecialValues.NEGINF, SpecialValues.EPS, SpecialValues.NA, and SpecialValues.UNDEF. The SpecialValues class also contains methods to test for these special values. Some examples are shown below; we already begin to introduce some of the transfer syntax.

Example (special values in a parameter)

import gams.transfer as gt
 
m = gt.Container()
x = gt.Parameter(
    m,
    "x",
    ["*"],
    records=[
        ("i1", 1),
        ("i2", gt.SpecialValues.POSINF),
        ("i3", gt.SpecialValues.NEGINF),
        ("i4", gt.SpecialValues.EPS),
        ("i5", gt.SpecialValues.NA),
        ("i6", gt.SpecialValues.UNDEF),
    ],
    description="special values",
)

The DataFrame for x would look like:

In [1]: x.records
Out[1]:
  uni    value
0    i1    1.0
1    i2    inf
2    i3   -inf
3    i4   -0.0
4    i5    NaN
5    i6    NaN

The user can now easily test for specific special values in the value column of the DataFrame (returns a boolean array):

In [1]: gt.SpecialValues.isNA(x.records["value"])

Out[1]: array([False, False, False, False, True, False])

Other data structures can be passed into these methods as long as these structures can be converted into a numpy array with dtype=float. It follows that:

In [1]: gt.SpecialValues.isEps(gt.SpecialValues.EPS)
Out[1]: True
 
In [2]: gt.SpecialValues.isPosInf(gt.SpecialValues.POSINF)
Out[2]: True
 
In [3]: gt.SpecialValues.isNegInf(gt.SpecialValues.NEGINF)
Out[3]: True
 
In [4]: gt.SpecialValues.isNA(gt.SpecialValues.NA)
Out[4]: True
 
In [5]: gt.SpecialValues.isUndef(gt.SpecialValues.UNDEF)
Out[5]: True
 
In [6]: gt.SpecialValues.isUndef(gt.SpecialValues.NA)
Out[6]: False
 
In [7]: gt.SpecialValues.isNA(gt.SpecialValues.UNDEF)
Out[7]: False
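
Dedicated test methods are needed because ordinary comparisons cannot separate these values: -0.0 compares equal to 0.0, and NaN never compares equal to anything. The sketch below shows this in plain Python. The helper `is_eps` is a hypothetical stand-in that only mimics the idea behind SpecialValues.isEps for a single float.

```python
import math


def is_eps(x):
    """True for negative zero, the float used to represent EPS."""
    return x == 0.0 and math.copysign(1.0, x) < 0


eps = -0.0
checks = {
    "eps_equals_zero": eps == 0.0,  # numerically identical to zero
    "eps_detected": is_eps(eps),
    "plain_zero_detected": is_eps(0.0),
    "nan_equals_itself": float("nan") == float("nan"),  # NaN never compares equal
}
```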

Pandas DataFrames allow data columns to exist with mixed type (dtype=object) – transfer leverages this convenience feature to enable users to import string representations of EPS, NA, and UNDEF (or UNDF). transfer is tolerant of any mixed-case special value string representation. Python offers additional flexibility when representing negative/positive infinity. Any string x where float(x) == float("inf") evaluates to True can be used to represent positive infinity. Similarly, any string x where float(x) == float("-inf") evaluates to True can be used to represent negative infinity. Allowed values include inf, +inf, INFINITY, +INFINITY, -inf, -INFINITY and all mixed-case equivalents.

Example (special values defined by strings)

import gams.transfer as gt
 
m = gt.Container()
x = gt.Parameter(
    m,
    "x",
    ["*"],
    records=[
        ("i1", 1),
        ("i2", "+inf"),
        ("i3", "-infinity"),
        ("i4", "eps"),
        ("i5", "na"),
        ("i6", "undef"),
    ],
    description="special values",
)

These special strings will be immediately mapped to their float equivalents from the SpecialValues class in order to ensure that all data entries are float types.
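
The string conventions can be previewed with Python's own float parsing. The function `parse_value` below is a hypothetical sketch, not the transfer implementation. In particular it maps NA and UNDEF to the same ordinary NaN, whereas transfer keeps them distinct.

```python
import math


def parse_value(raw):
    """Map a number or special-value string to a float (illustrative only)."""
    if isinstance(raw, str):
        text = raw.strip().lower()
        if text == "eps":
            return -0.0
        if text in ("na", "undef", "undf"):
            # transfer distinguishes NA from UNDEF; this sketch does not
            return float("nan")
        return float(text)  # accepts inf, +inf, -infinity, ... in any case
    return float(raw)


values = [parse_value(v) for v in [1, "+inf", "-Infinity", "EPS", "na"]]
```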

Standard Data Formats

This section is meant to introduce the standard format that transfer expects for symbol records. It has already been mentioned that we store data as a Pandas DataFrame, but there is an assumed structure to the column headings and column types that will be important to understand. transfer includes convenience functions in order to ease the burden of converting data from a user-centric format to one that is understood by transfer. However, advanced users will want to convert their data first and add it directly to the Container to avoid making extra copies of (potentially large) data sets.

Set Records Standard Format

All set records (including singleton sets) are stored as a Pandas DataFrame with n number of columns, where n is the dimensionality of the symbol + 1. The first n-1 columns include the domain elements while the last column includes the set element explanatory text. Records are organized such that there is one record per row.

The names of the domain columns are flexible, but transfer requires unique column names. Users are encouraged to change the column headings of the underlying dataframe by using the domain_labels property. Using this property will ensure that unique column names are generated by adding a _<dimension> tag to the end of any user supplied column names. The explanatory text column is called element_text and must take the last position in the DataFrame.

All domain columns must be a categorical data type and the element_text column must be an object type. Pandas allows the categories (basically the unique elements of a column) to be various data types as well; however, transfer requires that all of them are type str. All rows in the element_text column must be type str.

Some examples:

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["seattle", "san-diego"])
j = gt.Set(m, "j", [i, "*"], records=[("seattle", "new-york"), ("san-diego", "st-louis")])
k = gt.Set(m, "k", [i], is_singleton=True, records=["seattle"])
 
In [1]: i.records
Out[1]:
         uni element_text
0    seattle
1  san-diego
 
In [2]: j.records
Out[2]:
           i       uni element_text
0    seattle  new-york
1  san-diego  st-louis
 
In [3]: k.records
Out[3]:
         i element_text
0  seattle

Parameter Records Standard Format

All parameter records (including scalars) are stored as a Pandas DataFrame with n number of columns, where n is the dimensionality of the symbol + 1. The first n-1 columns include the domain elements while the last column includes the numerical value of the records. Records are organized such that there is one record per row. Scalar parameters have zero dimension, therefore they only have one column and one row.

By default, the names of the domain columns follow a pattern of <set_name>; a symbol dimension that is referenced to the universe is labeled uni. The domain labels can be customized. Users are encouraged to change the column headings of the underlying dataframe by using the domain_labels property. Using this property will ensure that unique column names are generated (if not currently unique) by adding a _<dimension> tag to the end of any user supplied column names. The value column is called value and must take the last position in the DataFrame.

All domain columns must be a categorical data type and the value column must be a float type. Pandas allows the categories (basically the unique elements of a column) to be various data types as well, however transfer requires that all these are type str.

Some examples:

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["seattle", "san-diego"])
a = gt.Parameter(m, "a", ["*"], records=[("seattle", 50), ("san-diego", 100)])
b = gt.Parameter(
    m,
    "b",
    [i, "*"],
    records=[("seattle", "new-york", 32.2), ("san-diego", "st-louis", 123)],
)
c = gt.Parameter(m, "c", records=90)
 
In [1]: a.records
Out[1]:
         uni  value
0    seattle   50.0
1  san-diego  100.0
 
In [2]: b.records
Out[2]:
           i       uni  value
0    seattle  new-york   32.2
1  san-diego  st-louis  123.0
 
In [3]: c.records
Out[3]:
   value
0   90.0
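
For users who prefer to prepare records themselves, the following sketch builds a DataFrame in the parameter layout described above using only pandas. It assumes pandas is installed; the column label "i" and the variable names are illustrative choices, and keeping the categories in data order is a design choice of this sketch. Whether such a frame is accepted should still be confirmed with .isValid() after assigning it to a symbol.

```python
import pandas as pd

# Raw user data: one (domain element, value) pair per record.
raw = [("seattle", 50), ("san-diego", 100)]

records = pd.DataFrame(raw, columns=["i", "value"])

# Domain columns must be categorical with str categories.
# Categories are listed in data order here (an assumption of this sketch).
labels = records["i"].astype(str)
records["i"] = pd.Categorical(labels, categories=list(labels.unique()))

# The value column must be float and must be the last column.
records["value"] = records["value"].astype(float)

print(records.dtypes)
```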

Variable/Equation Records Standard Format

Variables and equations share the same standard data format. All records (including scalar variables/equations) are stored as a Pandas DataFrame with n number of columns, where n is the dimensionality of the symbol + 5. The first n-5 columns include the domain elements while the last five columns include the numerical values for different attributes of the records. Records are organized such that there is one record per row. Scalar variables/equations have zero dimension, therefore they have five columns and one row.

By default, the names of the domain columns follow a pattern of <set_name>; a symbol dimension that is referenced to the universe is labeled uni. The domain labels can be customized. Users are encouraged to change the column headings of the underlying dataframe by using the domain_labels property. Using this property will ensure that unique column names are generated (if not currently unique) by adding a _<dimension> tag to the end of any user supplied column names. The attribute columns are called level, marginal, lower, upper, and scale. These attribute columns must appear in this order. Attributes that are not supplied by the user will be assigned the default GAMS values for that variable/equation type. It is also possible to pass no attributes at all, in which case transfer assigns default values to all of them.

All domain columns must be a categorical data type and all the attribute columns must be a float type. Pandas allows the categories (basically the unique elements of a column) to be various data types as well, however transfer requires that all these are type str.

Some examples:

import gams.transfer as gt
import pandas as pd
 
m = gt.Container()
i = gt.Set(m, "i", records=["seattle", "san-diego"])
a = gt.Variable(
    m,
    "a",
    "free",
    domain=[i],
    records=pd.DataFrame(
        [("seattle", 50), ("san-diego", 100)], columns=["city", "level"]
    ),
)
 
In [1]: a.records
Out[1]:
           i  level  marginal  lower  upper  scale
0    seattle   50.0       0.0   -inf    inf    1.0
1  san-diego  100.0       0.0   -inf    inf    1.0
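
The way missing attributes are completed can be pictured with a small plain-Python sketch. The defaults below for a free variable are taken from the output above; the level default of 0.0 is an assumption, since the example supplies level explicitly. The names FREE_DEFAULTS and complete_record are hypothetical and are not part of transfer.

```python
# Default attribute values for a "free" variable, matching the output above.
# The level default of 0.0 is assumed; the example above sets level itself.
FREE_DEFAULTS = {
    "level": 0.0,
    "marginal": 0.0,
    "lower": float("-inf"),
    "upper": float("inf"),
    "scale": 1.0,
}

# Attribute columns must appear in exactly this order.
ATTRIBUTE_ORDER = ["level", "marginal", "lower", "upper", "scale"]


def complete_record(domain, supplied):
    """Return one record: domain labels followed by all five attributes as floats."""
    values = {**FREE_DEFAULTS, **supplied}
    return list(domain) + [float(values[name]) for name in ATTRIBUTE_ORDER]


rows = [
    complete_record(["seattle"], {"level": 50}),
    complete_record(["san-diego"], {"level": 100}),
]
for row in rows:
    print(row)
```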

GDX Read/Write

Up until now, we have been focused on using transfer to create symbols in an empty Container using the symbol constructors (or their corresponding container methods). These tools will enable users to ingest data from many different formats and add them to a Container – however, it is also possible to read in symbol data directly from GDX files using the read container method. In the following sections, we will discuss this method in detail as well as the write method, which allows users to write out to new GDX files.

Read GDX

There are two main ways to read in GDX based data.

Pass the file path directly to the Container constructor (will read all symbols and records)
Pass the file path directly to the read method (default read all symbols, but can read partial files)

The first option here is provided for convenience and will, internally, call the read method. This method will read in all symbols as well as their records. This is the easiest and fastest way to get data out of a GDX file and into your Python environment. For the following examples we leverage the GDX output generated from the `trnsport.gms` model file.

Example (reading full data w/ Container constructor)

import gams.transfer as gt
 
m = gt.Container("trnsport.gdx")

In [1]: m.data
Out[1]:
{'i': <Set `i` (0x7f95b8d63e80)>,
 'j': <Set `j` (0x7f95b8d63a60)>,
 'a': <Parameter `a` (0x7f95b8d63ee0)>,
 'b': <Parameter `b` (0x7f95b8d63d00)>,
 'd': <Parameter `d` (0x7f95b8da86a0)>,
 'f': <Parameter `f` (0x7f95b8da8670)>,
 'c': <Parameter `c` (0x7f95b8da83d0)>,
 'x': <Positive Variable `x` (0x7f95b8da83a0)>,
 'z': <Free Variable `z` (0x7f95b8da8400)>,
 'cost': <Eq Equation `cost` (0x7f95b8da82b0)>,
 'supply': <Leq Equation `supply` (0x7f95b8da8280)>,
 'demand': <Geq Equation `demand` (0x7f95b8da8580)>}

In [2]: m.describeParameters()
Out[2]:
  name  domain domain_type  dimension  number_records      min     mean      max            where_min            where_max sparsity
0    a     [i]     regular          1               2  350.000  475.000  600.000            [seattle]          [san-diego]      0.0
1    b     [j]     regular          1               3  275.000  300.000  325.000             [topeka]           [new-york]      0.0
2    c  [i, j]     regular          2               6    0.126    0.176    0.225  [san-diego, topeka]  [seattle, new-york]      0.0
3    d  [i, j]     regular          2               6    1.400    1.950    2.500  [san-diego, topeka]  [seattle, new-york]      0.0
4    f      []        none          0               1   90.000   90.000   90.000                 None                 None     None

A user could also read in data with the read method as shown in the following example.

Example (reading full data w/ read method)

import gams.transfer as gt
 
m = gt.Container()
m.read("trnsport.gdx")

In [1]: m.data
Out[1]:
{'i': <Set `i` (0x7f95b8d63e80)>,
 'j': <Set `j` (0x7f95b8d63a60)>,
 'a': <Parameter `a` (0x7f95b8d63ee0)>,
 'b': <Parameter `b` (0x7f95b8d63d00)>,
 'd': <Parameter `d` (0x7f95b8da86a0)>,
 'f': <Parameter `f` (0x7f95b8da8670)>,
 'c': <Parameter `c` (0x7f95b8da83d0)>,
 'x': <Positive Variable `x` (0x7f95b8da83a0)>,
 'z': <Free Variable `z` (0x7f95b8da8400)>,
 'cost': <Eq Equation `cost` (0x7f95b8da82b0)>,
 'supply': <Leq Equation `supply` (0x7f95b8da8280)>,
 'demand': <Geq Equation `demand` (0x7f95b8da8580)>}

It is also possible to read in a partial GDX file with the read method, as shown in the following example:

m = gt.Container()

m.read("trnsport.gdx", "x")

In [1]: m.data
Out[1]: {'x': <Positive Variable `x` (0x7f9598a38dc0)>}
 
In [2]: m.data["x"].records
Out[2]:
           i         j  level  marginal  lower  upper  scale
0    seattle  new-york   50.0     0.000    0.0    inf    1.0
1    seattle   chicago  300.0     0.000    0.0    inf    1.0
2    seattle    topeka    0.0     0.036    0.0    inf    1.0
3  san-diego  new-york  275.0     0.000    0.0    inf    1.0
4  san-diego   chicago    0.0     0.009    0.0    inf    1.0
5  san-diego    topeka  275.0     0.000    0.0    inf    1.0

This syntax assumes that the user will always want to read in both the metadata as well as the actual data records, but it is possible to skip the reading of the records by passing the argument records=False.

m = gt.Container()

m.read("trnsport.gdx", "x", records=False)

In [1]: m.data
Out[1]: {'x': <Positive Variable `x` (0x7f9598a3a200)>}
 
In [2]: m["x"].summary
Out[2]:
{'name': 'x',
 'description': 'shipment quantities in cases',
 'type': 'positive',
 'domain': ['i', 'j'],
 'domain_type': 'relaxed',
 'dimension': 2,
 'number_records': 6}
 
In [3]: type(m["x"].records)
Out[3]: <class 'NoneType'>

Attention: The read method attempts to link the domain objects together (in order to have a "regular" domain_type) but if domain sets are not part of the read operation there is no choice but to default to a "relaxed" domain_type. This can be seen in the last example where we only read in the variable x and not the domain sets (i and j) that the variable is defined over. All the data will be available to the user, but domain checking is no longer possible. The symbol x will remain with "relaxed" domain type even if the user were to read in sets i and j in a second read call.

Write GDX

A user can write data to a GDX file by simply passing a file path (as a string). The write method will then create the GDX and write all data in the Container.

Example

m.write("path/to/file.gdx")

Example (write a compressed GDX file)

m.write("path/to/file.gdx", compress=True)

By default, all symbols in the Container will be written, however it is possible to write a subset of the symbols to a GDX file with the symbols argument. If a domain set is not included in the symbols list then the symbol will automatically be relaxed (but will retain the domain set's name as a string label – it does not get relaxed to *). This behavior can be seen in the following example.

import gams.transfer as gt
 
m = gt.Container()
i = gt.Set(m, "i", records=["i1", "i2"])
a = gt.Parameter(
    m,
    "a",
    [i, i],
    records=[("i1", "i1", 10), ("i2", "i2", 12)],
)
 
m.write("out.gdx", "a")
 
# create a new container and read in the GDX
m2 = gt.Container("out.gdx")
 
# look at all the data
In [1]: m2.data
Out[1]: {'a': <Parameter `a` (0x7f9598a61510)>}
 
# notice that `a` has a relaxed domain type now
In [2]: m2["a"].domain_type
Out[2]: 'relaxed'
 
# `a` retains the labels from the original domain sets
In [3]: m2["a"].domain
Out[3]: ['i', 'i']
 
# The original container `m` retains its original state before writing
In [4]: m["a"].domain
Out[4]: [<Set `i` (0x7f9598a39a80)>, <Set `i` (0x7f9598a39a80)>]

The output of In [4] shows that the auto-relaxation of the domain for a is only temporary: it applies while writing (in this case, from Container object m) and the original domain is restored afterwards, so the Container state is not disturbed.

Advanced users might want to specify an order to their UEL list (i.e., the universe set); recall that the UEL ordering follows that dictated by the data. As a convenience, it is possible to prepend the UEL list with a user specified order using the uel_priority argument.

Example (change the order of the UEL)

m = gt.Container()
i = gt.Set(m, "i", records=["a", "b", "c"])
m.write("foo.gdx", uel_priority=["a", "c"])

The original UEL order for this GDX file would have been ["a", "b", "c"], but this example reorders the UEL with uel_priority – the positions of b and c have been swapped. This can be verified with the gdxdump utility (using the uelTable argument):

gdxdump foo.gdx ueltable=foo

Set foo /
  'a' ,
  'c' ,
  'b' /;
$onEmpty

Set i(*) /
'a',
'c',
'b' /;

$offEmpty
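
The resulting order can be reasoned about with a short plain-Python sketch: priority labels come first, followed by the remaining labels in data order. The function name prioritized_uels is hypothetical, and this is only an illustration of the ordering, not how transfer is implemented. Labels in uel_priority that do not occur in the data are simply kept by this sketch; how transfer treats them is not covered here.

```python
def prioritized_uels(data_order, uel_priority):
    """Return priority labels first, then the remaining labels in data order.

    Illustration only; this is not the gams.transfer implementation.
    """
    seen = set()
    ordered = []
    for label in list(uel_priority) + list(data_order):
        if label not in seen:
            seen.add(label)
            ordered.append(label)
    return ordered


print(prioritized_uels(["a", "b", "c"], ["a", "c"]))
```

For the example above this yields a, c, b, which matches the gdxdump output.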

GamsDatabase Read/Write

We have discussed how to create symbols in an empty Container and we have discussed how to exchange data with GDX files, however it is also possible to read and write data directly in memory by interacting with a GamsDatabase/GMD object – this allows transfer to be used to read/write data within an Embedded Python Code environment or in combination with the Python OO API. There are some important differences when compared to data exchange with GDX since we are working with data representations in memory.

Read GamsDatabases

Just as with a GDX, there are two main ways to read in data that is in a GamsDatabase/GMD object.

Pass the GamsDatabase/GMD object directly to the Container constructor (will read all symbols and records)
Pass the GamsDatabase/GMD object directly to the read method (reads all symbols by default, but can read a subset of symbols)

The first option here is provided for convenience and will, internally, call the read method. This method will read in all symbols as well as their records. This is the easiest and fastest way to get data out of a GamsDatabase/GMD object and into your Python environment. While it is possible to generate a custom GamsDatabase/GMD object from scratch (using the gmdcc API), most users will be interacting with a GamsDatabase/GMD object that has already been instantiated internally when they are using Embedded Python Code or the GamsDatabase class in the Python OO API. Our examples will show how to access the GamsDatabase/GMD object; we leverage some of the data from the `trnsport.gms` model file.

Example (reading full data w/ Container constructor)

m = gt.Container(gams.db)

Note: Embedded Python Code users will want to pass the GamsDatabase object that is part of the GAMS Database object; this will always be referenced as gams.db regardless of the model file.

The following example uses embedded Python code to create a new Container, read in all symbols, and display some summary statistics as part of the gams log output.

Set
   i 'canning plants' / seattle,  san-diego /
   j 'markets'        / new-york, chicago, topeka /;

Parameter
   a(i) 'capacity of plant i in cases'
        / seattle    350
          san-diego  600 /

   b(j) 'demand at market j in cases'
        / new-york   325
          chicago    300
          topeka     275 /;

Table d(i,j) 'distance in thousands of miles'
              new-york  chicago  topeka
   seattle         2.5      1.7     1.8
   san-diego       2.5      1.8     1.4;

$onembeddedCode Python:
import gams.transfer as gt

m = gt.Container(gams.db)
print(m.describeSets())

print(m.describeParameters())

$offEmbeddedCode

The gams log output will then look as such (the extra print calls are just providing nice spacing for this example):

GAMS 43.1.0   Copyright (C) 1987-2023 GAMS Development. All rights reserved
--- Starting compilation
--- matrix.gms(29) 3 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
  name  is_singleton domain domain_type  dimension  number_records sparsity
0    i         False    [*]        none          1               2     None
1    j         False    [*]        none          1               3     None
  name  domain domain_type  dimension  number_records      min     mean      max            where_min            where_max sparsity
0    a     [i]     regular          1               2  350.000  475.000  600.000            [seattle]          [san-diego]      0.0
1    b     [j]     regular          1               3  275.000  300.000  325.000             [topeka]           [new-york]      0.0
2    d  [i, j]     regular          2               6    1.400    1.950    2.500  [san-diego, topeka]  [seattle, new-york]      0.0

--- Starting execution - empty program
*** Status: Normal completion


A user could also read in a subset of the data located in the GamsDatabase object with the read method, as shown in the following example. Here we only read in the sets i and j; as a result, the .describeParameters() method will return None.

Example (reading subset of full data w/ read method)

Set
   i 'canning plants' / seattle,  san-diego /
   j 'markets'        / new-york, chicago, topeka /;

Parameter
   a(i) 'capacity of plant i in cases'
        / seattle    350
          san-diego  600 /

   b(j) 'demand at market j in cases'
        / new-york   325
          chicago    300
          topeka     275 /;

Table d(i,j) 'distance in thousands of miles'
              new-york  chicago  topeka
   seattle         2.5      1.7     1.8
   san-diego       2.5      1.8     1.4;

$onembeddedCode Python:
import gams.transfer as gt

m = gt.Container()
m.read(gams.db, symbols=["i","j"])
gams.printLog("")
print(m.describeSets())
print(m.describeParameters())

$offEmbeddedCode

GAMS 43.1.0   Copyright (C) 1987-2023 GAMS Development. All rights reserved
--- Starting compilation
--- matrix.gms(29) 3 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
--- name  is_singleton domain domain_type  dimension  number_records sparsity
0    i         False    [*]        none          1               2     None
1    j         False    [*]        none          1               3     None
None

--- Starting execution - empty program
*** Status: Normal completion

All the typical functionality of the Container exists when working with GamsDatabase/GMD objects. This means that domain linking, matrix conversion, and other more advanced options are available to the user at either compilation time or execution time (depending on the Embedded Code syntax being used, see: Syntax). The next example generates a 1000x1000 matrix and then takes its inverse using the Numpy linalg package.

Example (Matrix Generation and Inversion)

set i / i1*i1000 /;
alias(i,j);

parameter a(i,j);
a(i,j) = 1 / (ord(i)+ord(j) - 1);
a(i,i) = 1;


embeddedCode Python:
import gams.transfer as gt
import numpy as np
import time

gams.printLog("")
s = time.time()
m = gt.Container(gams.db)
gams.printLog(f"read data: {round(time.time() - s, 3)} sec")

s = time.time()
A = m["a"].toDense()
gams.printLog(f"create matrix A: {round(time.time() - s, 3)} sec")

s = time.time()
invA = np.linalg.inv(A)
gams.printLog(f"generate inv(A): {round(time.time() - s, 3)} sec")

endEmbeddedCode

Note: In this example, the assignment of the a parameter is done during execution time so we must use the execution time syntax for embedded code in order to get the numerical records properly.

GAMS 43.1.0   Copyright (C) 1987-2023 GAMS Development. All rights reserved
--- Starting compilation
--- test.gms(27) 3 Mb
--- Starting execution: elapsed 0:00:00.003
--- test.gms(9) 36 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
---
--- read data: 1.1 sec
--- create matrix A: 0.02 sec
--- generate inv(A): 0.031 sec
*** Status: Normal completion

We will extend this example in the next section to write the inverse of matrix A back into a GAMS parameter.

Write to GamsDatabases

A user can write to a GamsDatabase/GMD object with the .write() method just as they would write a GDX file; however, there are some important differences. When a user writes a GDX file, the entire GDX file represents a complete data environment (all domains have been resolved, etc.), so transfer does not need to worry about merge/replace operations. When writing to in-memory data representations with GamsDatabase/GMD, however, it is possible to merge or replace symbol records. We show a few examples to illustrate this behavior.

Example (Populating a set in GAMS)

* note that we need to declare the set i over "*" in order to provide hints about the symbol dimensionality
set i(*);

$onembeddedCode Python:
import gams.transfer as gt

m = gt.Container()
i = gt.Set(m, "i", records=["i"+str(i) for i in range(10)])
m.write(gams.db)

$offEmbeddedCode i


embeddedCode Python:
import gams.transfer as gt

m = gt.Container(gams.db)
gams.printLog("")
print(m["i"].records)

endEmbeddedCode

Note: In general, it is possible to use transfer to create new symbols in a GamsDatabase and GMD object (and not necessarily merge symbols) but embedded code best practices necessitate the declaration of any GAMS symbols on the GAMS side first, then the records can be filled with transfer.

If we break down this example we can see that the set i is declared within GAMS (with no records) and then the records for i are set by writing a Container to the gams.db GamsDatabase object (we do this at compile time). The second embedded Python code block runs at execution time and is simply there to read all the records on the set i – printing the sets this way adds the output to the .log file (we could also use the more common display i; operation in GAMS to display the set elements in the LST file).

GAMS 43.1.0   Copyright (C) 1987-2023 GAMS Development. All rights reserved
--- Starting compilation
--- test.gms(10) 2 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
--- test.gms(20) 3 Mb
--- Starting execution: elapsed 0:00:01.464
--- test.gms(13) 4 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
---   uni   element_text
0    i0
1    i1
2    i2
3    i3
4    i4
5    i5
6    i6
7    i7
8    i8
9    i9

*** Status: Normal completion

Example (Merging set records)

set i / i1, i2 /;

$onmulti
$onembeddedCode Python:
import gams.transfer as gt

m = gt.Container()
i = gt.Set(m, "i", records=["i"+str(i) for i in range(10)])
m.write(gams.db, merge_symbols="i")

$offEmbeddedCode i
$offmulti

embeddedCode Python:
import gams.transfer as gt

m = gt.Container(gams.db)
gams.printLog("")
print(m["i"].records)

endEmbeddedCode

In this example we need to make use of $onMulti/$offMulti in order to merge new set elements into the set i (the same would be true if we were merging other symbol types). Any symbol that already has records defined (in GAMS) and is being added to with Python (and transfer) must be wrapped with $onMulti/$offMulti. As with the previous example, the second embedded Python code block runs at execution time and is simply there to read all the records of the set i. Note that the UEL order will be different in this case (i1 and i2 come before i0).

GAMS 43.1.0   Copyright (C) 1987-2023 GAMS Development. All rights reserved
--- Starting compilation
--- test.gms(11) 3 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
--- test.gms(21) 3 Mb
--- Starting execution: elapsed 0:00:01.535
--- test.gms(14) 4 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
---   uni   element_text
0    i1
1    i2
2    i0
3    i3
4    i4
5    i5
6    i6
7    i7
8    i8
9    i9

*** Status: Normal completion

Example (Replacing set records)

set i / x1, x2 /;

$onmultiR
$onembeddedCode Python:
import gams.transfer as gt

m = gt.Container()
i = gt.Set(m, "i", records=["i"+str(i) for i in range(10)])
m.write(gams.db)

$offEmbeddedCode i
$offmulti

embeddedCode Python:
import gams.transfer as gt

m = gt.Container(gams.db)
gams.printLog("")
print(m["i"].records)

endEmbeddedCode

In this example we want to replace the x1 and x2 set elements and build up a totally new element list with set elements from the Container. Instead of $onMulti/$offMulti we must use $onMultiR/$offMulti to ensure that the replacement happens in GAMS; we also need to remove the set i from the merge_symbols argument.

Attention: If the user seeks to replace all records in a symbol they must use the $onMultiR syntax. It is not sufficient to simply remove them from the merge_symbols argument in transfer. If the user mistakenly uses $onMulti the symbols will end up merging without total replacement.

GAMS 43.1.0   Copyright (C) 1987-2023 GAMS Development. All rights reserved
--- Starting compilation
--- test.gms(11) 3 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
--- test.gms(21) 3 Mb
--- Starting execution: elapsed 0:00:01.482
--- test.gms(14) 4 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
---   uni   element_text
0    i0
1    i1
2    i2
3    i3
4    i4
5    i5
6    i6
7    i7
8    i8
9    i9

*** Status: Normal completion

Example (Merging parameter records)

set i;
parameter a(i<) /
i1 1.23
i2 5
/;

$onmulti
$onembeddedCode Python:
import gams.transfer as gt

m = gt.Container()
i = gt.Set(m, "i", records=["i"+str(i) for i in range(10)])
a = gt.Parameter(m, "a", domain=i, records=[("i"+str(i),i) for i in range(10)])
m.write(gams.db, merge_symbols="a")

$offEmbeddedCode i, a
$offmulti

embeddedCode Python:
import gams.transfer as gt

m = gt.Container(gams.db)
gams.printLog("")
print(m["a"].records)
endEmbeddedCode

In this example we also need to make use of $onMulti/$offMulti in order to merge new set elements into the set i. However, the set i also needs to contain the elements that are defined in the parameter; here we make use of the < operator, which adds the set elements from a(i) to the set i.

Note: It would also be possible to run this example by explicitly defining the set i /i1, i2/; before the parameter declaration.

Attention: transfer will overwrite all duplicate records when merging. The original values of a("i1") and a("i2") have been replaced with their new values when writing the Container in this example (see output below).

GAMS 43.1.0   Copyright (C) 1987-2023 GAMS Development. All rights reserved
--- Starting compilation
--- test.gms(16) 3 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
--- test.gms(25) 3 Mb
--- Starting execution: elapsed 0:00:01.467
--- test.gms(19) 4 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
---   i    value
0  i1    1.0
1  i2    2.0
2  i3    3.0
3  i4    4.0
4  i5    5.0
5  i6    6.0
6  i7    7.0
7  i8    8.0
8  i9    9.0

*** Status: Normal completion
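
The overwrite behavior can be mimicked with ordinary dictionaries. This is a minimal sketch of the merge semantics only, not of how transfer or GAMS perform the merge. Dropping the zero-valued record is an assumption made here to mirror the log above, where i0 does not appear because GAMS does not store zero values.

```python
# Records of a(i) as declared on the GAMS side.
existing = {"i1": 1.23, "i2": 5.0}

# Records written from the Container: a("i0") = 0, a("i1") = 1, ..., a("i9") = 9.
incoming = {f"i{n}": float(n) for n in range(10)}

# Merge: incoming values win for keys that exist on both sides.
merged = {**existing, **incoming}

# Drop zero-valued records to mirror the log output (assumption, see above).
stored = {key: value for key, value in merged.items() if value != 0.0}

print(stored)
```

The original values 1.23 and 5 for i1 and i2 are gone after the merge; only the values from the Container remain.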

Example (Advanced Matrix Generation and Inversion w/ Write Operation)

set i / i1*i1000 /;
alias(i,j);

parameter a(i,j);
a(i,j) = 1 / (ord(i)+ord(j) - 1);
a(i,i) = 1;

parameter inv_a(i,j);
parameter ident(i,j);

embeddedCode Python:
import gams.transfer as gt
import numpy as np
import time

gams.printLog("")
gams.printLog("")

s = time.time()
m = gt.Container(gams.db)
gams.printLog(f"read data: {round(time.time() - s, 3)} sec")

s = time.time()
A = m["a"].toDense()
gams.printLog(f"create matrix A: {round(time.time() - s, 3)} sec")

s = time.time()
invA = np.linalg.inv(A)
gams.printLog(f"calculate inv(A): {round(time.time() - s, 3)} sec")

s = time.time()
m["inv_a"].setRecords(invA)
gams.printLog(f"convert matrix to records for inv(A): {round(time.time() - s, 3)} sec")

s = time.time()
I = np.dot(A,invA)
tol = 1e-9
I[np.where((I<tol) & (I>-tol))] = 0
gams.printLog(f"calculate A*invA + small number cleanup: {round(time.time() - s, 3)} sec")

s = time.time()
m["ident"].setRecords(I)
gams.printLog(f"convert matrix to records for I: {round(time.time() - s, 3)} sec")

s = time.time()
m.write(gams.db, ["inv_a","ident"])
gams.printLog(f"write to GamsDatabase: {round(time.time() - s, 3)} sec")

gams.printLog("")
endEmbeddedCode inv_a, ident

display ident;

In this example we extend the example shown in Read GamsDatabases to read data from GAMS, calculate a matrix inversion, do the matrix multiplication, and then write both the A^-1 and A*A^-1 (i.e., the identity matrix) back to GAMS for display in the LST file. This data round trip highlights the benefits of using a transfer Container (and the linked symbol structure) as the mechanism to move data – converting back and forth from a records format to a matrix format can be cumbersome, but here, transfer takes care of all the indexing for the user.
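
The index bookkeeping that transfer takes care of can be pictured with a small plain-Python sketch. All names and data below are illustrative and do not use the transfer API; the point is only that the domain sets determine the matrix size and the position of each record.

```python
# Domain sets fix the matrix size and the row/column positions.
i_labels = ["i1", "i2", "i3"]
j_labels = ["j1", "j2"]

# Sparse records: zero entries are simply absent.
records = [("i1", "j1", 2.0), ("i2", "j2", 5.0), ("i3", "j1", 7.0)]

row_of = {label: pos for pos, label in enumerate(i_labels)}
col_of = {label: pos for pos, label in enumerate(j_labels)}

# Records format -> dense matrix.
dense = [[0.0] * len(j_labels) for _ in i_labels]
for i, j, value in records:
    dense[row_of[i]][col_of[j]] = value

# Dense matrix -> records format (zeros are dropped again).
roundtrip = [
    (i, j, dense[row_of[i]][col_of[j]])
    for i in i_labels
    for j in j_labels
    if dense[row_of[i]][col_of[j]] != 0.0
]

print(dense)
print(roundtrip)
```

Without the domain sets the row and column positions would be undefined, which is why the example reads the whole GamsDatabase before calling .toDense().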

The first few lines of GAMS code generate a 1000x1000 A matrix as a parameter (at execution time). We then define two more parameters that we will fill with results of the embedded Python code. Specifically, we want to fill one parameter (inv_a) with the matrix A^-1 and we want to verify that another parameter (ident) contains the identity matrix (i.e., I). Stepping through the code:

We start the embedded Python code section (execution time) by importing both transfer and NumPy and by reading all the symbols that currently exist in the GamsDatabase. We must read in all this information in order to get the domain set information; transfer needs these domain sets in order to generate matrices with the proper size.
Generate the matrix A by calling .toDense() on the symbol object in the Container.
Take the inverse of A with np.linalg.inv().
The Parameter symbol for inv_a already exists in the Container, but it does not have any records (i.e., m["inv_a"].records is None will evaluate to True). We use .setRecords() to convert the invA back into a records format.
We continue the computations by performing the matrix multiplication using np.dot() – we must clean up a lot of small numbers in I.
The Parameter symbol for ident already exists in the Container, but it does not have any records. We use .setRecords() to convert I back into a records format.
Since we are calculating these parameter values at execution time, it is not possible to modify the domain set information (or even merge/replace it). Therefore we only want to write the parameter values to GAMS. We achieve this by writing a subset of the Container symbols out with the m.write(gams.db, ["inv_a","ident"]) call. This partial write preserves symbol validity in the Container and it does not violate other GAMS requirements.
Finally, we can verify that the (albeit large) identity matrix exists in the LST file (or in another GDX file).

Note: It was not possible to just use np.round because small negative numbers that round to -0.0 will be interpreted by transfer as the GAMS EPS special value.
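
The sign issue behind this note can be reproduced with the Python standard library. The sketch below uses the built-in round rather than np.round (np.round is expected to preserve the sign in the same way, but that is not shown here) and compares it with the thresholding used in the example above.

```python
import math

tol = 1e-9
noise = -1e-12  # a tiny negative value left over from floating-point arithmetic

# Rounding keeps the sign bit, so the result is negative zero.
rounded = round(noise, 9)
print(rounded, math.copysign(1.0, rounded))

# Thresholding, as in the example above, assigns a plain positive zero instead.
cleaned = 0.0 if -tol < noise < tol else noise
print(cleaned, math.copysign(1.0, cleaned))
```

Both results compare equal to zero, but only the thresholded value has a positive sign and therefore avoids being read as EPS.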

The output for this example is shown below:

GAMS 43.1.0   Copyright (C) 1987-2023 GAMS Development. All rights reserved
--- Starting compilation
--- matrix.gms(52) 3 Mb
--- Starting execution: elapsed 0:00:00.004
--- matrix.gms(11) 36 Mb
--- Initialize embedded library libembpycclib64.dylib
--- Execute embedded library libembpycclib64.dylib
---
---
--- read data: 1.083 sec
--- create matrix A: 0.016 sec
--- calculate inv(A): 0.032 sec
--- convert matrix to records for inv(A): 0.176 sec
--- calculate A*invA + small number cleanup: 0.027 sec
--- convert matrix to records for I: 0.17 sec
--- write to GamsDatabase: 1.937 sec
---
--- matrix.gms(52) 68 Mb
*** Status: Normal completion

Container Read

Containers can read from other Container instances. The syntax and behavior are much the same as when reading from GDX and GMD sources. It is important to note that a deep copy of all data is made when reading from a Container. The Container object can be passed into the constructor (to be consistent with the shorthand notation) or it can be passed as an argument to the .read() method.

import gams.transfer as gt
 
m = gt.Container("trnsport.gdx")
 
In [1]: m.data
Out[1]:
{'i': <Set `i` (0x7fc1d86d8e80)>,
 'j': <Set `j` (0x7fc1d86d8e50)>,
 'a': <Parameter `a` (0x7fc1d86d8df0)>,
 'b': <Parameter `b` (0x7fc1d86d8fa0)>,
 'd': <Parameter `d` (0x7fc1d86d84c0)>,
 'f': <Parameter `f` (0x7fc1d86d9000)>,
 'c': <Parameter `c` (0x7fc1d86d9120)>,
 'x': <Positive Variable `x` (0x7fc1d86d90f0)>,
 'z': <Free Variable `z` (0x7fc1d86d8fd0)>,
 'cost': <Eq Equation `cost` (0x7fc1d86d8cd0)>,
 'supply': <Leq Equation `supply` (0x7fc1d86d8c40)>,
 'demand': <Geq Equation `demand` (0x7fc1d86d8c10)>}

m2 = gt.Container()
m2.read(m)
 
# equivalent to m2 = gt.Container(m)
 
In [7]: m2.data
Out[7]:
{'i': <Set `i` (0x7fc1c8153fa0)>,
 'j': <Set `j` (0x7fc1c8153730)>,
 'a': <Parameter `a` (0x7fc1c8153cd0)>,
 'b': <Parameter `b` (0x7fc1d86fc790)>,
 'd': <Parameter `d` (0x7fc1f87dd240)>,
 'f': <Parameter `f` (0x7fc1c8153eb0)>,
 'c': <Parameter `c` (0x7fc1f87ddb40)>,
 'x': <Positive Variable `x` (0x7fc1c81536a0)>,
 'z': <Free Variable `z` (0x7fc1f87ddc60)>,
 'cost': <Eq Equation `cost` (0x7fc1c81539a0)>,
 'supply': <Leq Equation `supply` (0x7fc1f87dcd00)>,
 'demand': <Geq Equation `demand` (0x7fc1d86ff700)>}
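
What a deep copy means in practice can be shown without a GAMS installation. In the sketch below, plain dictionaries stand in for Containers and copy.deepcopy stands in for the read; this is only an analogy for the copy semantics, not gams.transfer code, and the symbol name and values are made up.

```python
import copy

# Plain dicts stand in for Containers (analogy only, not the transfer API).
source = {"a": {"records": [("seattle", 350.0), ("san-diego", 600.0)]}}

# A read with deep-copy semantics gives the target its own copy of every record.
target = copy.deepcopy(source)

# Modifying the target therefore leaves the source untouched.
target["a"]["records"][0] = ("seattle", 999.0)

print(source["a"]["records"][0])  # ('seattle', 350.0)
print(target["a"]["records"][0])  # ('seattle', 999.0)
```

The same reasoning suggests that changing a symbol's records in m2 should not alter the corresponding symbol in m.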

Combining two containers

In this example we create two Containers (which could have been populated from GDX files) and then add to the first Container every symbol from the second that it does not already contain.

import gams.transfer as gt
 
m1 = gt.Container()
i = gt.Set(m1, "i", records=[f"i{i}" for i in range(10)])
j = gt.Set(m1, "j", records=[f"j{i}" for i in range(10)])
k = gt.Set(m1, "k", records=[f"k{i}" for i in range(10)])
 
m2 = gt.Container()
a = gt.Set(m2, "a", records=[f"a{i}" for i in range(10)])
b = gt.Set(m2, "b", records=[f"b{i}" for i in range(10)])
k = gt.Set(m2, "k", records=[f"k{i}" for i in range(10)])
 
 
# now read in everything from m2 that does not exist in m1 (will read `a` and `b`)
m1.read(m2, [symname for symname, obj in m2 if symname not in m1])
 
 
In [1]: m1.data
Out[1]:
{'i': <Set `i` (0x7f9e504231c0)>,
 'j': <Set `j` (0x7f9e3043cc10)>,
 'k': <Set `k` (0x7f9e509bcd00)>,
 'a': <Set `a` (0x7f9e50423760)>,
 'b': <Set `b` (0x7f9e508ed0c0)>}
 
In [2]: m1.isValid()
Out[2]: True
