On August 7, 2000, the first version of GDX was distributed as part of GAMS release 19.4. Two decades later, on November 14, 2023, we are happy to finally publish the source code of the expert-level API for the GAMS Data eXchange (GDX) on GitHub. Because the code is published under an MIT-like license, the GDX API is now effectively open-source software, and the source implicitly documents the internal layout of the GDX file format. To celebrate setting an important core piece of GAMS technology free, this blog post sheds some more light on what GDX actually is (both the file format and the API), why it was created, how it is used in the GAMS ecosystem today, and the effort that went into getting it ready for the open-source release. If you just want to take a look at the GitHub repository, you should follow this link.
The GDX file format and API
The GAMS modeling language is very well suited for formulating mathematical optimization models in a way that is very close to the algebraic notation used by modelers. For small model instances, the language provides suitable data definition directives, like the table definition. In practice, instances often consist of large amounts of data. To load such data easily and efficiently into a GAMS model, the GDX format and API were developed, with key contributions coming from Paul van der Eijk. There are multiple benefits to storing model data in a GDX file instead of using textual representations like CSV or the data definition syntax of the GAMS language:
- Space efficiency: GDX stores symbol records with the smallest datatypes possible and allows optional compression. Erwin Kalvelagen documented the significant size advantage of uncompressed and compressed GDX files in comparison to CSV or SQLite in multiple posts on his blog “Yet Another Math Programming Consultant” (see 1, 2, 3).
- Increased performance: Parsing textual data encodings is slower than reading a dense binary format with well-defined structure.
- Persistency: GDX is a persistent staging database and represents a frozen snapshot of the model data. Other data sources, like (relational) database systems, can change frequently. Therefore, GDX can be very helpful for reproducing a particular state during debugging.
- Platform independence: Files are portable and can be passed between Windows, Linux, and macOS machines with arbitrary endianness.
- Ease of loading and saving: The sets and parameters of a model can often be populated from the data inside a GDX file with just 1-2 lines of code.
- Unix philosophy: Instead of GAMS directly reading and writing various formats, GDX as a staging database allows multiple specialized and highly parameterized tools to deal with the diverse zoo of formats.
- Versioning: Each GDX file stores version information, which allows future GAMS versions to still read/understand GDX files written years ago.
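To make the first two points a bit more tangible, here is a small Python sketch that encodes the same records once as CSV text and once as fixed-width binary records. To be clear, this is not the GDX file layout (the published source code documents that). The record shape, the helper names such as `smallest_uint_code`, and the sample data are all made up for illustration. The sketch only shows the general idea of picking the narrowest integer type that fits, fixing the byte order so the bytes look the same on every platform, and optionally compressing the result.

```python
import struct
import zlib

# Made-up sample data: 1000 (row, column, value) records of a 2-dimensional parameter.
records = [(i, j, 1000.0 * i + j + 0.125) for i in range(1, 51) for j in range(1, 21)]


def smallest_uint_code(max_value):
    """Return the struct code of the narrowest unsigned integer that holds max_value."""
    for code, limit in (("B", 0xFF), ("H", 0xFFFF), ("I", 0xFFFFFFFF)):
        if 0 <= max_value <= limit:
            return code
    raise ValueError(f"index {max_value} does not fit into 32 bits")


def encode_text(recs):
    """Textual baseline: one CSV line per record, floats written with repr()."""
    return "\n".join(f"{i},{j},{v!r}" for i, j, v in recs).encode("ascii")


def decode_text(data):
    """Parse the CSV baseline back into (int, int, float) tuples."""
    rows = (line.split(",") for line in data.decode("ascii").splitlines())
    return [(int(i), int(j), float(v)) for i, j, v in rows]


def record_layout(recs):
    """Build a fixed-width layout: two narrow index fields plus one double.

    The "<" prefix fixes little-endian byte order and disables padding,
    so the encoded bytes do not depend on the machine that wrote them.
    """
    max_index = max(max(i, j) for i, j, _ in recs)
    return struct.Struct("<" + 2 * smallest_uint_code(max_index) + "d")


def encode_binary(recs):
    """Pack every record with the layout chosen for this data."""
    layout = record_layout(recs)
    return b"".join(layout.pack(i, j, v) for i, j, v in recs)


def decode_binary(data, layout):
    """Unpack fixed-width records; no text parsing is involved."""
    return list(layout.iter_unpack(data))


text = encode_text(records)
binary = encode_binary(records)
packed = zlib.compress(binary)  # optional compression on top of the binary form

# Print the three sizes in bytes so they can be compared.
print(len(text), len(binary), len(packed))
```

Both encodings round-trip to the same records. The binary reader simply unpacks fixed-width fields instead of splitting and converting strings, which is where the performance argument comes from.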
GDX as central component of the GAMS ecosystem
Due to the advantages listed above, GAMS makes intensive use of GDX and provides useful tooling for dealing with GDX files. Reading from and writing to GDX files from inside a GAMS model is very easy, as the GAMS language offers multiple commands for these tasks. Hence, GDX is well suited to storing the data for a model instance or the results of an optimization run. GAMS also supports writing just a subset of specific symbols to a GDX file and reading only selected symbols from it.
It is easy to inspect a GDX file from the command line with gdxdump (see here) and with a graphical user interface called “GDX Viewer” inside of GAMS Studio. Furthermore, the GAMS distribution comes with multiple tools to convert data from various formats into GDX and vice versa. GAMS Connect offers a very generic way of building data processing pipelines that can convert to and from GDX inside of it. Many workflows in the GAMS ecosystem employ GDX files at one point or another. For example, GDX files are a good way to submit large chunks of data for GAMS Engine jobs.
Making GDX source code available for everyone
GAMS has traditionally built many software components in Pascal (and its object-oriented extension Delphi) instead of the now more prevalent C (and C++). While Pascal and its derivatives arguably offer better readability due to a less terse syntax and increased safety from strong typing, it is undeniable that the C language and its offspring are the dominant languages for programming performance-critical applications. Hence, the first step towards open sourcing GDX involved translating the Delphi source code into C++17.
To make sure the ported library writes and reads GDX files correctly without increased runtime or memory consumption, we ran validation tests and benchmarks on a large library of heterogeneous GDX files. Additionally, we wrote a new suite of unit tests for the GDX library.
The open-source publication of GDX also includes a continuous integration pipeline for GitLab, which builds the GDX library on all supported platforms (Windows, macOS, Linux), runs the unit test suite with memory leak checking, compares performance with the Delphi GDX library, and generates the HTML documentation and serves it on GitHub Pages.
With the source code of GDX openly available, together with the required steps for building the dynamic library used by many tools inside and outside the GAMS distribution, everyone can now build their own modified GDX library and maintain it independently of GAMS. Besides implicitly documenting the file layout, the open-source publication significantly reduces the risk of not being able to work with GDX files on future platforms or systems.