GAMS Engine is a highly distributed system for solving your GAMS models. The heart of this system is a REST API to which users submit GAMS jobs. The REST API is the only interface to the outside world, so all communication runs through it.
The system queues submitted jobs and assigns them to an available worker - a GAMS process that solves the model. With GAMS Engine One, workers are defined and initialized during installation. This means that there is a fixed number of workers that share the available resources (e.g. RAM and CPU). If all workers are busy solving other jobs, a new job waits in the queue until a worker becomes free. With GAMS Engine SaaS, workers are created on demand at runtime, and Kubernetes takes care of scheduling as resources become available.
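The queueing behavior of Engine One described above can be illustrated with a minimal sketch. It is not Engine's implementation: the function name `run_fixed_worker_pool` and the in-process threads are assumptions made purely for illustration, standing in for the real message queue and worker containers.

```python
import queue
import threading


def run_fixed_worker_pool(jobs, num_workers):
    """Process jobs with a fixed number of workers.

    Returns a dict mapping each job to the id of the worker that took it.
    Hypothetical helper: Engine itself uses separate containers, not threads.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)  # every submitted job waits in the queue first

    assigned = {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                job = pending.get_nowait()  # take the next pending job
            except queue.Empty:
                return  # nothing left to solve
            with lock:
                assigned[job] = worker_id

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assigned
```

With two workers and five jobs, every job is eventually processed, but never by more than the two workers that were defined up front.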
GAMS Engine is a containerized application that largely follows the microservice architecture. The two figures above show all the containers that make up Engine One and Engine SaaS, respectively, and how they communicate. The figures omit four migration containers, which are only used during software updates. These containers are explained in the section Pre-Migrate, MongoDB Migrate, PostgreSQL Migrate and RabbitMQ Migrate.
The REST API is the heart of the system. It is a Python container that hosts the application logic for all user-accessible tasks. Users communicate with this container (through the Nginx container) to accomplish any task. You can find an extensive list of operations with their expected inputs and possible outputs here.
Nginx functions as a reverse proxy. It is the only component that is accessible from an external network. It can use TLS to secure the connection. Running without TLS is possible but discouraged, and we encourage every internet-facing application to use TLS. In addition to reverse proxying requests to the REST API container (which talks uwsgi), this container also hosts the Engine UI. Unless changed during installation, the UI is located at the root URL (/) and the REST API is located at the /api URL. Engine UI is open source; its source code and the Nginx config files can be found at https://github.com/GAMS-dev/engine-ui.
We chose RabbitMQ as our message queueing solution. It communicates with nearly all other containers because GAMS Engine follows an event-based approach as much as possible. For example, when a job finishes, a message is put into the finish queue. The Cleaner listens to the finish queue and deletes the intermediate files that are no longer needed.
We use PostgreSQL to store our structured data such as submission information, user tables, invitations and so on.
Before understanding the Hypercube Unpacker, it is important to understand the Hypercube jobs concept. Quoting our documentation about Hypercube jobs:
Imagine a GAMS model that runs through a number of different scenarios. The model is always initialized by the same large GDX file, but behaves differently depending on double dash parameter --scenId. In this situation you want to avoid uploading the same large GDX file for each scenario. Instead, you want to upload it once and create a number of jobs with different command line arguments: a Hypercube job.
The Hypercube Unpacker is responsible for creating many jobs out of a Hypercube job. When the REST API puts a Hypercube job into the queue, the Hypercube Unpacker picks it up and puts as many jobs as required into the solve queue.
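As a rough sketch of this unpacking step, the function below turns one Hypercube job into one job per scenario, each differing only in its `--scenId` argument. The function name, the dictionary keys and the way scenarios are passed in are hypothetical and do not reflect Engine's actual message format.

```python
def unpack_hypercube(hypercube_token, scen_ids):
    """Create one job description per scenario of a Hypercube job.

    All field names here are illustrative assumptions, not Engine's schema.
    The shared model data is referenced by the Hypercube token only, so it
    does not have to be uploaded again for each scenario.
    """
    return [
        {
            "hypercube": hypercube_token,          # shared model and GDX data
            "job_number": number,                  # position within the Hypercube
            "arguments": [f"--scenId={scen_id}"],  # per-job command line argument
        }
        for number, scen_id in enumerate(scen_ids)
    ]


solve_queue = unpack_hypercube("hc-token", [1, 2, 3])
```

Each entry of `solve_queue` can then be handled like an ordinary job by any free worker.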
Hypercube jobs can contain hundreds of jobs. After the workers finish all the jobs belonging to a Hypercube job, it would be tedious to download the results separately. The Hypercube Appender therefore appends all the result zips into a single zip file. It also marks the end time of a Hypercube job.
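The idea of combining many result archives into one can be sketched with Python's standard `zipfile` module. This is only an illustration: the function name and the choice to place each job's files under a folder named after the job are assumptions, not a description of the archive layout Engine produces.

```python
import io
import zipfile


def merge_result_zips(results):
    """Merge several result zips into a single zip archive.

    `results` maps a job name to the raw bytes of that job's result zip.
    Each job's files are stored under a folder named after the job
    (an assumed layout, chosen to avoid file name clashes).
    """
    merged = io.BytesIO()
    with zipfile.ZipFile(merged, "w", zipfile.ZIP_DEFLATED) as out:
        for job_name, data in results.items():
            with zipfile.ZipFile(io.BytesIO(data)) as src:
                for info in src.infolist():
                    if info.is_dir():
                        continue  # directories are implied by file paths
                    out.writestr(f"{job_name}/{info.filename}", src.read(info))
    return merged.getvalue()
```

The user then downloads a single archive instead of one archive per job.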
The Worker container is responsible for solving GAMS jobs. It takes the next job from the solve queue, fetches the binary data from MongoDB and the structured data from PostgreSQL, and then solves the job by starting a GAMS process. The Worker is the only place where users can run arbitrary code via their models, so by default workers have no internet access. This can be changed during installation. The Worker container comes with few pre-installed programs to minimize security risks. After a successful run, the worker uploads the result to MongoDB for further use and inspection.
Users can send temporary models to GAMS Engine, where the model files are no longer needed once the job is finished. Users can also provide additional data to a model that is not needed afterwards. The Cleaner listens to job complete messages, checks whether the submission had temporary files and deletes them. It is also responsible for deleting queues that are no longer necessary. It is important to note that the results must be deleted manually, as the Cleaner cannot guess whether they are still needed.
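The following sketch illustrates the Cleaner's behavior under simplifying assumptions: the message layout (`temporary_files`, `result`) and the dictionary standing in for file storage are hypothetical and only serve to show that temporary files are removed while results are left alone.

```python
def handle_job_complete(message, file_store):
    """Delete the temporary files named in a job complete message.

    `message` and `file_store` are illustrative stand-ins for the real
    queue message and storage backend. Results are never touched here.
    Returns the ids of the files that were actually deleted.
    """
    deleted = []
    for file_id in message.get("temporary_files", []):
        # pop() removes the entry if present and ignores missing ones
        if file_store.pop(file_id, None) is not None:
            deleted.append(file_id)
    return deleted


store = {"model-1": b"model", "data-1": b"extra data", "result-1": b"result"}
removed = handle_job_complete(
    {"temporary_files": ["model-1", "data-1"], "result": "result-1"}, store
)
```

After the call, only the result remains in `store`, which mirrors the rule that results have to be deleted manually.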
In mathematical modeling, it is common for one model to depend on the result of another. In those cases, Engine uses the dependency system to reduce network traffic. Without it, the user sends job 'A', waits for it to finish, downloads the results, puts them into job 'B' and sends job 'B'. With it, the user can send job 'A' and then immediately send job 'B', stating that it depends on job 'A'. Job 'B' will then only start when job 'A' is finished, and the results of job 'A' will be mounted into the directory of job 'B'. The Dependency Checker is responsible for putting job 'B' into the solve queue once all of its dependencies are finished.
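The release logic described above can be sketched as follows. The class and method names are hypothetical, and a plain list stands in for the solve queue; mounting the results of job 'A' into the directory of job 'B' is not modeled.

```python
class DependencyChecker:
    """Hold back jobs until all jobs they depend on have finished.

    Illustrative sketch only; names and data structures are assumptions.
    """

    def __init__(self):
        self.finished = set()   # jobs that have completed
        self.waiting = {}       # job -> set of unfinished dependencies
        self.solve_queue = []   # jobs that are ready to be solved

    def submit(self, job, depends_on=()):
        remaining = set(depends_on) - self.finished
        if remaining:
            self.waiting[job] = remaining   # hold the job back for now
        else:
            self.solve_queue.append(job)    # no open dependencies

    def on_finished(self, job):
        self.finished.add(job)
        for waiting_job in list(self.waiting):
            deps = self.waiting[waiting_job]
            deps.discard(job)
            if not deps:                    # last dependency just finished
                del self.waiting[waiting_job]
                self.solve_queue.append(waiting_job)


checker = DependencyChecker()
checker.submit("A")
checker.submit("B", depends_on=["A"])   # sent immediately, but held back
queued_before = list(checker.solve_queue)
checker.on_finished("A")                # now job 'B' is released
queued_after = list(checker.solve_queue)
```

Here `queued_before` contains only job 'A', while `queued_after` also contains job 'B'.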
This container forwards requests from the Broker (REST API) to their intended destinations and ensures integrity is maintained. It is a part of GAMS Engine and will not function without the other required components.
When there is a new version of GAMS Engine, we might change the docker-compose file, SQL tables, add a new MongoDB index, or create a new exchange in RabbitMQ, etc. These changes are made through the respective migrate containers. Migrate containers connect to their respective containers and update data definitions. These four containers are not depicted in the figure above because they are executed only once after an update of GAMS Engine.
In Engine SaaS, workers are spawned on demand at runtime. The component responsible for spawning new workers is called the job spawner. It communicates with the Kubernetes API to create new Kubernetes jobs.
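To give an idea of what creating a Kubernetes job involves, the sketch below builds a generic `batch/v1` Job manifest as a Python dictionary. It is not the manifest Engine submits: the function name, the container name, the image name `example/worker:latest` and the chosen resource values are all placeholders, and actually sending the manifest to the Kubernetes API is left out.

```python
def build_worker_job_manifest(job_name, cpu, memory,
                              image="example/worker:latest"):
    """Build a minimal Kubernetes Job manifest for a single worker.

    Hypothetical helper: image name, container name and settings are
    placeholders chosen for illustration only.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed worker automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "worker",
                            "image": image,
                            # Kubernetes schedules the pod once these
                            # resources become available in the cluster
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory}
                            },
                        }
                    ],
                }
            },
        },
    }


manifest = build_worker_job_manifest("worker-job-123", cpu="2", memory="4Gi")
```

The resource requests are what allows Kubernetes to decide when and where the worker can be scheduled.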
The job watcher monitors the active workers for events such as "out-of-memory" or "out-of-disk" and marks the jobs accordingly.
When a job is canceled, the job canceler removes worker pods that have been created but could not be started yet (e.g. due to a lack of available resources).