GAMS/GAMSPy and Databricks walkthrough

Who this is for

If you are a GAMS user who has migrated to Databricks, or if your organization relies on the Databricks Lakehouse and you need a high-performance tool to embed optimization directly into your data pipelines, this post is for you. Moving data between environments is a relic of the past; with GAMS and GAMSPy, you can formulate and solve large-scale optimization models without ever duplicating data or leaving your secure environment. Databricks has become the central hub for enterprise data-managed in Delta tables, governed by Unity Catalog, and queryable via SQL. While the integration is straightforward, navigating the initial setup can be a hurdle. This walkthrough is designed to clear that path, providing a definitive guide for basic GAMS and GAMSPy use cases in the modern data stack.

Through this step by step tutorial, you will learn to:

Install GAMS/GAMSPy on your compute
Streamline data reading from Databricks directly into your models
Solve a model with GAMS/GAMSPy (Locally or in a Databricks notebook)
Write the solution back to Delta tables

This tutorial does not aim to cover endless possibilities of using your data with Databricks. It instead focuses on the optimization part of your decision pipeline.

Step 0: Prerequisites

It is assumed that the user is familiar with the basic GAMS/GAMSPy syntax, and has a Databricks account. Depending on where you are trying to read the data into, the connection to Databricks varies slightly. If you are in the Databricks environment such as a notebook, you have spark available readily and you can use it to read data tables. However, if you want to to develop and run models locally and simply use Databricks as the data layer, this needs to be done with Databricks Connect which lets a local Python script use spark as if it were running on the cluster, without any changes to the model code. databricks-connect is pip-installable and your system can be configured with Databricks credentials using databricks configure --token from your terminal. This prompts you for your workspace URL and a personal access token, then writes them to ~/.databrickscfg. Every subsequent script picks this up automatically.

C:\Users\username>databricks configure --token
Databricks workspace host (https://...): <your_host>.databricks.com
Personal access token: ************************************

Once set up, you can create spark explicitly with DatabricksSession. The code-snippets in this blog-post assume that you have appropriately initialized spark depending on your requirement.

# Notebook: spark is provided automatically, no import needed
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()

Note that databricks-connect requires an active all-purpose compute resource. To verify the connection is working:

databricks clusters list

Setup used

Compute: Databricks workspace with a dedicated personal compute cluster (AWS in this case)
SQL: Databricks Serverless SQL Warehouse, used to create and query Delta tables via the SQL Editor

Demo example

This tutorial uses the classic transport problem. This is a simple LP but it is sufficient to demonstrate the flow of data in a complex decision pipeline consisting of Databricks data and GAMS/GAMSPy model.

Two plants can ship goods to three markets and given all the required data about the demands, supply capacity, and transportation costs, we want to minimize the total transportation cost.

	New York	Chicago	Topeka
Seattle (cap: 350)	$0.225	$0.153	$0.162
San Diego (cap: 600)	$0.225	$0.162	$0.126

Market demands: New York 325, Chicago 300, Topeka 275.

The formulation

Let $ x_{ij} $ be the number of cases shipped from plant $ i $ to market $ j $.

minimize total shipping cost:

$ \min \sum_{i,j} c_{ij} \cdot x_{ij} $

subject to:

supply constraints: do not exceed plant capacity:

$ \sum_j x_{ij} \leq a_i \quad \forall i $

demand constraints: satisfy each market:

$ \sum_i x_{ij} \geq b_j \quad \forall j $

$ x_{ij} \geq 0 \quad \forall i, j $

It is assumed that all model inputs are Delta tables in Databricks. This reflects how real production models work: your supply, demand, and cost data are already in your Lakehouse, maintained by other pipelines, and your optimization model is a consumer of that data.

The three input tables for the transport problem are:

demo.default.transport_supply - plant names and available capacity
demo.default.transport_demand - market names and required demand
demo.default.transport_cost - unit shipping cost for each plant-market pair

For this post, we created these using the Databricks SQL editor connected to a Serverless SQL Warehouse. Once run, all four tables (including the empty transport_shipments output table) are visible in the Catalog Explorer.

GAMSPy

Step 1: GAMSPy installation and licensing

GAMSPy needs to be installed on your dedicated compute cluster. Open your Databricks notebook and run:

%pip install gamspy

GAMSPy requires a valid GAMS license to solve models beyond the community edition limits. To add your license on the cluster:

!gamspy install license <access code or path_to_ascii_file>

Step 2: Read data into Python


CATALOG = "demo"
SCHEMA = "default"

def full(table: str) -> str:
    return f"{CATALOG}.{SCHEMA}.{table}"

# When running locally
#from databricks.connect import DatabricksSession
#spark = DatabricksSession.builder.getOrCreate()

supply_df = spark.table(full("transport_supply")).toPandas()
demand_df = spark.table(full("transport_demand")).toPandas()
cost_df = spark.table(full("transport_cost")).toPandas()

print("Supply nodes:")
display(supply_df)
print("Demand nodes:")
display(demand_df)
print("Costs:")
display(cost_df)

Step 3: Build and solve the model

The model can also be found in the GAMSPy examples GitHub repository.

from gamspy import Container, Set, Parameter, Variable, Equation, Model, Sum, Sense

m = Container()

i = Set(m, name="i", description="supply nodes / plants")
j = Set(m, name="j", description="demand nodes / markets")

a = Parameter(
    m, name="a", domain=[i],
    domain_forwarding=True,
    records=supply_df.rename(columns={"plant": "i", "capacity": "value"}),
    description="plant capacity (cases)"
)

b = Parameter(
    m, name="b", domain=[j],
    domain_forwarding=True,
    records=demand_df.rename(columns={"market": "j", "demand": "value"}),
    description="market demand (cases)"
)

c = Parameter(
    m, name="c", domain=[i, j],
    records=cost_df.rename(columns={"plant": "i", "market": "j", "cost": "value"}),
    description="transport cost ($ per case per 1000 miles)"
)

x = Variable(
    m, name="x", domain=[i, j], type="positive",
    description="shipment quantity (cases)"
)

supply = Equation(m, name="supply", domain=[i], description="do not ship more than available capacity")
supply[i] = Sum(j, x[i, j]) <= a[i]

demand = Equation(m, name="demand", domain=[j], description="satisfy each market's demand")
demand[j] = Sum(i, x[i, j]) >= b[j]

cost_obj = Sum((i, j), c[i, j] * x[i, j])

transport = Model(
    m,
    name="transport",
    equations=m.getEquations(),
    problem="LP",
    sense=Sense.MIN,
    objective=cost_obj,
)

transport.solve()

print(f"\nSolve status : {transport.solve_status}")
print(f"Model status : {transport.status}")
print(f"Objective ($) : {transport.objective_value:.4f}")

Step 4: Write results back

Finally, you can write the shipment quantities back to Databricks as shown below.

from datetime import datetime, timezone

results = x.records.copy()
results.columns = ["plant", "market", "quantity", "marginal", "lower", "upper", "scale"]
results = results[["plant", "market", "quantity"]].query("quantity > 0").reset_index(drop=True)

print("\nOptimal shipment plan:")
print(results.to_string(index=False))

run_ts = datetime.now(timezone.utc)
results["run_ts"] = run_ts


# Note: When running locally, spark is initialized using databricks-connect
result_sdf = spark.createDataFrame(results)

# Overwrite with this run's results (append if you want history)
(result_sdf
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(full("transport_shipments")))

print(f"\n Shipments written to {full('transport_shipments')} at {run_ts}")
display(spark.table(full("transport_shipments")))

The terminal output is shown below

results
           i         j  level  marginal  lower  upper  scale
0    Seattle  New-York   50.0     0.000    0.0    inf    1.0
1    Seattle   Chicago  300.0     0.000    0.0    inf    1.0
2    Seattle    Topeka    0.0     0.036    0.0    inf    1.0
3  San-Diego  New-York  275.0     0.000    0.0    inf    1.0
4  San-Diego   Chicago    0.0     0.009    0.0    inf    1.0
5  San-Diego    Topeka  275.0     0.000    0.0    inf    1.0

Optimal shipment plan:
    plant   market  quantity
  Seattle New-York      50.0
  Seattle  Chicago     300.0
San-Diego New-York     275.0
San-Diego   Topeka     275.0

Shipments written to demo.default.transport_shipments

Verification - rows in transport_shipments:
    plant   market  quantity                     run_ts
  Seattle New-York      50.0 2026-04-24 20:01:19.700020
  Seattle  Chicago     300.0 2026-04-24 20:01:19.700020
San-Diego New-York     275.0 2026-04-24 20:01:19.700020
San-Diego   Topeka     275.0 2026-04-24 20:01:19.700020

GAMSPy: the verdict

By leveraging GAMSPy within Databricks, you achieve a clean, Pythonic workflow that feels like a native extension of your Spark environment. The transition from spark.table to a GAMSPy symbol is seamless, allowing you to move from raw data to an optimal shipment plan with remarkably concise and readable code.

GAMS

Your GAMS models may depend on data located remotely on Databricks or you might want to run your GAMS model individually or as part of a pipeline on Databricks’ infrastructure. This section covers both of these use cases.

Step 1: GAMS installation and licensing

Unlike GAMSPy, GAMS is not a Python package - it is a compiled binary that needs to be available on the cluster’s filesystem. Databricks clusters are ephemeral (they boot from a clean image on every start), but Unity Catalog Volumes are persistent. The approach here takes advantage of that: extract GAMS into the Volume once, and use a minimal init script only to add it to PATH on each cluster startup. Store your .sh init script in a Unity Catalog Volume for better security and portability.

The setup has three steps:

Upload the GAMS Linux installer and your license file to a Unity Catalog Volume:
Extract (or execute) GAMS into the Volume

unzip /Volumes/workspace/default/gams_install/linux_x64_64_sfx.exe \
      -d /Volumes/workspace/default/gams_install/

This extracts GAMS into a subdirectory named after the version, e.g. gams53.4_linux_x64_64_sfx.

Attach an init script that adds the GAMS directory to PATH on every cluster start

Step 2: Read data with GAMS

This section shows how to read demand data directly from a Databricks SQL Warehouse inside a classic .gms file. There are two ways of achieving this, both using the GAMS embedded code facility .

GAMS Connect

GAMS Connect is a data exchange framework. It consists of a range of agents for integrating data from various types of files into the connect database, transform the data, and write data from connect database to various destinations. The agents we need here are SQLReader and SQLWriter, which allows reading from a specific DBMS into the connect database. SQLReader can connect to Databricks via SQLAlchemy connection type since the databricks-sqlalchemy package provides a SQLAlchemy dialect for Databricks SQL Warehouses. There are several ways to use connect (command line, Python, GAMS Embedded code). In this section, we will use it via GAMS embedded code facility - however, the underlying syntax is YAML and can be easily used in a way that is convenient to the user. The SQL Warehouse handles the query; GAMS handles the solve.

GAMS Connect’s SQLReader uses the Python installation that GAMS is configured to call which by default uses GMSPYTHON. However, since we need databricks-sqlalchemy it is recommended that we connect our own custom Python Installation as shown in the embedded code python documentation .

pip install databricks-sqlalchemy

In the script below, we read the b(j) demand parameter from Databricks table workspace.default.demand. Everything else - sets, parameters, variables, equations, solve statement - is unchanged from the classic formulation.

The <pat>, <http_path>, and <host_id> in the connection block below are just placeholders. You should replace it with your true values:

  
    
Set
   i 'canning plants' / seattle, san-diego /
   j 'markets'        / new-york, chicago, topeka /;

Parameter
   a(i) 'capacity of plant i in cases'
        / seattle    350
          san-diego  600 /

   b(j) 'demand at market j in cases'
;

$onEmbeddedCode connect:
- SQLReader:
    connectionType: 'sqlalchemy'
    connection:
      drivername: 'databricks'
      host: ''
      username: 'token'
      password: ''
      query:
        http_path: '/sql/1.0/warehouses/********'
        schema: 'default'
        catalog: 'demo'
    symbols:
      - name: demand
        query: "SELECT market, demand FROM workspace.default.demand"
- GAMSWriter:
    symbols:
        - name: demand
          newName: b
$offEmbeddedCode

* more data definition...

Databricks-connect from embeddedcode Python facility

If your system is already set up for accessing Databricks data from Python, you can use an approach similar to the one we saw for the local setup of GAMSPy, but from embedded code facility.

  
    
$onEmbeddedCode Python:
from databricks.connect import DatabricksSession
import gams.transfer as gt

# Auth from ~/.databrickscfg
spark = DatabricksSession.builder.getOrCreate()

supply_df = spark.table("demo.default.transport_supply").toPandas()
demand_df = spark.table("demo.default.transport_demand").toPandas()
cost_df = spark.table("demo.default.transport_cost").toPandas()

m = gt.Container(gams.db)
m['a'].domain_forwarding= True
m['a'].setRecords(supply_df)

m['b'].domain_forwarding= True
m['b'].setRecords(demand_df)

m.write(gams.db)
$offEmbeddedCode

Here, you need your external Python connected to GMSPython or have GMSPython extended to include databricks-connect package. Moreover, you need your ./databrickscfg file to include relevant details about your host and access tokens. You can then use transfer Python to read/write data from the GAMS database. We are also reading the symbols i, j, and a(i) in addition to b(j) as shown before. By using the domain_forwarding property for a and b, the records for the domain sets i and j are defined implicitly.

Step 3: Build and solve the model

Define relevant variables, constraints, model, solvers (if applicable) and solve the model.

  
    
Variable
   x(i,j) 'shipment quantities in cases'
   z      'total transportation costs in thousands of dollars';

Positive Variable x;

Equation
   cost      'define objective function'
   supply(i) 'observe supply limit at plant i'
   demand(j) 'satisfy demand at market j';

cost..      z =e= sum((i,j), c(i,j)*x(i,j));

supply(i).. sum(j, x(i,j)) =l= a(i);

demand(j).. sum(i, x(i,j)) =g= b(j);

Model transport / all /;

solve transport using lp minimizing z;

Step 4: Write results back

Similar to Step 2, there are two ways of writing the results back to Delta tables i.e., using GAMS Connect and using embeddedCode Python.

GAMS Connect

We want to write the results x, however, to write it using GAMS Connect, we have to extract the values into a parameter. In this case, we define a parameter shipments(i, j) and assign the shipment quantities to it.

  
    
parameter shipments(i, j);
shipments(i, j) = x.l(i, j);

embeddedCode connect:
- GAMSReader:
    symbols:
        - name: shipments
- SQLWriter:
    ifExists: replace
    connectionType: 'sqlalchemy'
    schemaName: default
    connection:
      drivername: 'databricks'
      host: '
      username: 'token'
      password: ''
      database: 'demo'
      query:
        http_path: ''
        schema: 'default'
        catalog: 'demo'
    symbols:
      - name: shipments
        tableName: transport_shipments

endEmbeddedCode
display x.l, x.m;

GAMS Embedded Code

Similar to reading, if your system is set up to connect to Databricks from your local Python scripts, you may prefer using embeddedcode Python. For production, consider using Delta Merges in the write statement.

  
    
embeddedCode Python:
from databricks.connect import DatabricksSession
import gams.transfer as gt

# Auth from ~/.databrickscfg - no secrets here
spark = DatabricksSession.builder.getOrCreate()

m = gt.Container(gams.db)
shipments = m['x'].records.loc[:, :'level']
shipments.columns = ["plant", "market", "quantity"]
result_sdf = spark.createDataFrame(shipments)

# Overwrite with this run's results (append if you want history)
(result_sdf
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("demo.default.transport_shipments_gamspython"))

endEmbeddedCode

Note that there are two embedded code blocks in the GAMS code. The first block reads data using SQLReader and writes to GAMS database using GAMSWriter. The second block, reads the shipments parameter from the GAMS database using GAMSReader and writes it to Databricks using SQLWriter. More details on the SQLReader and SQLWriter agents can be found in the GAMS documentation . Another observation to make is that the \$onEmbeddedCode-\$offEmbeddedCode block is run at compile-time when the data is read. However, since the shipment values are only available at execution-time (after solve), we are using embeddedCode-endEmbeddedCode.

GAMS: the verdict

Whether you use GAMS Connect or embedded code, the integration remains robust and transparent. The ability to pull data from a SQL Warehouse directly into a .gms file ensures that your GAMS models and new high-performance formulations remain connected to your organization’s single source of truth, all from the comfort of GAMS.

Key takeaway

The real power of this integration lies in eliminating data silos. By bringing GAMS and GAMSPy directly to Databricks, you transform your optimization model from a standalone script into a fully integrated component of your enterprise decision pipeline. You no longer have to manage the “optimization outside the pipeline” headache; instead, you get a secure, scalable, and governed workflow that speaks the language of your data.

Whether you are a GAMS expert embracing the Lakehouse or a Databricks power user looking for the gold standard in optimization, we hope this walkthrough helps you accelerate your journey. We are always looking to improve our tools for the community - reach out to us at support@gams.com to share your experience.