Description
This program takes raw data from an Excel sheet, transforms the data and finds the optimal values for alpha and beta following a procedure outlined by Stephen Wright. An integral is computed numerically using a trapezoidal approximations. The optimization results are then written back into the Excel file. This procedure was originally coded by David O'Brochta in Mathematica and recast in a GAMS optimization framework by Wilhelmine Meeraus. The Excel file contains the data from the field and labwork described in W Meeraus, 2004. References Stephen I Wright, Quang Hien Le, Daniel J Schoen and Thomas E Bureau, Population Dynamics of an Ac-like Transposable Element in Self- and Cross-Pollinating Arabidopsis, Genetics 158: 127-1288 (July 2001). David O'Brochta, Private Communications, University of Maryland Biotechnology Institute, 2004. Wilhelmine H Meeraus, A Combined Fieldwork and Laboratory Investigation into the Behaviour of the Herves Transposable Element, London School of Hygiene & Tropical Medicine, Theses, 2004. Keywords: nonlinear programming, discontinuous derivatives, genetics, transposable element activity, trapezoidal approximation
Small Model of Type : DNLP
Category : GAMS Model library
Main file : herves.gms includes : hervesio.xlsx
$title Herves (Transposable Element) Activity Calculations (HERVES,SEQ=305)
$onText
This program takes raw data from an Excel sheet, transforms the data
and finds the optimal values for alpha and beta following a procedure
outlined by Stephen Wright. An integral is computed numerically
using a trapezoidal approximations. The optimization results are then
written back into the Excel file.
This procedure was originally coded by David O'Brochta in Mathematica
and recast in a GAMS optimization framework by Wilhelmine Meeraus.
The Excel file contains the data from the field and labwork described
in W Meeraus, 2004.
References
Stephen I Wright, Quang Hien Le, Daniel J Schoen and Thomas E Bureau,
Population Dynamics of an Ac-like Transposable Element in Self- and
Cross-Pollinating Arabidopsis, Genetics 158: 127-1288 (July 2001).
David O'Brochta, Private Communications, University of Maryland Biotechnology
Institute, 2004.
Wilhelmine H Meeraus, A Combined Fieldwork and Laboratory Investigation into
the Behaviour of the Herves Transposable Element, London School of Hygiene
& Tropical Medicine, Theses, 2004.
Keywords: nonlinear programming, discontinuous derivatives, genetics,
transposable element activity, trapezoidal approximation
$offText
$eolCom //
* you can use command line --xlsxFile=xxx and --sheet=xxx which
* will allow further automation
* the sheet names are Aa,Ag,Am,ZU,ZAg
$if not set xlsxFile $set xlsxFile 'hervesio.xlsx' // work book name
$if not set sheet $set sheet 'Ag' // sheet name
$set rng '%sheet%!A4' // starting range to read data
$set rngout '%sheet%_Opt!A1' // starting range to store results
* 1. Extract raw data from an Excel sheet into a GDX file
* and load into GAMS. The result of this step are
* the set of samples i and the counts cnt. Empty or all
* zero rows and columns will be dropped automatically from
* the Excel data.
Set
imax 'max sample size' / 1*50 /
i(imax) 'samples';
Parameter cnt(imax) 'counts';
Alias (*,code,LabId,Cut);
Parameter raw(code,LabId,Cut) 'raw observations';
Set rr(code,LabID) 'row labels including empty rows';
* empty rows will be reported
$onEmbeddedCode Connect:
- ExcelReader:
file: %xlsxFile%
symbols:
- name: raw
rowDimension: 2
columnDimension: 1
range: %rng%
- Projection:
name: raw(code,LabId,Cut)
newName: rr(code,LabId)
asSet: True
- GAMSWriter:
symbols: all
duplicateRecords: first
$offEmbeddedCode
Set
rid(code,labID) 'row identifier'
cid(Cut) 'column identifier';
Parameter ct 'column totals';
option rid < raw, cid < raw, rid:0:0:1; // map domains and set print option
abort$(card(rid) > card(imax)) 'size of imax is to small', imax, rid;
ct(cid) = sum(rid, raw(rid,cid));
display raw, ct;
Set rrdiff 'all zero rows';
rrdiff(rr) = yes - rid(rr);
display rrdiff;
i(imax) = ord(imax) <= card(rid);
cnt(imax) = sum(cid, ct(cid) = ord(imax));
display i, cnt;
* 2. Prepare model constants and set up numerical integration
* data sets.
Parameter
diploid 'sample size'
nhat
bin(imax) 'binomial for diploid';
diploid = card(i);
bin(i(imax)) = fact(card(i))/fact(card(i) - ord(imax))/fact(ord(imax));
* numerical integration data
Set
s 'integral discritization steps' / s0*s1000 /
ss(s) 'excluding first element';
Parameter
xx(s) 'values for steps s (0-1)'
ws(s) 'weight for trapezoid approximation'
deltaxx;
ss(s) = ord(s) > 1; // drop values for x = 0
deltaxx = 1/(card(s) - 1);
xx(s) = (ord(s) - 1)*deltaxx;
ws(s) = 1;
ws(s)$(ord(s) = 1) = 0.5;
ws(s)$(ord(s) = card(s)) = 0.5;
Parameter
t0(s) 'intermediate value one'
t1(imax,s) 'intermediate value two';
t0(s) = sqr(xx(s))+ 2*xx(s)*(1 - xx(s));
t1(i(imax),s) = ws(s)*power(t0(s),ord(imax))*power(1 - t0(s),card(i) - ord(imax));
* 3. Define optimization model and set relevant bounds.
* Note that not all parameters are known at this point.
Variable
chi 'sum of CHI squares'
a 'alpha values'
b 'beta values'
est(imax) 'estimates';
Equation
defchi 'definition of CHI'
defest(imax) 'definition of estimates';
defchi.. chi =e= sum(i, sqr(cnt(i) - est(i))/est(i));
defest(i).. est(i) =e= nhat*(a + b)/a/beta(a,b)*bin(i)*deltaxx
* sum(s, t1(i,s)*xx(s)**(a - 1)*(1 - xx(s))**(b - 1));
Model m / all /;
* set bounds and initial values
a.lo = 0.01; a.up = 1.0;
b.lo = 1; b.up = 10;
a.l = .1; b.l = 5;
est.l(i) = 1;
option domLim = 1000, limCol = 0, limRow = 0, solPrint = on;
Parameter rep(*,*) 'summary report';
* 4. First Optimization
nhat = 4/3*sum(i(imax), ord(imax)*cnt(i))/card(i);
solve m minimizing chi using dnlp;
display diploid, nhat, a.l, b.l, chi.l;
rep(i ,'w-obs') = cnt(i);
rep(i ,'w-est') = est.l(i);
rep(i ,'w-chi') = sqr(cnt(i) - est.l(i))/est.l(i);
rep('total','w-chi') = sum(i, rep(i,'w-chi'));
rep('total','w-obs') = sum(i, cnt(i));
rep('alpha','w-est') = a.l;
rep('beta' ,'w-est') = b.l;
rep('nhat' ,'w-est') = nhat;
rep('SStat' ,'w-est') = m.solveStat;
rep('MStat' ,'w-est') = m.modelStat;
* 5. Second solve with modified counts
// use a cutoff of 70% and solve again
cnt(i(imax))$(ord(imax) > 0.7*card(i)) = 0;
nhat = 4/3*sum(i(imax), ord(imax)*cnt(i))/card(i);
display nhat;
solve m min chi using dnlp;
display diploid, nhat, a.l, b.l, chi.l;
rep(i ,'wo-obs') = cnt(i);
rep(i ,'wo-est') = est.l(i);
rep(i ,'wo-chi') = sqr(cnt(i) - est.l(i))/est.l(i);
rep('total','wo-obs') = sum(i, cnt(i));
rep('total','wo-chi') = sum(i, rep(i,'wo-chi'));
rep('alpha','wo-est') = a.l;
rep('beta' ,'wo-est') = b.l;
rep('nhat' ,'wo-est') = nhat;
rep('SStat' ,'wo-est') = m.solveStat;
rep('MStat' ,'wo-est') = m.modelStat;
display rep;
* 6. Update Excel workbook with results in a new sheet
embeddedCode Connect:
- GAMSReader:
symbols:
- name: rep
- ExcelWriter:
file: %xlsxFile%
symbols:
- name: rep
range: %rngout%
endEmbeddedCode