LM Mech.mws
Fitting a Strength Function by the Levenberg-Marquardt Method
By J.M. R
edwood
Synopsis
Procedure
mnlfit
, based on the Levenberg-Marquardt method, is used to fit a function relating adhesive bond strength to three variables. The maximum strength and values of the variables to produce it are estimated.
Copyright
The procedure used in this worksheet is an adaptation of Dr. D.E. Holmgren's procedure
mnlfit
2000. His permission to reproduce and adapt the procedure is gratefully acknowledged.
Procedure
mnlfit
Original Procedure
This worksheet implements the Levenberg-Marquardt method of fitting a non-linear function to a set of data. The general form of the fitting function is:
y = f(a[0] ...a[n] ;x[0] ...x[m] )
i.e., it is possible to have more than one independent variable. The user must supply a Maple routine to compute the function
f
(see examples below). The procedure
mnlfit
will then compute the required derivatives. The user must also supply the data in the form of vectors of
x
and
y
values (or a matrix of
x
values in the case of more than one independent variable), as well as weights for the data points. The weights are usually the reciprocals of the standard deviations for each of the observed data points, but if not known, they may be set to unity. The routine below will iterate the solution either until the tolerance
tol
is reached or until the maximum number of iterations (50) is reached.
If the procedure requires a full 50 iterations, the data and formulation of the problem should be re-examined. In any case, the best way of learning how to use
mnlfit
is to run through the example below.
If there is sufficient interest and/or demand, a future version of this code will include automatic differentiation for more complicated model functions.
All of the usual qualifications concerning free software apply here (i.e., no claims of suitability for any particular application). Responsibility lies with the user to ensure that correct results are obtained!
Any questions about
mnlfit
should be addressed to D.E. Holmgren, together with any suggestions for modifications.
This procedure was developed jointly by:
D.E. Holmgren (holmgren@brandonu.ca) Brandon University, Brandon, MB, Canada
J.F. Ogilvie (ogilvie@munkebo.chem.ou.dk)
M. Monagan (monagan@cecm.sfu.ca) Center for Experimental and Constructive Mathematics, Simon Fraser University, Vancouver BC, Canada.
Adaptations of Original Procedure
Dr. D.E. Holmgren's original worksheet was downloaded from www.maplesoft.com/apps/categories/data_analysis_stats/data/html/genfit_6.html, and was adapted with Maple V Release 5.1. It can be used with both Maple V R5.1 and Maple 6.
The adaptations were:
(a) Removal of the redundant procedure
fitstats
.
(b) Removal of statements that attempted to calculate the covariance ratio,
F
(referred to as
fstat
in the original).
(c) Removal of all commands preceded by the # sign.
(d) Removal of the square root from weighting operations.
(e) Provision of means to control the output.
(f) Provision of an analysis of covariance and tests of the regression.
(g) Improved presentation of output.
Tests of the regression include G.W. Snedecor's test of the covariance ratio,
F
. This test is only valid for data drawn from a (multi-variable) Normal distribution. Users must satisfy themselves that this is the case, before placing any credence in this test. Naturally, the same applies to any other tests based, like Snedecor's, on Sir Ronald Fisher's
z
distribution and his transforms of
z
.
Definitions
mnlfit := proc(f, X, x::array, y::vector, wt::vector, P::{name,list(name)},p::vector,tol,Prin,digi::integer)
f
is the name of the function to be fitted to the data,
X
is the name of the independent variable appearing in this function,
x
is the array of observed values of
X
,
y
is the array of observed values of the dependent variable or function,
wt
is an array of weights (usually the inverses of the standard deviation of each value of dependent variable, but can be set to unity if weights are not known),
P
is the name of the parameters of
X
as a vectorial quantity,
p
is a vector of initial estimates of values of the parameters of
X
(rough guesses are usually good enough - the Levenberg-Marquardt method is not very sensitive to the initial estimates),
tol
is a tolerance that serves as a criterion of convergence of the solution,
Prin
controls the output of supplementary information - insert:
S
in place of
Prin
if the fit statistics are required,
R
if a table of residuals is required,
B
if both statistics and residuals are required,
It
if the residuals after each iteration are required,
All
causes all the above items to be printed,
Inserting any other name, or leaving it as
Prin
, prevents their output,
digi
is an integer that controls the number of digits printed by
mnlfit
.
mnlfit
prints the number of iterations, the
Std Deviation of Residuals
(Goodness of Fit) statistic, the supplementary information required, and returns
f
.
Procedure
mnlfit
>
restart: Digits := 40:
>
mnlfit := proc(f, X, x::array, y::vector, wt::vector, P::{name,list(name)},p::vector,
tol,Prin,digi::integer)
local gradf,nobs,npar,k,yck,xk,r,a,drv,niter,j,n,corr,corrprint,alpha,beta,cvm,b,p1,errors,dp,
res,nu,res1,sigma1,reso,nxvar,yc,finals,hdg,ssr,ssyc,ssy,cf,msr,msyc,msy,F,pF,Rsq,Tests, anova,resi,rout;
# Initialize.
if type(P,name) then
RETURN(mnlfit(f,X,x,y,wt,[seq(P[k],k=1..linalg[vectdim](p))],p,tol,Prin,digi));
fi;
gradf := linalg[grad](f,P);
reso := 1.0*10^9;
nobs := linalg[vectdim](y);
npar := linalg[vectdim](p);
nxvar := linalg[coldim](x);
xk := vector(nxvar);
r := vector(nobs);
a := matrix(nobs,npar,0.0);
corr := matrix(npar,npar,0.0);
drv := vector(npar);
p1 := vector(npar);
errors := vector(npar);
niter := 50;
nu := 1000;
# Begin iterations.
for n from 1 to niter do
# Compute initial fit and residuals ( = observed - calculated).
for k from 1 to nobs do
for j from 1 to nxvar do xk[j] := x[k,j]
od; yck,drv := gry(f,gradf,X,P,npar,xk,p);
r[k] := evalf(wt[k]*(y[k] - yck));
for j from 1 to npar do
a[k,j] := wt[k]*drv[j];
od:
od:
# Sum of squares of residuals
res1 := evalf(linalg[dotprod](r,r));
if Prin = It or Prin = All then print(` Iteration No `, n ,` initial SSR = `, evalf(res1,digi));
fi; # Compute matrix alpha of normal equations.
alpha := evalf(evalm(linalg[transpose](a) &* a));
# Compute vector beta for right sides of normal equations.
beta := evalf(evalm(linalg[transpose](a) &* r));
# Note use of factor nu in this next loop; this increases as one approaches solution.
for j from 1 to npar do
alpha[j,j] := evalf(alpha[j,j]*(1. + 1./nu));
od:
# Solve a system of normal equations to correct parameters.
dp := evalf(linalg[linsolve](alpha, beta));
# Compute sum of squares of residuals at new point p1 = p + dp.
p1 := evalf(evalm(p + dp));
for k from 1 to nobs do
for j from 1 to nxvar do xk[j] := x[k,j] od;
yck,drv := gry(f,gradf,X,P,npar,xk,p1);
r[k] := evalf(wt[k]*(y[k]-yck));
# Store yc[k] ie calculated y.
yc[k] := yck;
od:
res := evalf(linalg[dotprod](r,r));
if Prin = It or Prin = All then print(` new SSR = `, evalf(res,digi));
fi; # Modify nu according to whether the sum of squares increases or decreases after this correction. # For the wrong direction, elements are not corrected.
if res1 <= res then
nu := nu/10.;
else
# For the right direction, correct the elements.
nu := nu*10.;
p := evalf(evalm(p + dp));
fi:
# Test for convergence.
if abs(res - reso) <= tol then
break
fi;
reso := res;
od:
# Covariance matrix
cvm := evalf(linalg[inverse](alpha));
# Standard deviation of residuals
sigma1 := sqrt(res/(nobs - npar));
# Errors of parameters
for j from 1 to npar do
errors[j] := evalf(sigma1*sqrt(cvm[j,j]));
od:
for j from 1 to npar do
for k from 1 to npar do
corr[j,k] := cvm[j,k]/sqrt(cvm[j,j]*cvm[k,k])
od:
od: # Print results.
print (array([[`No of Iterations`,`Std Deviation of Residuals`],[n,evalf(sigma1, digi)]]));
print();
if Prin = S or Prin = B or Prin = All then
print();
# Calculate Sums of Squares, Covariances & Tests.
cf := (add(wt[j]*y[j],j=1..nobs))^2/nobs;
ssyc := add((wt[j]*yc[j])^2,j=1..nobs) - cf;
ssr := add(wt[j]^2*(yc[j] - y[j])^2,j=1..nobs);
ssy := add((wt[j]*y[j])^2,j=1..nobs) - cf;
msyc := ssyc/(npar-1);
msy := ssy/(nobs - 1);
msr := ssr/(nobs - npar);
Rsq := ssyc/ssy;
pF := 1 - stats[statevalf,cdf,fratio[(npar - 1),(nobs - npar)]](msyc/msr);
anova := array([[`Source`,Sigma(`Squares`),`DF`,`Mean Square`], [`Regression`,trunc(ssyc),(npar-1),evalf(msyc,digi)], [`Residuals`,trunc(ssr),(nobs-npar),evalf(msr,digi)],[`Total`,trunc(ssy),(nobs - 1),` `]]);
Tests := array([[R^2,`F`,`Prob of F by chance`],[` `,` `,`for Normal data`],
[evalf(Rsq,3),evalf(msyc/msr,3),evalf(pF,3)]]);
# Print Anova and Tests
print(`Analysis of Covariance`);
print();
print(anova);
print();
print(`Tests of Covariance`);
print();
print(Tests);
print();
print (`Final values of parameters`); hdg := array([`Parameter`,`Value`,`Standard Error`]); finals := linalg[augment]([seq(j,j=1..npar)], convert(p,list), [seq(errors[j],j=1..npar)]); finals := evalf(evalm(finals),digi):
finals := linalg[stackmatrix](hdg,finals);
print(finals);
print();
corr := evalf(evalm(corr),digi):
cvm := evalf(evalm(cvm),digi):
print(`Matrix of Covariances`);
print();
print(cvm);
print();
print(` Matrix of Correlation Coefficients `);
print();
print(corr);
print();
fi;
if Prin =R or Prin = B or Prin = All then print(` Table of residuals `);
print();
resi := convert([seq([(seq(evalf(x[j,k],digi),k=1..nxvar),evalf(y[j],digi),evalf(yc[j],digi),
evalf(y[j]-yc[j],digi), evalf(wt[j]*(y[j]-yc[j])/sigma1, digi))],j=1..nobs)],matrix);
hdg := array([(seq(`x obs`[k],k=1..nxvar),` y obs `,` y calc `,` difference `,`wt*difference / sd of residuals`)]);
resi := linalg[stackmatrix](hdg,resi);
print(resi);
print();
fi;
RETURN(subs(seq(P[j]=evalf(p[j],digi),j=1..npar),f));
end:
gry := proc(f,G,x,a,n,xval,p::vector)
# Procedure to evaluate derivatives and formula
local k,y,i,drv,point;
point := {x=xval, seq(a[i]=p[i], i=1..n)};
drv := vector(n);
for k from 1 to n do
drv[k] := evalf(eval(G[k],point))
od:
y := evalf(eval(f,point));
RETURN(y,drv);
end:
Problem
A manufacturer wanted to join two parts with an adhesive instead of welding, but needed to know the maximum strength that could be obtained. The strength of a bond depends upon the cure time (mins), curing temperature (deg C) and quantity of accelerator added to the adhesive (% by weight). Although the relationship between strength and the variables was believed to follow that of a second order polynomial, the coefficients were unknown. Identical joints were therefore broken in a test machine and the force (kN) needed was measured. Fifteen combinations of cure time, temperature and accelerator were tested in a random order in accordance with a planned design. Each combination was tested 3 times in a random order.
Given the test results, the problem reduces to:
a. Fitting the functional relationship to the test data,
b. Estimating those values that produced the maximum strength,
c. Estimating the maximum strength.
Solution
General
In this example, the coefficients of the polynomial relating strength to the variables are assumed to be linear, and so a procedure for fitting a linear curve could be used. However, procedure
mnlfit
will be used to illustrate its versatility, although it is intended for fitting non-linear equations to data.
Data
First, the test data are imported,
>
test_data := [[1, 45, 125, 3, 19.01], [2, 45, 125, 3, 19.97], [3, 45, 125, 3, 19.49], [4, 45, 125, 7, 23.95], [5, 45, 125, 7, 23.03], [6, 45, 125, 7, 23.49], [7, 45, 175, 3, 24.59], [8, 45, 175, 3, 23.59], [9, 45, 175, 3, 24.09], [10, 45, 175, 7, 49.63], [11, 45, 175, 7, 50.55], [12, 45, 175, 7, 50.09], [13, 75, 125, 3, 27.69], [14, 75, 125, 3, 28.09], [15, 75, 125, 3, 28.49], [16, 75, 125, 7, 32.45], [17, 75, 125, 7, 31.73], [18, 75, 125, 7, 32.09], [19, 75, 175, 3, 30.89], [20, 75, 175, 3, 30.09], [21, 75, 175, 3, 30.49], [22, 75, 175, 7, 36.11], [23, 75, 175, 7, 36.49], [24, 75, 175, 7, 36.87], [25, 30, 150, 5, 18.41], [26, 30, 150, 5, 17.37], [27, 30, 150, 5, 17.89], [28, 90, 150, 5, 32.55], [29, 90, 150, 5, 32.89], [30, 90, 150, 5, 33.23], [31, 60, 100, 5, 20.31], [32, 60, 100, 5, 19.47], [33, 60, 100, 5, 19.89], [34, 60, 200, 5, 28.47], [35, 60, 200, 5, 29.31], [36, 60, 200, 5, 28.89], [37, 60, 150, 1, 19.29], [38, 60, 150, 1, 18.83], [39, 60, 150, 1, 19.75], [40, 60, 150, 9, 28.89], [41, 60, 150, 9, 29.29], [42, 60, 150, 9, 29.69], [43, 60, 150, 5, 37.67], [44, 60, 150, 5, 38.51], [45, 60, 150, 5, 38.09]]:
Next the data are displayed as a matrix, and then converted into a vector of strengths and an array of conditions (sets of variables).
>
with(linalg):
data := convert(test_data,matrix);
Warning, the protected names norm and trace have been redefined and unprotected
The first column in the data above gives the test number, the second the curing time, the third the curing temperature, the fourth the percentage of accelerator added to the adhesive and the fifth shows strength of the joint (force to break it). Test numbers 1 - 3 refer to one set of conditions (time, temperature and accelerator), test numbers 4 - 6 to refer to another set, 7 - 9 to a third set, and so on.
The conditions and strengths are separated from the data array thus,
>
conds := delcols(delcols(data,5..5),1..1):
strengths := col(data,5):
The function relating strength and the three conditions,
, is written as:
>
str := c[1] + c[2]*x[1] + c[3]*x[2] + c[4]*x[3] + c[5]*x[1]^2 + c[6]*x[2]^2 + c[7]*x[3]^2 + c[8]*x[1]*x[2] + c[9]*x[1]*x[3] + c[10]*x[2]*x[3];
A literature search suggested very approximate initial values for the 10 constants as follows. Unity weighting is assumed for the observed strengths.
>
ini := vector(10,[-100,.1,.1,1,.001,.1,100,.001,.1,.1]):
wt:= vector(45,1):
Curve Fit
The test data are inserted in procedure
mnlfit
to obtain estimates of the constants in the strength equation. Choosing
tol
as
,
R
to obtain the table of residuals, and the number of digits to be shown as 5,
mnlfit
is run.
>
y := mnlfit(str,x,conds,strengths,wt,c,ini,10^(-7),R,5);
Mostly, the residuals are fairly small, the largest being 7.8 kN, or 2.16 times the
Std Deviation of Residuals
. The residuals are now plotted to check whether any trends are apparent.
>
Y := unapply(y,[x[1],x[2],x[3]]):
cond := convert(conds,listlist):
resids :=[seq(wt[i]*(strengths[i]-Y(cond[i,1],cond[i,2],cond[i,3]))/3.6268,i=1..45)]:
Noting that the data comprises sets of 3 strengths for each set of conditions, there is a set of 3 (weighted) residuals for each set of conditions. These are plotted accordingly.
>
cond_set := [seq(iquo(i,3),i=3..47)]:
pts := zip((x,y)->[x,y],[seq(cond_set[i],i=1..45)],[seq(resids[i],i=1..45)]):
plots[pointplot](pts,labels=["Set of Conditions no.","Weighted(Observed Strength - Calculated)/sd of residuals"]);
The plot shows no clear trend. As already seen from the
Table of Residuals
, all the weighted residuals fall within about 2 std deviations, and so are considered satisfactory. In practice, though, it would be worth checking the data obtained for set number 4, where the residuals are around 2 standard deviations.
Maximum Strength
Up to this point, the precision used by Maple has been 40 digits, but this is unnecessary for the answers in the remainder of this worksheet and makes them difficult to read; 8 digits are adequate.
The Hessian of
y
is calculated to confirm that it is negative-definite, and thus the stationary value is a local maximum. The values of the variables
that produce the stationary value of
y
are obtained by equating the gradient of
y
to zero, and solving the three equations. Substituting the values obtained into the expression for
y
gives the maximum strength.
>
Digits := 8:
definite(hessian(y,[x[1],x[2],x[3]]),'negative_def');
e := grad(y,[x[1],x[2],x[3]]):
sols := solve(convert(e,set),{x[1],x[2],x[3]});
`max strength` := subs(sols,y);
The values of
that produce the maximum strength are seen to be 51 minutes cure time at 180 deg C, with 7.4% accelerator added to the adhesive.
In practice, the strength obtained with this choice of conditions will vary about the maximum calculated above, in much the same way as the residuals vary about zero (see the plot above). Provided that the distribution of strengths calculated from the polynomial does not depart unduly from the Normal, the bond strength will be within about twice the
Std Deviations of Residuals
from that calculated above, that is within about 2 x 3.5747 kN of 43.9 kN. Thus, in about 95% of joints, the strength will lie between roughly 36.7 and 51 kN.
Yield Plots
Since the strength is a function of three variables, it can be plotted only by holding one of them constant, while the other two are varied. The three plots are shown below.
>
plot3d(Y(51,z[2],z[3]),z[2]=100..200,z[3]=1..9,orientation=[58,58],axes=FRAME,
title="Time = 51 minutes",labels=["Temperature","Accelerator","Strength"]);
>
plot3d(Y(z[1],180,z[3]),z[1]=30..90,z[3]=1..9,orientation=[60,58],axes=FRAME,
title="Temperature = 180 deg C",labels=["Time","Accelerator","Strength"]);
>
plot3d(Y(z[1],z[2],7.37),z[1]=30..90,z[2]=100..200,orientation=[57,60],axes=FRAME,
title="Accelerator = 7.37%",labels=["Time","Temperature","Strength"]);
>
>
Unsurprisingly, the plots show that the strength depends strongly on curing temperature and accelerator added, but less so on time.
Observations
a. Procedure
mnlfit
easily found the ten coefficients for the strength polynomial. That it performs so well with both linear and non-linear curve fitting, suggests that it is the only curve-fitting tool that many users will need.
b. The maximum strength of bond between the parts concerned was found to be 43.9 kN, with a curing time of 51 minutes at 180 deg C, and 7.4% of accelerator added to the adhesive. About 95% of bonds made under these conditions would have a strength between roughly 36.7 and 51 kN. (In practice, the actual limits would be found from samples made during normal production.)
c. With 10 variables in the strength polynomial, the statistics output (
Prin
set to
S
) from
mnlfit
is somewhat overwhelming. For that reason it is not shown here, but readers who run
mnlfit
with the
S
setting will note that
is 85.9%, indicating that the regression accounts for much of the covariation, and that the value of
F
suggests the regression is a reasonable fit. A
test (not shown here) of the standardised residuals suggests that they do not depart unduly from Normal, and thus some credence can be given to the test of
F
.
Disclaimer
: While every effort has been made to validate the solutions in this worksheet, Waterloo Maple Inc. and the contributors are not responsible for any errors contained and are not liable for any damages resulting from the use of this material.